[freenet-dev] Should the spider ignore common words?

2009-06-11 Thread Daniel Cheng
On Thu, Jun 11, 2009 at 10:15 PM, Mike Bush wrote:
> On Thu, 2009-06-11 at 21:25 +0800, Daniel Cheng wrote:
>> On 11/6/2009 20:16, Mike Bush wrote:
>> > 2009/6/10 Daniel Cheng:
>> [...]
>> >>
>> >> This is yet another reason to split the site part out.
>> >
>> > I've built 2 indexes to find the space saving from separating keys
>> > from words as well,
>> >   for an index > 16000 keys with 256 subindices :
>> >
>> > The normal index with keys integrated in files >400MB
>> > With keys in a separate key index(3MB) it totals 160MB
>> >
>> > Of course the difference wouldn't be so large if the index wasn't
>> > separated into so many pieces.
>> >
>> > One thing I worried about was that the file index would get very
>> > large, but even for the key index to be bigger than one of wanna's
>> > subindexes it would contain > 32 keys. How many keys do very large
>> > indexes have?
>>
>> For a starter idea,
>> try to split the site into multiple files..
>>
>>       site_.xml
>> where
>>        is the prefix of MD5( SSK@/CHK@ of the site )
>>
>> take the MD5 of the key, but _NOT THE DOC PATH_.
>> This would have the following advantages:
>>
>>     - the file would compress better
>>
>>     - USK@ editions would be grouped together
>>         * USK edition based magic is easier.
>>         * Words across multiple editions would look similar,
>>           so grouping means fewer site files to fetch
>>
>
> I would imagine that splitting the site index would be futile though, if
> it was only split into a few, for example 16 files, a typical search
> result of many hundreds of results would still require most parts. On
> the other hand, a large number of splits would mean a smaller proportion
> could be requested but the large number of requests would slow it
> further.


Possible solution:
(NOTE TO mikeb: You don't have to implement this in this version;
 this can be another summer :))

   * Load them lazily.
   - Split the results into pages.
   - Include all the stats related to ranking in the keyword index file,
     i.e. the term positions, so we can do TF-IDF-style ranking.
   - Prefetch the site files for the other pages in the background.
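
A minimal sketch of the ranking and paging idea, not the XMLLibrarian
implementation. It assumes the keyword index stores, per term, a mapping
from document id to the list of term positions; the names tf_idf_scores,
page_of_results and the shape of the postings dictionary are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, postings, total_docs):
    """Score documents for a query.

    postings maps term -> {doc_id: [positions]}. Keeping the positions in
    the keyword index gives the term frequency without fetching site files.
    """
    scores = Counter()
    for term in query_terms:
        docs = postings.get(term, {})
        if not docs:
            continue
        # Rare terms get a larger weight than terms found in many documents.
        idf = math.log(total_docs / len(docs))
        for doc_id, positions in docs.items():
            scores[doc_id] += len(positions) * idf
    return scores


def page_of_results(scores, page, page_size=10):
    """Return the doc ids for one result page, best score first."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    start = page * page_size
    return [doc_id for doc_id, _ in ranked[start:start + page_size]]
```

Only the site files needed for the requested page have to be fetched;
the files for later pages could be requested in the background.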



[freenet-dev] Localising plugins

2009-06-11 Thread Mike Bush
On Thu, 2009-06-11 at 13:35 +0100, Matthew Toseland wrote:
> On Thursday 11 June 2009 09:53:51 VolodyA! V Anarhist wrote:
> > > Nextgens has pointed out that this is going to make it harder to build a 
> > > proper plugin API with no shared code and only interfaces. Any 
> > > suggestions for how to make this easy for plugins without them having to 
> > > inherit classes from Freenet itself?
> > 
> > Why is it so bad to inherit something? Look at the design of something like
> > Google Android, not only do you have to inherit whole bunch of stuff, but 
> > you are locked in the design pattern by it.
> 
> Because it makes isolating plugins from the node code via classloader hacks 
> more difficult? Eventually we want to be able to support "untrusted" 
> plugins...

I would like to localise the search interface. Should I wait until a
plan is made for how to do this, or just implement a quick method such as
an array of keys and arrays for languages? There aren't many strings, so I
suppose it wouldn't be a problem.
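
A minimal sketch of the "quick method" described above: one table of
strings per language, looked up by key, falling back to English. The
keys, the strings and the l10n helper are all hypothetical and are not
part of any Freenet plugin API.

```python
# Hypothetical string tables: language code -> {key: text}.
STRINGS = {
    "en": {
        "search": "Search",
        "results": "Results",
        "no_results": "No results found",
    },
    "fr": {
        "search": "Rechercher",
        "results": "Résultats",
    },
}
DEFAULT_LANG = "en"


def l10n(key, lang):
    """Return the string for key in lang, falling back to English,
    and finally to the key itself if nothing is defined."""
    table = STRINGS.get(lang, {})
    if key in table:
        return table[key]
    return STRINGS[DEFAULT_LANG].get(key, key)
```

With only a handful of strings this stays small, and it can be replaced
later once a common plugin localisation scheme is agreed on.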




[freenet-dev] Win Installer it_IT l10n 090611

2009-06-11 Thread Luke771
Attached.
Please commit.
-- next part --
An embedded and charset-unspecified text was scrubbed...
Name: Include_Lang_it.inc_090611_luke771
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20090611/24c576c6/attachment.ksh>


[freenet-dev] Should the spider ignore common words?

2009-06-11 Thread Daniel Cheng
On 11/6/2009 20:16, Mike Bush wrote:
> 2009/6/10 Daniel Cheng:
[...]
>>
>> This is yet another reason to split the site part out.
>
> I've built 2 indexes to find the space saving from separating keys
> from words as well,
>   for an index > 16000 keys with 256 subindices :
>
> The normal index with keys integrated in files >400MB
> With keys in a separate key index(3MB) it totals 160MB
>
> Of course the difference wouldn't be so large if the index wasn't
> separated into so many pieces.
>
> One thing I worried about was that the file index would get very
> large, but even for the key index to be bigger than one of wanna's
> subindexes it would contain > 32 keys. How many keys do very large
> indexes have?

For a starter idea,
try to split the site into multiple files..

     site_.xml
where
      is the prefix of MD5( SSK@/CHK@ of the site )

take the MD5 of the key, but _NOT THE DOC PATH_.
This would have the following advantages:

    - the file would compress better

    - USK@ editions would be grouped together
        * USK edition based magic is easier.
        * Words across multiple editions would look similar,
          so grouping means fewer site files to fetch
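
A minimal sketch of the bucketing described above, under some
assumptions: the key is taken to be everything before the first "/" of
the URI, the prefix length of 2 hex digits is arbitrary, and the
resulting name pattern "site_" + prefix + ".xml" is a guess at the
intended layout. The function name and the example URIs are hypothetical.

```python
import hashlib


def site_file_name(uri, prefix_len=2):
    """Pick the site file for a URI from the MD5 of its key only.

    The document path (and, for USK@ URIs, the site name and edition)
    comes after the first "/", so it does not influence the hash.
    """
    key_only = uri.split("/", 1)[0]
    digest = hashlib.md5(key_only.encode("utf-8")).hexdigest()
    return "site_" + digest[:prefix_len] + ".xml"


# Two editions of the same (made-up) site land in the same file.
a = site_file_name("USK@examplekey,extra,AQACAAE/mysite/3/index.html")
b = site_file_name("USK@examplekey,extra,AQACAAE/mysite/4/about.html")
same_bucket = (a == b)
```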


>
> MikeB



[freenet-dev] Should the spider ignore common words?

2009-06-11 Thread Mike Bush
On Thu, 2009-06-11 at 21:25 +0800, Daniel Cheng wrote:
> On 11/6/2009 20:16, Mike Bush wrote:
> > 2009/6/10 Daniel Cheng:
> [...]
> >>
> >> This is yet another reason to split the site part out.
> >
> > I've built 2 indexes to find the space saving from separating keys
> > from words as well,
> >   for an index > 16000 keys with 256 subindices :
> >
> > The normal index with keys integrated in files >400MB
> > With keys in a separate key index(3MB) it totals 160MB
> >
> > Of course the difference wouldn't be so large if the index wasn't
> > separated into so many pieces.
> >
> > One thing I worried about was that the file index would get very
> > large, but even for the key index to be bigger than one of wanna's
> > subindexes it would contain > 32 keys. How many keys do very large
> > indexes have?
> 
> For a starter idea,
> try to split the site into multiple files..
> 
>      site_.xml
> where
>       is the prefix of MD5( SSK@/CHK@ of the site )
> 
> take the MD5 of the key, but _NOT THE DOC PATH_.
> This would have the following advantages:
> 
>     - the file would compress better
> 
>     - USK@ editions would be grouped together
>         * USK edition based magic is easier.
>         * Words across multiple editions would look similar,
>           so grouping means fewer site files to fetch
> 

I would imagine that splitting the site index would be futile, though. If
it were only split into a few files, for example 16, a typical search
returning many hundreds of results would still require most of them. On
the other hand, a large number of splits would mean a smaller proportion
has to be requested, but the large number of requests would slow it
down further.
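
The trade-off can be put into rough numbers. This is a back-of-envelope
sketch that assumes results are spread uniformly at random over the site
files (which a hash-based split would approximate); the function name is
hypothetical.

```python
def expected_files_fetched(num_files, num_results):
    """Expected number of distinct site files touched by num_results hits.

    Each file is missed by a single hit with probability 1 - 1/num_files,
    so it is missed by all hits with that probability raised to the
    power num_results.
    """
    miss_all = (1.0 - 1.0 / num_files) ** num_results
    return num_files * (1.0 - miss_all)


few_splits = expected_files_fetched(16, 300)
many_splits = expected_files_fetched(4096, 300)
```

With 16 files and 300 results, essentially all 16 files are needed. With
4096 files, roughly 289 separate files are needed, i.e. a small
proportion of the index but almost one request per result.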




[freenet-dev] Localising plugins

2009-06-11 Thread Matthew Toseland
On Thursday 11 June 2009 09:53:51 VolodyA! V Anarhist wrote:
> > Nextgens has pointed out that this is going to make it harder to build a 
> > proper plugin API with no shared code and only interfaces. Any suggestions 
> > for how to make this easy for plugins without them having to inherit 
> > classes from Freenet itself?
> 
> Why is it so bad to inherit something? Look at the design of something like
> Google Android, not only do you have to inherit whole bunch of stuff, but you
> are locked in the design pattern by it.

Because it makes isolating plugins from the node code via classloader hacks 
more difficult? Eventually we want to be able to support "untrusted" plugins...


[freenet-dev] Should the spider ignore common words?

2009-06-11 Thread Mike Bush
2009/6/10 Daniel Cheng :
> On 10/6/2009 20:42, Mike Bush wrote:
>> 2009/6/10 Evan Daniel:
>>> On Wed, Jun 10, 2009 at 6:49 AM, Mike Bush wrote:
>>>> XMLLibrarian doesn't currently support searching for phrases or rating
>>>> relevance of results based on proximity so I don't think common words
>>>> could be of any use in searches now.
>>>>
>>>> Also, I'm not sure but I think the current index doesn't include words
>>>> under 4 letters at all.
>>> If you read my previous mails, you'll see that the spider is in
>>> fact indexing the word "the".
>>>
>>
>> Yes sorry, I've since searched for 'who' on wanna and it is there, it
>> gave me OutOfMemoryException trying to generate the results page
>>
>
> You've got it :)
>
> This is yet another reason to split the site part out.

I've built 2 indexes to measure the space saving from separating keys
from words as well, for an index of more than 16000 keys with 256
subindices:

The normal index, with keys integrated in the files: >400MB
With keys in a separate key index (3MB), it totals 160MB

Of course the difference wouldn't be so large if the index wasn't
separated into so many pieces.

One thing I worried about was that the file index would get very
large, but even for the key index to be bigger than one of wanna's
subindexes it would contain > 32 keys. How many keys do very large
indexes have?


MikeB


> In which we may keep in memory the siteId only, not the whole uri, before the 
> union.
>
> Even so, I suspect searching words like "the who" will ever work without on 
> disk temp files.
>
>>> Evan Daniel
>>>
> ___
> Devl mailing list
> Devl at freenetproject.org
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>



[freenet-dev] Trying to move forward on getting rid of emu

2009-06-11 Thread Arne Babenhauserheide
On Monday, 8 June 2009 12:10:39, Florent Daigniere wrote:
> No way. Bugzilla is everything but usable in our case.

OK. So it's Trac (with complex import but DVCS integration), Mantis (which 
some don't like) or an unfree solution. Did I miss one? 

I didn't yet include Roundup, because I only saw today that it can handle
dependencies. It also has optional command-line and XML-RPC interfaces.

- http://www.roundup-tracker.org/

Best wishes, 
Arne

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- 
   - singing a part of the history of free software -
  http://infinite-hands.draketo.de



[freenet-dev] Localising plugins

2009-06-11 Thread VolodyA! V Anarhist
> Nextgens has pointed out that this is going to make it harder to build a 
> proper plugin API with no shared code and only interfaces. Any suggestions 
> for how to make this easy for plugins without them having to inherit classes 
> from Freenet itself?

Why is it so bad to inherit something? Look at the design of something like
Google Android: not only do you have to inherit a whole bunch of stuff, but you
are locked into the design pattern by it.

  - Volodya


--
http://freedom.libsyn.com/   Echo of Freedom, Radical Podcast
http://eng.anarchopedia.org/ Anarchopedia, A Free Knowledge Portal
http://www.freedomporn.org/  Freedom Porn, anarchist and activist smut

 "None of us are free until all of us are free."~ Mihail Bakunin



[freenet-dev] Anyone willing to translate the WinInstaller to German before 0.7.5 final?

2009-06-11 Thread Thomas Bruderer

> 1) It seems like you forgot the last 2 sections? ("; Service starter" 
> and "; Service stopper")
>   
I should have taken the Danish or Italian version as a base; the French 
translation is incomplete, and I took it as my base. I will add them tomorrow.

Yes, the French version isn't finished!
> 2) The following translated lines are combined a bit too long according 
> to the space allocated for the text (I've put "[*]" where it hits max.. 
> I've also added a few spaces you missed after "mindestens" and "als"):
>   
The problem with spaces and punctuation is that it's not consistent: 
sometimes a string starts with a space, sometimes it ends without a question 
mark even though it's a question.

For 2/3, I'll shorten them accordingly. It should be simple; it will 
probably be enough to simplify the compound nouns.
> I've committed the translation without the strings mentioned (the 
> installer will fallback to English for those). Feel free to simply send 
> the corrections to me and I'll submit those as well :).
>   
Will send the Update directly to you tomorrow.

Greetings, Apophis



[freenet-dev] Anyone willing to translate the WinInstaller to German before 0.7.5 final?

2009-06-11 Thread Zero3
Thomas Bruderer wrote:
> Here is the German Translation.
> 
> What is the Schedule for 0.7.5?
> 
> The translation is done in the polite form ("Sie") and the orthography is 
> Swiss (i.e. ss instead of ß)
> 
> Greetings,
> Apophis

Thanks a lot!

A few points:

1) It seems like you forgot the last 2 sections? ("; Service starter" 
and "; Service stopper")

2) The following translated lines are, combined, a bit too long for 
the space allocated for the text (I've put "[*]" where it hits the maximum. 
I've also added a few spaces you missed after "mindestens" and "als"):

Trans_Add("Freenet requires at least ", "Freenet braucht mindestens ")
Trans_Add(" MB free disk space, but will not install with less than ", " 
MB freier Festplattenspeicher, aber wird nicht installiert mit weniger 
als ")
Trans_Add(" MB free. The amount of space reserved can be changed after 
installation.", " MB Festplattenspeicher. (Der Speicher der reserviert 
wird kann nach der[*] Installation geändert werden).")

Can you shorten it down a couple of words? (e.g. ~30 chars)

3) Likewise, this translation is a bit too long:

Trans_Add("Freenet Installer has detected that you already have Freenet 
installed. Your current installation was installed using an older, 
unsupported installer. To continue, you must first uninstall your 
current version of Freenet using the previously created uninstaller:", 
"Der Freenetinstaller hat entdeckt, dass bereits eine Version von 
Freenet installiert ist. Ihre aktuelle Freenetinstallation wurde mit 
einem älteren, nicht mehr unterstützten, Installer gemacht. Um 
forzufahren müssen sie zunächst Ihre aktuelle Freenetinstallation mit 
dem früher[*] erstellten Uninstaller deinstallieren:")

I've committed the translation without the strings mentioned (the 
installer will fallback to English for those). Feel free to simply send 
the corrections to me and I'll submit those as well :).
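
A rough way to catch over-long strings before they reach the installer
is to compare character counts. This is only a sketch: the real limit is
the pixel space in the GUI, so character count is a proxy, and the 20%
slack and both function names are arbitrary assumptions.

```python
def too_long(original, translated, slack=1.2):
    """True if the translation exceeds the original length by more than
    the allowed slack factor."""
    return len(translated) > len(original) * slack


def check_translations(pairs, slack=1.2):
    """Return the original strings whose translations look too long.

    pairs is a list of (original, translated) tuples, i.e. the two
    arguments of each Trans_Add call.
    """
    return [orig for orig, trans in pairs if too_long(orig, trans, slack)]
```

Strings flagged this way still need a visual check in the compiled
installer, since proportional fonts make the count only approximate.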

- Zero3



[freenet-dev] Should the spider ignore common words?

2009-06-11 Thread Daniel Cheng
On 10/6/2009 18:04, Matthew Toseland wrote:
> On Wednesday 10 June 2009 06:54:03 Daniel Cheng wrote:
>> On Wed, Jun 10, 2009 at 12:02 PM, Evan Daniel  wrote:
>>> On my (incomplete) spider index, the index file for the word "the" (it
>>> indexes no other words) is 17MB.  This seems rather large.  It might
>>> make sense to have the spider not even bother creating an index on a
>>> handful of very common words (the, be, to, of, and, a, in, I, etc).
>>> Of course, this presents the occasional difficulty:
>>> http://bash.org/?514353  I think I'm in favor of not indexing common
>>> words even so.
>> Yes, it should ignore common words.
>> This is called a "stopword" in search engine terminology.
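
A minimal sketch of stopword filtering at indexing time, using the
handful of words Evan lists above. The tokenisation (lower-casing and
splitting on whitespace) and the function name are simplifying
assumptions, not what the spider actually does.

```python
# The common words named in the thread; a real list would be longer.
STOPWORDS = frozenset({"the", "be", "to", "of", "and", "a", "in", "i"})


def index_terms(text, stopwords=STOPWORDS):
    """Return the words of text that should be indexed."""
    return [w for w in text.lower().split() if w not in stopwords]


terms = index_terms("The Who live in Leeds")
```

Note the occasional difficulty mentioned above: a query for "the who" is
reduced to the single term "who".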
>>
>>> Also, on a related note, the index splitting policy should be a bit
>>> more sophisticated: in an attempt to fit within the max index size as
>>> configured, it split all the way down to index_8fc42.xml.  As a
>>> result, the file index_8fc4b.xml sits all by itself at 3KiB.  It
>>> contains the two words "vergessene" and "txjmnsm".  I suspect it would
>>> have reliability issues should anyone actually want to search either
>>> of those.  It would make more sense to have all of index_8fc4 in one
>>> file, since it would be only trivially larger.  (I have a patch that I
>>> thought did that, but it has a bug; I'll test once my indexwriter is
>>> finished writing, since I don't want to interrupt it by reloading the
>>> plugin.)
>> "trivially larger" ...
>> ugh... how trivial is trivial?
>>
>> XMLLibrarian can handle index_8fc42.xml on its own, with all other
>> 8fc4 entries in index_8fc4.xml.
>> However, as I have stated on IRC, that makes index generation even slower.
>
> Why do the indexes have to have non-overlapping names? Can't we have both 
> index_8f and index_8fc42 ? And then when we fetch a term, use the appropriate 
> index by going for the one with the longest prefix?
>

We can.
In fact, XMLLibrarian handles this correctly.

It is just the spider part that is tricky.
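
A minimal sketch of the longest-prefix lookup Matthew describes. It
assumes subindex names are hex prefixes of the MD5 of the term, which
matches names like index_8fc42.xml but is not confirmed in this thread;
the function name is hypothetical.

```python
import hashlib


def pick_index(term, available_prefixes):
    """Return the subindex file whose prefix is the longest one matching
    the MD5 of term, or None if no prefix matches."""
    digest = hashlib.md5(term.encode("utf-8")).hexdigest()
    best = None
    for prefix in available_prefixes:
        if digest.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return None if best is None else "index_" + best + ".xml"
```

With both a short and a long prefix available, a term whose hash matches
the long prefix goes to the more specific file, and every other term
under the short prefix goes to the shared one.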




[freenet-dev] Should the spider ignore common words?

2009-06-11 Thread Daniel Cheng
On 10/6/2009 20:42, Mike Bush wrote:
> 2009/6/10 Evan Daniel:
>> On Wed, Jun 10, 2009 at 6:49 AM, Mike Bush  wrote:
>>> XMLLibrarian doesn't currently support searching for phrases or rating
>>> relevance of results based on proximity so I don't think common words
>>> could be of any use in searches now.
>>>
>>> Also, I'm not sure but I think the current index doesn't include words
>>> under 4 letters at all.
>> If you read my previous mails, you'll see that the spider is in
>> fact indexing the word "the".
>>
>
> Yes sorry, I've since searched for 'who' on wanna and it is there, it
> gave me OutOfMemoryException trying to generate the results page
>

You've got it :)

This is yet another reason to split the site part out.
That way we can keep only the siteId in memory, not the whole URI, before
the union.

Even so, I suspect searching for words like "the who" will never work
without on-disk temp files.
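
A minimal sketch of the idea, assuming each term's postings are a set of
small integer site ids and that a separate table maps ids back to URIs.
The function name and both data structures are hypothetical.

```python
def search_any(terms, postings, uri_by_id):
    """Union the postings of all terms using ids only, then resolve URIs.

    postings maps term -> set of site ids; uri_by_id maps id -> URI.
    Only integers are held in memory while the sets are combined.
    """
    ids = set()
    for term in terms:
        ids |= postings.get(term, set())
    # URIs are looked up once, after the union, for the final id set.
    return [uri_by_id[i] for i in sorted(ids)]
```

For very common words the id sets themselves can still be too large to
hold in memory, which is the case the on-disk temp files would cover.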

>> Evan Daniel
>>



[freenet-dev] Win Installer it_IT l10n 090611

2009-06-11 Thread Luke771

Attached.
Please commit.
;
; Translation template
;
; Quick guide to translating:
; 1.  Save this file as Include_Lang_xx.inc (xx being a standard 2-letter lowercase language code, e.g. en for English or da for Danish)
; 2.  Replace xx with the same language code in LoadLanguage_xx() below.
; 3.  Translate! Format is: Trans_Add(original text, translated text). Please do not leave empty strings (comment them out instead so the installer will fall back to English for those strings).
; 4.  Submit the translation to SVN or the developer mailing list
; 4.a  Please make sure that either yourself or another developer tests the translation for obvious layout glitches and other bugs (simply launching the installer and verifying that the main GUI looks OK should be enough most of the time)
; 4.b  On the first submission, make sure that the translation is added to Include_TranslationHelper.ahk or nothing will happen ;)
;
; General note about translation:
; Because of the compact GUI design, much of the text are subject to min/max size limitations. A too short translation will leave holes in the GUI and
; a too long will mess up the layout. So try to keep the translations at the approx. same length as the original English text, with the approx. same
; placement of any newline markers. The easiest way to test the translation is to compile the installer and take a look around. The installer runs
; under WINE, but because of WINE bugs the layout will *not* be completely true to a real Windows installation.
;
 
LoadLanguage_it()
{
  ; Installer - Common
  Trans_Add("Freenet Installer", "Programma di Installazione di Freenet")
  Trans_Add("Welcome to the Freenet Installer!", "Benvenuti nel programma di installazione di Freenet")
  Trans_Add("Installation Problem", "Problema nell'installazione")
  Trans_Add("Freenet Installer fatal error", "Errore irreparabile nel programma di installazione di Freenet")
  Trans_Add("Freenet Installer error", "Errore nel programma di installazione di Freenet")
  Trans_Add("Error: ", "Errore")
  Trans_Add("Exit", "Esci")
 
  ; Installer - Error messageboxes
  Trans_Add("Freenet Installer was not able to unpack necessary installation files to:", "non è stato possibile estrarre i necessari file di installzione in:")
  Trans_Add("Please make sure that Freenet Installer has full access to the system's temporary files folder.", "Si prega di accertarsi che il Programma di Installazione di Freenet abbia pieno accesso alla cartella dei file temporanei del sistema.")
  Trans_Add("Freenet Installer requires administrator privileges to install Freenet.`nPlease make sure that your user account has administrative access to the system", "Il Programma di Installazione di Freenet necessita di privilegi amministrativi per poter installare Freenet`n Si prega di accertarsi che l'account in uso abbia accesso amministrativo al sistema,")
  Trans_Add("Freenet Installer was not able to write to the selected installation directory.`nPlease select one to which you have write access.", "Il programma di installazione di Freenet non è riuscito ad iscrivere i dati nella cartella selezionata.`nSi prega di selezionare una cartella alla quale nella quale sia possibile scrivere")
  Trans_Add("Freenet Installer was not able to find a free port on your system in the range ", "Il programma di installazione di Freenet non è riuscito a rilevare una porta libera sul sistema in range ")
  Trans_Add("Please free a system port in this range to install Freenet.", "Si prega di liberare una porta in questo segmento per poter installare Freenet")
  Trans_Add("Freenet Installer was not able to create a Winsock 2.0 socket`nfor port availability testing.", "Non è stato possibile creare un socket Winsock 2.0`n per il testing disponibilità porte")
 
  ; Installer - Unsupported Windows version
  Trans_Add("Freenet only supports the following versions of the Windows operating system:", "Freenet supporta soltanto le seguenti versioni del sistema operativo Windows")
  Trans_Add("Please install one of these versions if you want to use Freenet on Windows.", "Si prega di installare una di queste versioni se si vuole usare Freenet su Windows")
 
  ; Installer - Java missing
  Trans_Add("Freenet requires the Java Runtime Environment, but your system does not appear to have an up-to-date version installed. You can install Java by using the included online installer, which will download and install the necessary files from the Java website automatically:", "Freenet necessita di Java Runtime Environment per poter funzionare, ma il sistema non sembra disporre di una versione aggiornata. E' possibile installare Java usando il programma di installazione on-line integrato, il quale scaricherà i file necessari dal sito web di Java e li installerà automaticamente.")
  Trans_Add("Install Java", "Installa Java")
  Trans_Add("The installation will continue once Java version ", "L'installazione continuerà dopo che Java versione")
  Trans_Add( or later has been installed., o posteriore 
