Re: [freenet-dev] Disk I/O thread

2012-09-06 Thread postwall-free...@yahoo.de
Freetalk and WoT may be better designed on that level. However they 
produce more disk I/O. And the reason for this is, mainstream database 
practice and theory are designed for two cases:
1. Lots of data (absolute amount and transactions/sec) on fast, reliable 
hardware with professional sysadmins.
2. Tiny amounts of data on commodity hardware.

If you have lots of data on commodity disks, it falls down. And note that 
mainstream use of databases in the second case frequently fails. For 
example, I have a huge set of bookmarks in a tree on one of my browsers.
Every time I create a new category, another one gets renamed.

That's not a database error, that's an application error for sure. The reason 
databases fail on standard consumer hardware is that these systems often 
buffer writes at various levels, undermining the whole ACID architecture. This 
leads to half-applied transactions or simply database corruption on power 
failure or even just OS crashes. The results are random, though the standard 
one is a corrupt DB which just doesn't come back up; any replicable behaviour 
isn't likely to be related to this. 

Apart from that, databases of any size will run reliably on any hardware as long 
as you ensure fsync does what it is supposed to do and don't intentionally or 
unintentionally disable any of the transaction-based safety features. 
Performance will suffer on consumer hardware, greatly so if the indices are too 
big to be cached, but identical workloads running slower on slower hardware is no 
surprise. That doesn't mean there is no point in running large DBs on 
consumer hardware, or that it is somehow inherently unreliable and you end up 
with randomly modified data sets.
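
For illustration, here is a minimal Java sketch (not db4o or Fred code, just the 
general pattern) of what "making sure fsync does its job" means at the application 
level: force the write through the OS cache before treating the transaction as durable.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DurableWrite {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("commit.log"),
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer record = ByteBuffer.wrap(
                    "BEGIN;UPDATE ...;COMMIT\n".getBytes(StandardCharsets.UTF_8));
            while (record.hasRemaining()) {
                ch.write(record);
            }
            // force(true) asks the OS to flush both data and metadata to the
            // device; only after this returns may the transaction be reported
            // as durable. If the drive's own write cache lies about completion,
            // even this is not enough - which is the consumer-hardware failure
            // mode described above.
            ch.force(true);
        }
    }
}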



 The approach of fred to not use it is just wrong from a computer-science 
 perspective.
 
 The fact that it does not perform so well is an implementation issue of the 
 database, not of the client code which uses the database.

No it is not. The load you put on it requires many seeks.

Are you sure of that? The primary database will require seeking on pretty much 
any access due to its nature, all right, but it doesn't actually need to be accessed 
all that often. How many requests must a busy node handle per second? 
Maybe 10, if that, and those will be lucky to cause 10 I/Os. Your standard SATA 
disk can deal with more than that; the standard figures are usually 50-100 
IOPS. Write requests to the DBs, caused by insert requests or fetched unknown 
data, will require multiple seeks and drive the load up a bit, but if 
the writes-per-second figures of my node's DB are any indication, this 
won't cause severe disk load either; it just doesn't occur often enough.
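
As a rough sanity check of those figures, a tiny back-of-the-envelope calculation; 
all numbers are the illustrative values from the paragraph above, not measurements:

// Back-of-the-envelope disk load check using the illustrative figures above.
public class IopsEstimate {
    public static void main(String[] args) {
        int requestsPerSecond = 10;  // busy node handling remote requests (assumed)
        int iosPerRequest     = 1;   // "lucky to cause 10 I/Os" across all 10 requests
        int diskIopsBudget    = 50;  // low end of the 50-100 IOPS range quoted above

        int demandedIops = requestsPerSecond * iosPerRequest;  // ~10 IOPS
        System.out.printf("demand: ~%d IOPS vs. budget: %d-100 IOPS%n",
                demandedIops, diskIopsBudget);
        // Even doubling or tripling the demand to account for store writes keeps
        // it well inside what a single consumer SATA disk can sustain, matching
        // the observation that a node left alone is not seek-bound.
    }
}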


This is also my general experience with Freenet: the main DBs aren't the issue. 
Just keep the node running on its own and there won't be any problems, even with 500+ 
GB stores on old 5400 rpm SATA disks. Load it with local requests, be it 
downloads or something else, and the whole thing gets severely disk-limited 
very fast. With stuff like WoT, Freetalk or the Spider it is even worse; 
these can easily push things to unbearable levels in my experience.


So IMHO the offender is rather the db4o database, which is basically used by the 
client layer. The load here is different, though; I don't really see why it must 
strictly be seek-heavy for long stretches of time. Splitfile handling, for 
example, is more of a bulk-data thing. Ideally it should pull the data out of the 
DB as needed during a decode, keep temporary results in memory or in 
tempfiles if needed, and only store the final result back to the DB, minimising 
writes. This is still the equivalent of reading a terribly fragmented file from 
disk, but that's not an insurmountable task for a standard disk. Considering 
how I/O-limited splitfile decoding often is on standard disks, and the time it 
takes, I would really suspect that it loads the DB with unneeded stuff, trying 
to minimise memory footprint, temp-file usage or something like that. 
Especially since just the data retrieval, before the request actually 
completes, is already often I/O-heavy in my experience.
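
For concreteness, a minimal sketch of that "decode via tempfile, write back once" 
idea; the DataStore interface and the decode placeholder are hypothetical, not 
Fred's actual splitfile or db4o code:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

// Hypothetical stand-in for the db4o-backed storage layer.
interface DataStore {
    byte[] readBlock(long blockId) throws IOException;       // one seek-ish read
    void writeFinalResult(File decoded) throws IOException;  // single bulk write-back
}

public class SplitfileDecodeSketch {
    /**
     * Pull blocks out of the store as needed, keep intermediate results in a
     * tempfile, and only touch the database again for the final result.
     */
    static void decode(DataStore store, List<Long> blockIds) throws IOException {
        File temp = File.createTempFile("splitfile-", ".decoding");
        temp.deleteOnExit();
        try (RandomAccessFile scratch = new RandomAccessFile(temp, "rw")) {
            for (long id : blockIds) {
                byte[] block = store.readBlock(id);  // reads may seek; that's fine
                scratch.write(block);                // intermediate data stays out of the DB
            }
            // (FEC decoding of the scratch data would happen here.)
        }
        store.writeFinalResult(temp);                // one write-back, minimal DB churn
    }
}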

The same goes for Freetalk & Co: they can't really cause many transactions per 
second, since Freenet just can't fetch data fast enough to cause enough 
changes. So either they run really complex queries on really large or complex 
data sets, or the DB is inefficiently constructed or inefficiently accessed. No 
idea, but it certainly seems weird.


I suspect in general that the db4o database is loaded with way too many writes 
per second, which is what will kill DB performance fast. Even your average 
professionally run DB, which you talk about above, quite often runs on a RAID 5 
or even RAID 6 array of 10k rpm disks, simply because that's a cost-effective 
way to store bulky stuff, and cost is saved everywhere if at all possible. 
Those arrays will happily deal with very high read IOPS compared to your 
standard consumer disk but 

[freenet-dev] Logging in Fred

2012-03-28 Thread postwall-free...@yahoo.de
If I remember right, the main issue with debug logging wasn't GC but that 
generating some of the log string arguments in Freenet was simply very resource-
intensive in itself. So unless Java is somehow intelligent enough to detect that 
argument x isn't needed, because it won't be used in the end anyway, and 
therefore skips evaluating it before passing the value, removing the current 
logic will cause higher resource usage once again.

In 'log.info("Random message: %s", obj.toString())', evaluating 'obj.toString()' 
was what caused the issues, not recycling it afterwards, or so I seem to remember.
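
To make that concrete, a minimal sketch of eager argument evaluation versus 
guarding or deferring it; the logger here is a hypothetical placeholder, not 
Fred's actual Logger/shouldLog API:

import java.util.function.Supplier;

public class LazyLoggingSketch {
    // Hypothetical debug flag, standing in for Fred's shouldLog machinery.
    static final boolean DEBUG_ENABLED = false;

    static void debug(String msg) {
        if (DEBUG_ENABLED) System.out.println(msg);
    }

    static void debugLazy(Supplier<String> msg) {
        if (DEBUG_ENABLED) System.out.println(msg.get());
    }

    public static void main(String[] args) {
        Object obj = new Object();

        // Eager: obj.toString() and the concatenation run even though the
        // message is thrown away - this is the cost described above.
        debug("Random message: " + obj.toString());

        // Guarded: the expensive argument is only built if logging is on.
        if (DEBUG_ENABLED) {
            debug("Random message: " + obj.toString());
        }

        // Lazy (Java 8+ alternative): the Supplier defers the work the same way.
        debugLazy(() -> "Random message: " + obj.toString());
    }
}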


The main things causing GC issues that I remember were not reusing threads and 
spawning a few new objects per received and sent packet, not the log stuff; 
that probably just tipped things over the edge when GC was already running 
heavily because of the former.
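
For illustration only (not Fred's actual networking code), the kind of change 
implied here is reusing a shared pool of worker threads rather than spawning a 
new thread per packet:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PacketHandlingSketch {
    // A fixed pool of reused worker threads keeps per-packet garbage
    // (Thread objects, stacks) off the GC's plate. Purely illustrative.
    private final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    void onPacketReceived(final byte[] packet) {
        workers.execute(new Runnable() {  // small, short-lived task object
            @Override public void run() {
                handle(packet);
            }
        });
    }

    private void handle(byte[] packet) {
        // ... decode and dispatch ...
    }
}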


Speaking out of memory as a non-Java dev here, so ignore if applicable. :P




 From: Ximin Luo infini...@gmx.com
To: devl at freenetproject.org 
Sent: 4:33 Saturday, 24 March 2012
Subject: Re: [freenet-dev] Logging in Fred

I see.

Actually we didn't even need the side effects thing - I did notice that your
example stopped doing GC when setting "shouldLog = false" always, so I tried to
introduce your "if (shouldLog) { shouldLog = false; }" into my example, but was
still unable to reproduce GC. Then changing

- System.out.println("don't optimise me out");
+ System.out.println(format + args);

worked to re-introduce GC.

Well, that sucks. I guess the escape analyzer isn't sophisticated enough to see
that ar2.toString() does not cause ar2 to escape - and removing that from your
log() method does result in no GC.

I still think it's negligible though :p - for my test, I'm seeing only ~8ms
lost to GC after 20 million calls, and this appears to not change by much even
if I increase the number of arguments (and objects created) to log_info.

On 24/03/12 03:02, Zlatin Balevsky wrote:
> Your test has nothing to do with stack allocation because you never
> use the objects that you pass to the log_info method so JIT removes
> them.  Apply the following patch and see yourself create garbage in
> both testWith and testWithout.
> 
> --- Test2.java    2012-03-23 23:01:05.540475497 -0400
> +++ Test.java    2012-03-23 23:01:47.304477562 -0400
> @@ -6,8 +6,11 @@
>    final Object existing = new Object();
>    Object tmp = null;
> 
> +  long sideEffect;
>    public void log_info(String format, Object ... args) {
>      if (log_level) {
> +      for (Object o : args)
> +    sideEffect += o.hashCode();
>        System.out.println("don't optimise me out");
>      }
>    }
> @@ -88,11 +91,8 @@
>      printSummary("with alloc", res[0], "w/o alloc", res[1]);
>      printSummary("with alloc", res[2], "w/o alloc", res[3]);
>      printSummary("with alloc", res[4], "w/o alloc", res[5]);
> -    System.out.println(" with stack allocation and \"real use case\"");
> -    res = testWithOtherCode(z);
> -    printSummary("log only", res[0], "log+work", res[1]);
> -    printSummary("log only", res[2], "log+work", res[3]);
> -    printSummary("log only", res[4], "log+work", res[5]);
> +
> +    System.out.println(sideEffect);
>    }
> 
>    public static void main(String[] args) {
> 
> 
> On Fri, Mar 23, 2012 at 10:48 PM, Ximin Luo infini...@gmx.com wrote:
>> On 24/03/12 02:44, Zlatin Balevsky wrote:
>>> Wrong.  Extracting the log statement into its own method and calling
>>> that method inside the while loop still creates garbage.  Stack frames
>>> are different.
>>>
>>
>> How do you explain the results obtained from running my test code, then? Turn
>> on -verbose:gc. testWithStackAllocation results in no GC,
>> testWithoutStackAllocation results in lots of GC.
>>
>>> Read again exactly how Escape Analysis works.  The objects it prevents
>>> from creating do not appear in heap histograms.  Take a "jmap -histo
>>> <pid>" and you will see a huge number of java.lang.Integer objects in
>>> the heap, not on the stack.
>>>
>>
>> If you're still talking about your example, this is exactly consistent with 
>> my
>> explanation, i.e. that escape analysis is NOT occurring, properly anyway.
>>
>>> The difference in run times of your example could be due to many
>>> things.  How many iterations did you run?  It may be still in
>>> interpreted mode.  Enable -XX:+PrintCompilation and discard any
>>> results until JIT stops printing.
>>
>> Done that, JIT stops printing after the 3rd iteration and results are the 
>> same
>> thereafter.
>>
>> --
>> GPG: 4096R/5FBBDBCE
>> https://github.com/infinity0
>> https://bitbucket.org/infinity0
>> https://launchpad.net/~infinity0
>>
>>


-- 
GPG: 4096R/5FBBDBCE
https://github.com/infinity0
https://bitbucket.org/infinity0
https://launchpad.net/~infinity0
