Re: [freenet-dev] Disk I/O thread

2012-09-11 Thread Matthew Toseland
On Friday 07 Sep 2012 00:43:44 postwall-free...@yahoo.de wrote:
>> Freetalk and WoT may be better designed on that level. However they
>> produce more disk I/O. And the reason for this is, mainstream database
>> practice and theory are designed for two cases:
>> 1. Lots of data (absolute amount and transactions/sec) on fast, reliable
>> hardware with professional sysadmins.
>> 2. Tiny amounts of data on commodity hardware.
>>
>> If you have lots of data on commodity disks, it falls down. And note that
>> mainstream use of databases in the second case frequently fails. For
>> example, I have a huge set of bookmarks in a tree on one of my browsers.
>> Every time I create a new category, another one gets renamed.
 
> That's not a database error, that's an application one for sure. The reason
> databases fail on standard consumer hardware is that these systems often
> write-buffer at various levels, undermining the whole ACID architecture. This
> leads to half-applied transactions or simply database corruption on power
> failure or even just OS crashes. The results are random, though the standard
> one is a corrupt DB which just doesn't come back up; any reproducible
> behaviour isn't likely to be related to this.
>
> Apart from that, databases of any size will run reliably on any hardware as
> long as you ensure fsync does what it is supposed to do and don't
> intentionally or unintentionally disable any of the transaction-based safety
> features. Performance will suffer on consumer hardware, greatly so if the
> indices are too big to be cached, but identical stuff runs slower on slower
> hardware, no surprise here. It doesn't mean that there is no point in running
> large DBs on consumer hardware or that it is somehow inherently unreliable
> and you end up with randomly modified data sets.
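The fsync guarantee referred to here can be made concrete. A minimal sketch (Python, POSIX-only directory fsync; the function name and demo paths are hypothetical) of a write that a crash can only leave wholly old or wholly new:

```python
import os
import tempfile

def durable_write(path: str, data: bytes) -> None:
    """Crash-safe write: after this returns, the file holds the new bytes;
    if power fails mid-way, the old contents survive intact."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push past the OS write buffers to the disk
        os.replace(tmp, path)      # atomic rename: readers see old or new, never torn
        dirfd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dirfd)        # POSIX: persist the rename itself
        finally:
            os.close(dirfd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

# Hypothetical demo in a scratch directory:
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "state.bin")
durable_write(target, b"v1")
durable_write(target, b"v2")
```

Write-buffering at any layer that ignores the fsync defeats exactly this guarantee, which is the failure mode described above.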
 
 
 
>>> The approach of fred to not use it is just wrong from a computer-science
>>> perspective.
>>>
>>> The fact that it does not perform so well is an implementation issue of the
>>> database, not of the client code which uses the database.
>>
>> No it is not. The load you put on it requires many seeks.
 
> Are you sure of that? The primary database will require seeking on pretty
> much any access due to its nature, alright, but it doesn't need to be
> accessed all that often actually. How many requests must a busy node handle
> per second? Maybe 10, if at all, and those will be lucky to cause 10 I/Os.
> Your standard SATA disk can deal with more than that; the standard figures
> are usually 50-100 IOPS. Write requests to the DBs, caused by insert
> requests or fetched unknown data, which will require multiple seeks, will
> drive the load up a bit, but if the writes-per-second figures of my node are
> any indication then this won't cause severe disk load either; it just
> doesn't occur often enough.
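The arithmetic here is easy to make explicit. A sketch; the request and IOPS figures are the rough ones from this email, not measurements:

```python
def disk_utilisation(requests_per_sec: float, ios_per_request: float,
                     disk_iops: float) -> float:
    """Fraction of a disk's random-I/O budget eaten by request traffic."""
    return (requests_per_sec * ios_per_request) / disk_iops

# ~10 requests/s causing ~10 I/Os total, against a commodity disk's
# 50-100 random IOPS:
worst_case = disk_utilisation(10, 1, 50)     # slow commodity disk
best_case = disk_utilisation(10, 1, 100)     # faster commodity disk
```

Even the pessimistic figure leaves most of the seek budget free, which is the point being made about the primary store.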
 
 
> This is also my general experience with Freenet: the main DBs aren't the
> issue. Just keep the node running on its own and there won't be any issues,
> even with 500+ GB stores on old 5400 rpm SATA disks. Load the node with
> local requests, be it downloads or something else, and the whole thing gets
> severely disk-limited very, very fast. With stuff like WoT, Freetalk or the
> Spider it is even worse; these can easily get things to unbearable levels in
> my experience.
 
 
> So IMHO the offender is rather the db4o database, which is basically used by
> clients. The load here is different though; I don't really see how it must
> strictly be seek-heavy for long durations of time. Splitfile handling, for
> example, is more of a bulk-data thing. Ideally it should get the data out of
> the DB as needed during a decode, keeping temporary results in memory or in
> temp files if needed, and only store the final result back to the DB,
> minimising writes. This is still the equivalent of reading a terribly
> fragmented file from disk, but that's not an insurmountable task for a
> standard disk. But considering how I/O-limited splitfile decoding often is
> on standard disks, and the time it takes, I would really suspect that it
> loads the DB with unneeded stuff, trying to minimise memory footprint,
> temp-file usage or something like that. Especially since just the data
> retrieval, before the request actually completes, is already often I/O-heavy
> in my experience.
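The decode strategy being proposed (read each block once, keep temporaries in RAM or a temp file, write only the final result) might look like the following sketch; it is not fred's actual code, and all names are hypothetical:

```python
import tempfile

def decode_splitfile(fetch_block, block_ids, store_result,
                     mem_limit=16 * 1024 * 1024):
    """Read every block from the store exactly once, keep intermediate
    data in memory (spilling to a temp file past mem_limit), and write
    only the final decoded result back to persistent storage."""
    buf = bytearray()
    spill = None
    for bid in block_ids:
        chunk = fetch_block(bid)              # the only reads against the store
        if spill is None and len(buf) + len(chunk) > mem_limit:
            spill = tempfile.TemporaryFile()  # temporaries go here, not the DB
            spill.write(bytes(buf))
            buf.clear()
        if spill is not None:
            spill.write(chunk)
        else:
            buf.extend(chunk)
    if spill is not None:
        spill.seek(0)
        result = spill.read()
        spill.close()
    else:
        result = bytes(buf)
    store_result(result)                      # the single write back
    return len(result)

# Hypothetical in-memory stand-ins for the store:
blocks = {i: bytes([i]) * 10 for i in range(5)}
decoded = []
total = decode_splitfile(lambda i: blocks[i], list(range(5)), decoded.append)
```

The disk sees one sequential-ish read pass and one write, rather than a stream of small transactional updates.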
 
> The same goes for Freetalk & Co: they can't really cause many transactions
> per second, since Freenet just can't fetch data fast enough to cause enough
> changes, so either they run really complex queries on really large or
> complex data sets, or the DB is inefficiently constructed or inefficiently
> accessed. No idea, but it certainly seems weird.
>
> I suspect in general that the db4o database is loaded with way too many
> writes per second, which is what will kill DB performance fast. Even your
> average professionally run DB, which you talk about above, quite often runs
> on a RAID 5 or even RAID 6 array of 10k rpm disks, simply because
> that's a cost-effective way to store bulky

Re: [freenet-dev] Disk I/O thread

2012-09-06 Thread postwall-free...@yahoo.de
as it automatically
repairs by starting from scratch, of course), people will scream murder if
downloads vanish, uploads corrupt and so on, which is likely to happen with any
hand-crafted storage solution, especially on flaky hardware and in general at
first. If a DBM worth its name doesn't manage to hold on to its data, anything
hand-crafted won't either.

Please don't go down that road. db4o may be slow, but at least it works
somewhat reliably, although less so than one might want; but slow storage is
still much better than unreliable storage for anything but data one doesn't
really care about anyway. Switch to another, non-object-oriented DBM if that's
needed, but please don't go back to some hand-crafted stuff which will work in
release x, randomly corrupt in release y and fill your disk in release z. If
this release cycle also holds true for db4o, by the way, then that's a pretty
horrible result for a DBM.


In terms of speed it boils down to this anyway: if the DB needs to cause
seek-heavy I/O to fulfil a query, then any hand-crafted storage will have to
do the same, assuming the DB and query aren't built inefficiently. If there is
much to gain by using hand-crafted storage, then there should be room for
improvement in the use of the DB, too. For example by not storing temporary
results in the DB, like temporary results during splitfile decoding, but just
restarting the calculation from scratch if the node really crashes.
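The "recompute instead of persisting temporaries" idea reduces to a simple pattern. A sketch under the assumption that the computation is repeatable; names are hypothetical:

```python
def run_with_restart(compute, load_final, save_final):
    """Persist only the final result. If the node crashed mid-computation,
    nothing partial was ever written, so we just recompute from scratch."""
    cached = load_final()
    if cached is not None:
        return cached            # a previous run finished; nothing to redo
    result = compute()           # temporaries live purely in memory here
    save_final(result)           # exactly one durable write, at the end
    return result

# Hypothetical persistent slot:
slot = {}
first = run_with_restart(lambda: 42, lambda: slot.get("r"),
                         lambda v: slot.__setitem__("r", v))
second = run_with_restart(lambda: 99, lambda: slot.get("r"),
                          lambda v: slot.__setitem__("r", v))
```

The cost of a rare recompute is traded against the steady stream of transactional writes it avoids.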



Hm, this email got pretty long in the end. Sorry for that - end of rant.




From: Matthew Toseland t...@amphibian.dyndns.org
To: Discussion of development issues devl@freenetproject.org
Sent: 12:32 Tuesday, 4 September 2012
Subject: Re: [freenet-dev] Disk I/O thread
 

[freenet-dev] Disk I/O thread

2012-09-04 Thread Matthew Toseland
On Sunday 02 Sep 2012 17:51:49 xor wrote:
> On Thursday 30 August 2012 00:40:13 Matthew Toseland wrote:
> > Sadly Freetalk/WoT do use rollback so have to
> > commit EVERY TIME. 
> 
> I object to the "sadly". Transaction-based programming has proven to be a 
> valid approach to solving many issues of traditional "manually undo everything 
> upon error" programming.

Let's get one thing clear to begin with: I am not advocating that Freetalk/WoT 
aggregate transactions by simply not committing. The basic design flaw in 
Fred's use of the database layer was to assume that object databases work as 
advertised and you can simply take a big complex in-memory structure and 
persist it more or less transparently, and then add a load of (de)activation 
code and reduce memory usage.

Freetalk and WoT may be better designed on that level. However they produce 
more disk I/O. And the reason for this is, mainstream database practice and 
theory are designed for two cases:
1. Lots of data (absolute amount and transactions/sec) on fast, reliable 
hardware with professional sysadmins.
2. Tiny amounts of data on commodity hardware.

If you have lots of data on commodity disks, it falls down. And note that 
mainstream use of databases in the second case frequently fails. For example, I 
have a huge set of bookmarks in a tree on one of my browsers. Every time I 
create a new category, another one gets renamed.

> The approach of fred to not use it is just wrong from a computer-science 
> perspective.
> 
> The fact that it does not perform so well is an implementation issue of the 
> database, not of the client code which uses the database.

No it is not. The load you put on it requires many seeks.

Of course, it might require fewer seeks if it was using a well-designed SQL 
schema rather than trying to store objects. And I'm only talking about writes 
here; obviously if everything doesn't fit in memory you need to do seeks on 
read as well, and they can be very involved given db4o's lack of two-column 
indexes. However, because of the constant fsyncs, we still need loads of 
seeks *even if the whole thing fits in the OS disk cache*!
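For comparison, this is what a two-column (composite) index buys in an SQL engine. The sketch uses sqlite3 with a hypothetical blocks table, not Freenet's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (request_id INTEGER, block_no INTEGER, data BLOB)")
# Composite index: one B-tree descent answers (request_id, block_no) lookups.
con.execute("CREATE INDEX idx_req_block ON blocks (request_id, block_no)")
con.executemany("INSERT INTO blocks VALUES (?, ?, x'00')",
                [(r, b) for r in range(100) for b in range(100)])

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT data FROM blocks WHERE request_id = 7 AND block_no = 42").fetchall()
# The plan names the composite index rather than a full table scan.
uses_index = any("idx_req_block" in str(row) for row in plan)
```

Without such an index the engine must scan, or combine two single-column lookups, which is the "very involved" read pattern described above.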
> 
> Also, Freetalk/WoT themselves are CPU/IO hogs due to them using the wrong 
> algorithms, so we are not even at the point where we can tell whether their 
> ACID-usage is a problem. There is too much noise generated from the highly 
> inefficient algorithms which are being used by them, we cannot measure the 
> performance impact of commit/rollback.

That's possible.

However, the bottom line is if you have to commit every few seconds, you have 
to fsync every few seconds. IMHO avoiding that problem, for instance by turning 
off fsync and making periodic backups, would dramatically improve performance.
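The trade described here (skip the per-commit fsync, bound the loss with periodic backups) can be sketched with sqlite3; db4o is not SQLite, so this only illustrates the principle:

```python
import os
import sqlite3
import tempfile

db = sqlite3.connect(":memory:")
# On an on-disk database this pragma stops SQLite fsyncing on every
# commit; a crash can lose recent transactions, but the periodic
# snapshot below bounds the damage.
db.execute("PRAGMA synchronous = OFF")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def checkpoint(src: sqlite3.Connection, dest_path: str) -> None:
    """Periodic backup: one consistent snapshot instead of constant fsyncs."""
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)     # sqlite3's online-backup API
    finally:
        dest.close()

# Hypothetical usage:
with db:
    db.execute("INSERT INTO kv VALUES ('node_id', 'abc123')")
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
checkpoint(db, backup_path)
```

On failure you restore the last snapshot and lose only the interval since it, instead of paying for a seek-heavy fsync on every commit.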

As regards Freenet I am leaning towards a hand-coded on-disk structure. We 
don't use queries anyway, and mostly we don't need them; most of the data could 
be handled as a series of flat-files, and it would be far more robust than 
db4o, and likely faster too. But the first thing to do is upgrade the database 
and see whether it 1) breaks even worse or 2) fixes the problems. Past testing 
suggests #1, but past testing was based on later versions of 7.4, not on the 
latest 7.12.
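A hand-coded flat-file structure of the kind suggested could be as simple as one file per entry with atomic replace. A sketch (hypothetical layout, not a proposal for the real on-disk format):

```python
import hashlib
import os
import tempfile

class FlatFileStore:
    """One file per entry, named by a hash of the key, updated by atomic
    rename. No queries, just get/put, which matches how the node uses
    most of its persistent data."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.root,
                            hashlib.sha256(key.encode()).hexdigest())

    def put(self, key: str, value: bytes) -> None:
        tmp = self._path(key) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(value)
        os.replace(tmp, self._path(key))  # readers see old or new, never torn

    def get(self, key: str):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

store = FlatFileStore(tempfile.mkdtemp())
store.put("CHK@example", b"block-data")
store.put("CHK@example", b"block-data-v2")
```

Corruption of one entry costs that entry, not the whole database, which is the robustness argument for flat files over db4o.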
-- next part --
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: 



___
Devl mailing list
Devl@freenetproject.org
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl

[freenet-dev] Disk I/O thread

2012-09-02 Thread xor
On Thursday 30 August 2012 00:40:13 Matthew Toseland wrote:
> Sadly Freetalk/WoT do use rollback so have to
> commit EVERY TIME. 

I object to the "sadly". Transaction-based programming has proven to be a 
valid approach to solving many issues of traditional "manually undo everything 
upon error" programming.
The approach of fred to not use it is just wrong from a computer-science 
perspective.

The fact that it does not perform so well is an implementation issue of the 
database, not of the client code which uses the database.

Also, Freetalk/WoT themselves are CPU/IO hogs due to them using the wrong 
algorithms, so we are not even at the point where we can tell whether their 
ACID-usage is a problem. There is too much noise generated from the highly 
inefficient algorithms which are being used by them, we cannot measure the 
performance impact of commit/rollback.





[freenet-dev] Disk I/O thread

2012-08-30 Thread Matthew Toseland
Response to a long thread on FMS about how to reduce Freenet's disk I/O, what 
realistic system requirements are, when we can expect to see SSDs taking over, 
and whether Freenet will kill commodity disks as a matter of routine.



I'm just going to reply to everyone here.

First, hard disk writes are not the only limited resource: As has been pointed 
out, RAM is limited too. So we can't assume that node.db4o will fit in RAM 
(much less persistent-blob.tmp, which is intimately related to node.db4o). But 
we should make good use of this happy situation when it occurs.

Second, if there is plenty of RAM, the OS will cache the file. So reads aren't 
a problem - they just create more CPU usage. Writes are the big problem: How do 
we reduce the number of database writes? As far as I know, if the node is 
"idle", in the sense that the requests are failing, we do no database writes at 
all. However, there may be some maintenance.

One big question is, is a short burst of writes every so often preferable to 
writes every second? Possible benefits:
- It's closer to what the hard disks expect so hopefully will have less impact 
on hard disk lifespans.
- The seeks can be quite small, so it should be fast-ish.

Possible drawbacks are that since it is more intense it might have a bigger 
negative impact on the rest of the system (for a short time). Which might be 
bad for e.g. online gaming, although we will want a gamer mode or something 
eventually (ideally with platform specific autodetection helpers).

Considering the datastore alone, it is perfectly feasible, and safe, to 
aggregate writes in memory, provided there is sufficient memory. (Based on the 
overall heap limit, which in turn is based on the detected amount of memory). 
One complication is if the data doesn't hit the main datastore it should still 
be in the ULPR/slashdot cache, so we'd need to allow that to access the 
in-memory blocks where appropriate.
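Aggregating store writes while keeping them visible to the ULPR/slashdot-cache path amounts to a write-back buffer whose reads check pending data first. A sketch (dict-backed, for illustration only):

```python
class WriteBackStore:
    """Buffer datastore writes in RAM and flush them in one burst.
    Reads consult the pending buffer first, so blocks that have not hit
    disk yet are still served (the ULPR/slashdot-cache concern above)."""

    def __init__(self, backing: dict, max_pending: int = 1024):
        self.backing = backing          # stands in for the on-disk store
        self.pending = {}
        self.max_pending = max_pending

    def put(self, key, block):
        self.pending[key] = block
        if len(self.pending) >= self.max_pending:
            self.flush()

    def get(self, key):
        if key in self.pending:         # serve unflushed writes
            return self.pending[key]
        return self.backing.get(key)

    def flush(self):
        self.backing.update(self.pending)  # one aggregated burst of writes
        self.pending.clear()

disk = {}
store = WriteBackStore(disk, max_pending=2)
store.put("a", b"1")
before_flush = dict(disk)       # still empty: the write is only buffered
hit = store.get("a")            # but reads already see it
store.put("b", b"2")            # second put triggers the burst flush
```

The buffer size would be tied to the overall heap limit, per the memory-detection point above.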

Now, regarding the database (node.db4o):
- It is hard to make uploads not use lots of database queries without 
substantial changes. I may look into it but expect it to be difficult.
- Accepting limited potential data loss is not, at present, an option. The 
database is more likely to completely die than just lose some changes. This is 
why we fsync on commit, and commit frequently. Since we abuse the nominally 
ACID nature of the database (we never rollback), we can (and do) commit only 
when something important happens or periodically, but there is still a lot of 
traffic. Sadly Freetalk/WoT do use rollback so has to commit EVERY TIME.
- Periodic backups (synchronized with the persistent-blob file) could avoid the 
need for fsync. This would greatly reduce the actual disk writes by allowing 
the operating system to optimise them properly.
- In theory we could do more aggressive caching once we have this 
infrastructure, up to and including keeping the whole thing in RAM and writing 
it periodically. We would need to smoothly handle it growing so it doesn't fit.
- The actual blocks are just big linear writes, so it's much more efficient to 
buffer database writes than to buffer unwritten blocks. If we have a lot of RAM 
it may make sense to do both. Which would further complicate the above.
- Database jobs can be very slow especially if RAM is limited (meaning we have 
to do lots of reads because the OS isn't caching the whole file). Things like 
unpacking the next layer of a splitfile can be hideously slow. We can't 
necessarily aggregate commits, at least not at the job level. On the other 
hand, we DO aggregate commits at the job level to some degree, in the sense 
that while a big job such as above is running, the new blocks coming in are 
queued; eventually we stop fetching new blocks. IIRC mostly they are written to 
disk to save memory. :|

A lot of the above depends on an awful lot of RAM being available. Possibly we 
should tweak the autodetection. Certainly we will affect system performance by 
using too much RAM, just as we do with too many disk writes.

Unfortunately there are other places we write frequently such as the peers 
files too. These need debugging.

So, what of the above is not already on the bug tracker?

1. Do we want to aggregate writes to the datastore and write them periodically? 
(Implementation issues mentioned above)

2. Caching of blocks for persistent-blob.tmp, as well as of the database 
itself, if we have lots of RAM, after implementing auto-backups.

3. Can we give Freenet any more RAM? The current allocation (the wrapper memory 
limit, which does not include things like thread stacks) is:
<512MB -> 128MB
<1GB -> 192MB
<2GB -> 256MB
else 512MB
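As a sanity check, the allocation table above as code (the thresholds are exactly the ones listed; the function name is made up):

```python
def wrapper_memory_limit_mb(system_ram_mb: int) -> int:
    """Detected system RAM -> wrapper heap limit, per the table above
    (thread stacks and other non-heap memory are not included)."""
    if system_ram_mb < 512:
        return 128
    if system_ram_mb < 1024:
        return 192
    if system_ram_mb < 2048:
        return 256
    return 512
```

Any tweak to the autodetection would just move these thresholds or the returned limits.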