Re: performance with 50GB files

2006-01-11 Thread René Rebe
Hi. On Wednesday 11 January 2006 00:31, Wayne Davison wrote: So, perhaps the size of the sending file should be factored into the calculation in order to set a minimum acceptable block size. This would be easy, because the generator already knows the size of both files at the time that it ...
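That heuristic might look something like the following sketch. The function name, the MAX_BLOCKS cap, and the exact square-root rule are illustrative assumptions, not the code rsync actually ships; the point is only that the sender's size can impose a floor on the block size chosen from the receiver's file:

#include <math.h>
#include <stdint.h>

#define MIN_BLOCK_SIZE 700       /* rsync's historical default block size */
#define MAX_BLOCKS     (1 << 17) /* assumed cap on the number of blocks */

/* Pick a block size from the receiver's file length (roughly its
 * square root), but never let the sender's file be carved into more
 * than MAX_BLOCKS blocks. */
static int64_t choose_block_size(int64_t recv_len, int64_t send_len)
{
    int64_t blength = (int64_t)sqrt((double)recv_len);
    if (blength < MIN_BLOCK_SIZE)
        blength = MIN_BLOCK_SIZE;

    int64_t floor_len = send_len / MAX_BLOCKS;
    if (blength < floor_len)
        blength = floor_len;

    return blength;
}

With a 50GB sender and a tiny receiver-side file, the floor term dominates and yields blocks of a few hundred KB instead of the 700-byte minimum.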

Re: performance with 50GB files

2006-01-10 Thread René Rebe
Hi, in reply to my previous post: I can reproduce the issue locally here. I produced a 50146750688-byte /home/test.dat by cat'ing a lot of data files together (needed some input data ...). The initial rsync takes over an hour, saturating the 100 Mbit Ethernet. I then used shred on the first GB ...
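For anyone wanting to repeat the modification step without shred, a minimal equivalent in C (assuming the same /home/test.dat path) is to overwrite the first GB with pseudorandom bytes and then rerun rsync:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("/home/test.dat", "r+b");
    if (!f) { perror("fopen"); return 1; }

    static char buf[1 << 20];            /* 1 MB write buffer */
    for (int mb = 0; mb < 1024; mb++) {  /* first 1 GB of the file */
        for (size_t i = 0; i < sizeof buf; i++)
            buf[i] = (char)rand();       /* deterministic, but good enough */
        if (fwrite(buf, 1, sizeof buf, f) != sizeof buf) {
            perror("fwrite");
            return 1;
        }
    }
    fclose(f);
    return 0;
}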

Re: performance with 50GB files

2006-01-10 Thread Wayne Davison
On Tue, Jan 10, 2006 at 07:46:19PM +0100, René Rebe wrote: of course the dual-CPU ppc64 receiver is idling, waiting for any data to arrive. There is a known problem with really large numbers of blocks: the hash search algorithm gets too many collisions, and the search routine bogs down. This ...
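A back-of-the-envelope model (not rsync's actual code) shows why the search bogs down. rsync of this era hashed the per-block checksums into a 65536-entry tag table keyed on a 16-bit reduction of the rolling checksum, so the expected chain length grows linearly with the block count, and the sender walks a chain for nearly every byte it scans:

#include <stdio.h>

int main(void)
{
    const double file_len  = 50e9;  /* ~50 GB sending file */
    const double block_len = 700;   /* historical default block size */
    const double buckets   = 65536; /* 16-bit tag table */

    double blocks = file_len / block_len; /* ~71 million blocks */
    double chain  = blocks / buckets;     /* ~1090 entries per bucket */

    printf("blocks: %.0f, average chain length: %.0f\n", blocks, chain);
    return 0;
}

At roughly a thousand checksum comparisons per scanned byte, the sender's CPU saturation and the idle receiver are exactly what one would expect.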

Re: performance with 50GB files

2006-01-10 Thread Wayne Davison
On Tue, Jan 10, 2006 at 09:02:14PM +0100, René Rebe wrote: So far, just increasing the block size significantly (10-20 MB) bumps the speed by magnitudes into useful regions. That's good. For some reason I was thinking that the block size was nearly maxed out for a 50GB file, but I can see that ...
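A quick worked comparison with assumed round numbers shows why a 10-20 MB --block-size helps by magnitudes: it cuts the number of hash-table entries, and with them the chain searches, by four orders of magnitude:

#include <stdio.h>

int main(void)
{
    const long long file_len = 50LL * 1000 * 1000 * 1000; /* ~50 GB */
    const long long sizes[]  = { 700, 10 << 20, 20 << 20 };

    for (int i = 0; i < 3; i++)
        printf("block size %10lld -> %8lld blocks\n",
               sizes[i], file_len / sizes[i]);
    return 0;
}

This prints roughly 71 million blocks for the 700-byte default versus about 4800 and 2400 for 10 MB and 20 MB blocks, at the cost of retransmitting a whole block for any change inside it.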

Re: performance with 50GB files

2006-01-10 Thread René Rebe
Hi. On Tuesday 10 January 2006 21:47, Wayne Davison wrote: On Tue, Jan 10, 2006 at 09:02:14PM +0100, René Rebe wrote: So far, just increasing the block size significantly (10-20 MB) bumps the speed by magnitudes into useful regions. That's good. For some reason I was thinking that the ...

Re: performance with 50GB files

2006-01-10 Thread Wayne Davison
On Tue, Jan 10, 2006 at 11:31:08PM +0100, René Rebe wrote: Also, I found that the current code decides what block size to use from the receiving-side file. The idea here is that the only checksum data that get transmitted and stored in the hash table are those for the blocks in the file on the ...
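In other words, the generator on the receiving side checksums its copy of the file and streams those sums to the sender; the sender is the only party that builds and searches a hash table, and it builds it purely from the receiver's block list, which is why the block size is derived from the receiving-side file. A minimal sketch of that data flow, with illustrative structure names rather than rsync's actual ones:

#include <stdint.h>
#include <stddef.h>

/* One entry per block of the *receiver's* file. */
struct block_sum {
    uint32_t weak;            /* rolling checksum, cheap to slide */
    unsigned char strong[16]; /* MD4-style strong checksum */
};

/* What the generator transmits to the sender. */
struct sum_head {
    size_t count;             /* number of blocks in the receiver's file */
    size_t block_len;         /* chosen from the receiver's file size */
    struct block_sum *sums;   /* all the sender ever hashes and searches */
};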

performance with 50GB files

2006-01-09 Thread René Rebe
Hi all, today we had a performance issue transferring a large amount of data where one file was over 50GB. Rsync was tunneled over SSH, and we expected the data to be synced within hours. However, after over 10 hours the data is still not synced ... The sending box has rsync running at 60-80% CPU ...