Re: git annex get performance issues with rsync
Adam Spiers wrote:
> OK. You mean this?
>
> http://git-annex.branchable.com/todo/parallel_possibilities/

More like this: http://git-annex.branchable.com/todo/wishlist:_Prevent_repeated_password_prompts_for_one_command

>> You can enable ssh's connection sharing though. (ControlMaster)
>
> The figures above were already with ControlMaster enabled.
> It helps, but the rsync invocation per file still hurts a lot.

Are you actually measuring a significant time used in starting rsync? I think it more likely that time is spent recording location logs to the git-annex branch.

You also mentioned you were using --copies, which requires looking up the location log for each file, even ones that would not otherwise be processed.

-- 
see shy jo

___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home
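One way to answer Joey's question is to time bare process startup on its own. The sketch below is a minimal, hypothetical helper (the name `mean_startup_seconds` is an assumption, not part of git-annex) that spawns a command repeatedly and reports the average wall-clock time per run. With `["rsync", "--version"]` it measures only local rsync startup; it does not include ssh connection setup or the remote rsync process.

```python
import subprocess
import time
from typing import Sequence


def mean_startup_seconds(cmd: Sequence[str], runs: int = 20) -> float:
    """Average wall-clock seconds to spawn `cmd` and wait for it to exit.

    Hypothetical helper: output is discarded, and a non-zero exit raises
    CalledProcessError so a broken command is not silently timed.
    """
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
    return (time.perf_counter() - start) / runs
```

For example, `mean_startup_seconds(["rsync", "--version"])` gives the local startup cost, and something like `mean_startup_seconds(["ssh", "host", "true"])` (with `host` a placeholder) shows what each ssh round trip costs with and without ControlMaster. If both numbers are small compared with the roughly 1.8 s per file seen in the 11-photos-in-20-seconds test, the time is more likely going elsewhere, for example into location log updates.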
Re: git annex get performance issues with rsync
On Wed, Jan 18, 2012 at 4:09 PM, Joey Hess wrote:
> Adam Spiers wrote:
>> One of my USB drives just died, so I'm doing a 'git annex get --not
>> --copies 1' to re-attain data redundancy. It seems that a new rsync
>> instance is invoked for each file? In my case, I have thousands of
>> photos which are big enough to be worth annexing but still not
>> individually huge, so it seems that the overhead of each rsync
>> invocation is significantly impacting throughput. A quick empirical
>> test showed in 20 seconds, that 'git annex get' managed to transfer 11
>> photos, whereas a single (manual) rsync run transferred 33. Is this
>> easily fixable?
>
> No, it's on the todo list but very far down it.

OK. You mean this?

http://git-annex.branchable.com/todo/parallel_possibilities/

> You can enable ssh's connection sharing though. (ControlMaster)

The figures above were already with ControlMaster enabled.
It helps, but the rsync invocation per file still hurts a lot.
Re: git annex get performance issues with rsync
Adam Spiers wrote:
> One of my USB drives just died, so I'm doing a 'git annex get --not
> --copies 1' to re-attain data redundancy. It seems that a new rsync
> instance is invoked for each file? In my case, I have thousands of
> photos which are big enough to be worth annexing but still not
> individually huge, so it seems that the overhead of each rsync
> invocation is significantly impacting throughput. A quick empirical
> test showed in 20 seconds, that 'git annex get' managed to transfer 11
> photos, whereas a single (manual) rsync run transferred 33. Is this
> easily fixable?

No, it's on the todo list but very far down it.

You can enable ssh's connection sharing though. (ControlMaster)

-- 
see shy jo
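ssh connection sharing lets many short-lived rsync runs reuse one authenticated master connection instead of opening a new one per file. The usual approach is to set `ControlMaster`, `ControlPath` and (on OpenSSH 5.6 and later) `ControlPersist` for the host in `~/.ssh/config`, since ssh reads that file on every invocation. The sketch below shows the same options passed explicitly through rsync's `-e` flag. It is only an illustration of the ssh options, not how git-annex builds its own rsync command; the function name, socket path and persist time are assumptions.

```python
from typing import List


def shared_ssh_rsync_cmd(src: str, dest: str,
                         control_path: str = "~/.ssh/cm-%r@%h:%p",
                         persist: str = "10m") -> List[str]:
    """Build an rsync argv whose ssh transport reuses a master connection.

    Hypothetical helper. control_path must not contain spaces, because
    rsync splits the -e string on spaces. ssh expands the leading ~ and the
    %r (remote user), %h (host) and %p (port) tokens itself.
    """
    ssh_cmd = " ".join([
        "ssh",
        "-o", "ControlMaster=auto",            # start a master if none exists, else reuse it
        "-o", f"ControlPath={control_path}",   # socket shared by later ssh processes
        "-o", f"ControlPersist={persist}",     # keep the master alive after the first run exits
    ])
    return ["rsync", "-a", "-e", ssh_cmd, src, dest]
```

Passing the result to `subprocess.run(cmd, check=True)` runs the transfer. Note Adam's reply that his figures were already measured with ControlMaster enabled: sharing removes repeated ssh handshakes, but each file still pays for its own rsync process and whatever per-file bookkeeping git-annex does.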
git annex get performance issues with rsync
One of my USB drives just died, so I'm doing a 'git annex get --not --copies 1' to re-attain data redundancy. It seems that a new rsync instance is invoked for each file? In my case, I have thousands of photos which are big enough to be worth annexing but still not individually huge, so it seems that the overhead of each rsync invocation is significantly impacting throughput. A quick empirical test showed in 20 seconds, that 'git annex get' managed to transfer 11 photos, whereas a single (manual) rsync run transferred 33. Is this easily fixable?
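The 20-second test can be turned into a per-file figure. The sketch below assumes the photos are of similar size, so both runs move comparable data per file. It computes the extra time per file that `git annex get` spent compared with the manual rsync run. The function name is hypothetical. The result covers everything git-annex does per file, so it cannot by itself separate rsync startup from other work such as location log updates.

```python
def per_file_overhead(window_s: float, files_annex: int, files_rsync: int) -> float:
    """Extra seconds per file of the slower run, given counts over the same window.

    Assumes files of similar size, so the difference in time per file is
    attributable to per-file overhead rather than to data volume.
    """
    t_annex = window_s / files_annex   # seconds per file via 'git annex get'
    t_rsync = window_s / files_rsync   # seconds per file via a single rsync run
    return t_annex - t_rsync


overhead = per_file_overhead(20, 11, 33)
print(f"{overhead:.2f} s extra per file")  # prints "1.21 s extra per file"
```

Exactly, 20/11 - 20/33 = 40/33, about 1.21 s per file. Over thousands of photos that is on the order of an hour of overhead, which is why it is worth measuring where the time actually goes.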