Re: git annex get performance issues with rsync

2012-01-18 Joey Hess
Adam Spiers wrote:
> OK.  You mean this?
> 
> http://git-annex.branchable.com/todo/parallel_possibilities/

More like this:

http://git-annex.branchable.com/todo/wishlist:_Prevent_repeated_password_prompts_for_one_command

> > You can enable ssh's connection sharing though. (ControlMaster)
> 
> The figures above were already with ControlMaster enabled.
> It helps, but the rsync invocation per file still hurts a lot.

Have you actually measured significant time being spent starting rsync?
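Joey's question can be checked directly by timing how long rsync takes just to start and exit, with no transfer involved. The Python sketch below gives only a rough lower bound, under a stated assumption: that `rsync --version` approximates bare process start-up. It leaves out the ssh connection, rsync's handshake with the remote side, and anything git-annex does per file. The function name `mean_spawn_seconds` is hypothetical.

```python
import shutil
import subprocess
import time


def mean_spawn_seconds(cmd: list[str], runs: int = 20) -> float:
    """Return the mean wall-clock time, in seconds, to run `cmd` to completion."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        # Discard output: only start-up and exit time matter here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Assumption: `rsync --version` is a fair proxy for process start-up cost.
    if shutil.which("rsync"):
        secs = mean_spawn_seconds(["rsync", "--version"])
        print(f"mean rsync start-up: {secs:.4f} s")
    else:
        print("rsync not found on PATH; nothing to measure")
```

If this figure is small compared with the per-file slowdown Adam observed, the time is more likely going somewhere else.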

I think it more likely that time is spent recording location logs to the
git-annex branch. You also mentioned you were using --copies, which
requires looking up the location log for each file, even ones that would
not otherwise be processed.
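To illustrate why `--copies` costs a lookup for every file, here is a simplified Python model of counting copies from a location log. It assumes the line format `<timestamp> <1|0> <repository-uuid>` (1 meaning present, 0 meaning dropped), with the newest entry for each repository winning; treat this as an assumption about git-annex internals, not a specification. The function names and sample data are hypothetical. In real git-annex each file's log must first be found and read from the git-annex branch, which is the per-file work described above.

```python
from collections.abc import Iterable


def copies_from_location_log(lines: Iterable[str]) -> int:
    """Count repositories whose newest log entry says the content is present.

    Assumed line format: "<timestamp> <1|0> <uuid>"; a trailing "s" on the
    timestamp is tolerated. Later timestamps override earlier ones.
    """
    latest: dict[str, tuple[float, bool]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        stamp, presence, uuid = parts
        ts = float(stamp.rstrip("s"))
        if uuid not in latest or ts >= latest[uuid][0]:
            latest[uuid] = (ts, presence == "1")
    return sum(1 for _, present in latest.values() if present)


def fewer_copies_than(log_by_file: dict[str, list[str]], wanted: int) -> list[str]:
    """Model of a `--not --copies N` filter: files with fewer than N copies.

    Note that every file's log is parsed, including files that end up
    not being selected at all.
    """
    return [name for name, log in log_by_file.items()
            if copies_from_location_log(log) < wanted]


if __name__ == "__main__":
    # Hypothetical logs: b.jpg was recorded as dropped from the USB drive.
    logs = {
        "a.jpg": ["1326900000.0s 1 uuid-laptop", "1326900100.0s 1 uuid-usb"],
        "b.jpg": ["1326900000.0s 1 uuid-usb", "1326900200.0s 0 uuid-usb"],
    }
    print(fewer_copies_than(logs, 1))  # ['b.jpg']
```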

-- 
see shy jo


___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home

Re: git annex get performance issues with rsync

2012-01-18 Adam Spiers
On Wed, Jan 18, 2012 at 4:09 PM, Joey Hess  wrote:
> Adam Spiers wrote:
>> One of my USB drives just died, so I'm doing a 'git annex get --not
>> --copies 1' to re-attain data redundancy.  It seems that a new rsync
>> instance is invoked for each file?  In my case, I have thousands of
>> photos which are big enough to be worth annexing but still not
>> individually huge, so it seems that the overhead of each rsync
>> invocation is significantly impacting throughput.  A quick empirical
>> test showed that in 20 seconds 'git annex get' managed to transfer 11
>> photos, whereas a single (manual) rsync run transferred 33.  Is this
>> easily fixable?
>
> No, it's on the todo list but very far down it.

OK.  You mean this?

http://git-annex.branchable.com/todo/parallel_possibilities/

> You can enable ssh's connection sharing though. (ControlMaster)

The figures above were already with ControlMaster enabled.
It helps, but the rsync invocation per file still hurts a lot.
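For comparison with the per-file approach, here is a Python sketch of the "single rsync run" side of Adam's test: one rsync process handed the whole file list through `--files-from=-`, which makes rsync read the list from standard input. The host, paths and function names are hypothetical, and this is a generic illustration of batching, not how git-annex stores or transfers its objects.

```python
import subprocess


def batched_rsync(src_root: str, dest: str,
                  rel_paths: list[str]) -> tuple[list[str], str]:
    """Return (argv, stdin_text) for ONE rsync run covering every path.

    Paths are relative to src_root. Process start-up and ssh connection
    setup are paid once for the whole batch instead of once per file.
    """
    argv = ["rsync", "--files-from=-", src_root, dest]
    return argv, "".join(p + "\n" for p in rel_paths)


def run_batched(src_root: str, dest: str, rel_paths: list[str]) -> None:
    """Actually invoke rsync, feeding the file list on standard input."""
    argv, stdin_text = batched_rsync(src_root, dest, rel_paths)
    subprocess.run(argv, input=stdin_text, text=True, check=True)


if __name__ == "__main__":
    # Hypothetical remote and file names; prints the command instead of running it.
    argv, listing = batched_rsync("example.org:photos/", "photos/",
                                  ["2011/a.jpg", "2011/b.jpg"])
    print(argv)
    print(listing, end="")
```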


Re: git annex get performance issues with rsync

2012-01-18 Joey Hess
Adam Spiers wrote:
> One of my USB drives just died, so I'm doing a 'git annex get --not
> --copies 1' to re-attain data redundancy.  It seems that a new rsync
> instance is invoked for each file?  In my case, I have thousands of
> photos which are big enough to be worth annexing but still not
> individually huge, so it seems that the overhead of each rsync
> invocation is significantly impacting throughput.  A quick empirical
> test showed that in 20 seconds 'git annex get' managed to transfer 11
> photos, whereas a single (manual) rsync run transferred 33.  Is this
> easily fixable?

No, it's on the todo list but very far down it.

You can enable ssh's connection sharing though. (ControlMaster)
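A Python sketch of what ssh connection sharing looks like when passed to rsync explicitly. `ControlMaster`, `ControlPath` and `ControlPersist` are real OpenSSH options (`ControlPersist` needs OpenSSH 5.6 or later); the socket path, the 10 minute persistence and the function name are illustrative choices. For git-annex's own ssh calls the same three options would normally go in a `Host` block in `~/.ssh/config`, since git-annex runs the system ssh.

```python
import shlex


def rsync_with_shared_ssh(src: str, dest: str,
                          control_path: str = "~/.ssh/cm-%r@%h:%p",
                          persist: str = "10m") -> list[str]:
    """Build an rsync argv whose ssh transport reuses one master connection.

    ControlMaster=auto: the first ssh becomes the master, later ones with
    the same ControlPath multiplex over its socket.
    ControlPersist: keep the master alive after its first session ends.
    """
    ssh_cmd = shlex.join([
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={control_path}",
        "-o", f"ControlPersist={persist}",
    ])
    # rsync splits the -e string itself, honouring the quotes shlex added.
    return ["rsync", "-e", ssh_cmd, src, dest]


if __name__ == "__main__":
    # Hypothetical host and paths; prints the command instead of running it.
    print(shlex.join(rsync_with_shared_ssh("example.org:photos/", "photos/")))
```

Only the first connection pays for TCP setup, key exchange and authentication; later ssh clients attach to the master's socket. That is why connection sharing helps while still leaving the cost of starting a fresh rsync and ssh client process for every file.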

-- 
see shy jo



git annex get performance issues with rsync

2012-01-18 Adam Spiers
One of my USB drives just died, so I'm doing a 'git annex get --not
--copies 1' to re-attain data redundancy.  It seems that a new rsync
instance is invoked for each file?  In my case, I have thousands of
photos which are big enough to be worth annexing but still not
individually huge, so it seems that the overhead of each rsync
invocation is significantly impacting throughput.  A quick empirical
test showed that in 20 seconds 'git annex get' managed to transfer 11
photos, whereas a single (manual) rsync run transferred 33.  Is this
easily fixable?
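These numbers translate into a rough per-file overhead figure: 20/11 - 20/33 = 40/33, or about 1.21 s of extra time per file. The Python sketch assumes, which the thread does not state, that both runs moved photos of similar size at the same network throughput, so any difference in time per file comes from per-file costs. It cannot tell which per-file cost dominates (rsync start-up, ssh setup, or git-annex bookkeeping such as location log updates). The function name is hypothetical.

```python
def extra_seconds_per_file(window_s: float, files_a: int, files_b: int) -> float:
    """Extra seconds per file taken by run A compared with run B.

    Both runs are observed over the same window. Assumes equal file sizes
    and equal raw throughput, so the gap is pure per-file overhead.
    """
    if files_a <= 0 or files_b <= 0:
        raise ValueError("file counts must be positive")
    return window_s / files_a - window_s / files_b


if __name__ == "__main__":
    # 11 files via 'git annex get' vs 33 via one rsync, both in 20 seconds.
    print(f"{extra_seconds_per_file(20, 11, 33):.3f} s per file")  # 1.212 s per file
```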