Re: New integration branch

2011-12-05 Thread Joey Hess
Adam Spiers wrote:
  9c87f2352214175de307efedb8fd93889a26afbc
         Can you give an example of when this is needed?
 
 I can't remember but I definitely saw it happen at least once :-/

My worry is that, since that really shouldn't happen AFAICS, you
were actually seeing a bug. Either that or it's a corner case I have not
identified.

  602f26714254f3c65389b7665d15d1d5d0e227a9
         mr is quite typically (I know, not by you) run
         inside the repository. Which would leave the user
         in an apparently empty directory after mr update if
         an mr update deleted and remade the whole repository.
 
         I don't like that; I don't think things in mr should be
         deleting repositories in general; mr doesn't even delete
         a repo that has deleted = true, it only warns the user about it.
 
 Hmm, that's a fair point, although the only alternative is to change
 the contents of the directory rather than the directory itself -
 similarly to how `git checkout' does, for instance.  I'll see if I can
 get around to doing that.  Perhaps some of the effort could be reused
 for implementing download_diff (diff against the archive file).

I think you could just use rsync :)
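
Roughly this sort of thing, I mean -- sync a fresh checkout over the
existing working directory instead of deleting it, so the user's cwd
never goes away. Just a sketch, not something mr does today; the
function name and paths are made up:

    # sketch: refresh an existing checkout in place with rsync
    sub refresh_in_place {
        my ($fresh, $repo) = @_;    # $fresh = path to a fresh temporary checkout
        # trailing slashes make rsync copy directory contents;
        # --delete removes files that no longer exist upstream
        system("rsync", "-a", "--delete", "$fresh/", "$repo/") == 0
            or die "rsync failed: $?";
    }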

  cf3388f443b9d7afe6dc7d8a2159b45fb01ab4e4
         This is a slow way to make machine-parsable info available --
         the similar mr list takes 8 seconds here, since it has to run
         169 shells. That's ok when you're just running mr, but I would
         not like to use a command that depended on that information.
 
 Sure, that's why I used a simple on-disk cache:
 
   
 https://github.com/aspiers/kitenet-mr/commit/b60acb2e767b91ca6d406198d7eea1b3f73ad2bf
 
 It works fine.  I could get more sophisticated and allow per-user
 configuration of the cache invalidation strategy, e.g. so that it
 would automatically rebuild the cache when ~/.mrconfig et al. are
 changed, but manual rebuilds aren't a great hardship.  In fact I could
 even rebuild the cache every time mr runs!
 
         If a machine-parseable list of repositories is needed,
         I think it'd be better to have a perl function that emits
         it in one go.
 
 I don't see how that's possible without ignoring the `skip',
 `deleted', and `include' parameters.

The include parameter is not a big problem, it's unlikely to require
more than one shell process, which will be relatively fast.

It's not clear to me what should be done about skip and deleted.
skip in particular can behave in weird ways, when something like
hours_since is used. To handle that all the skips would need to be
tested, which would be less work than mr list but still verging on
expensive. 

Depending on the application, it might be better to just dump all the
defined repositories including skipped and deleted ones; if the consumer
then runs mr in a skipped/deleted repo, mr will do something sane after
all.
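
To sketch what I mean (illustrative only; dump_repos and the %config
layout are assumptions, not necessarily how mr stores things internally):

    # emit one repository per line without running any per-repo shell;
    # skipped/deleted entries are included and left to the consumer
    sub dump_repos {
        my %config = @_;    # assumed layout: topdir => section => parameters
        foreach my $topdir (sort keys %config) {
            foreach my $section (sort keys %{$config{$topdir}}) {
                print "$topdir$section\n";
            }
        }
    }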

-- 
see shy jo



Re: New integration branch

2011-12-05 Thread Adam Spiers
On Mon, Dec 5, 2011 at 5:38 PM, Joey Hess j...@kitenet.net wrote:
 Adam Spiers wrote:
  9c87f2352214175de307efedb8fd93889a26afbc
         Can you give an example of when this is needed?

 I can't remember but I definitely saw it happen at least once :-/

 My worry is that, since that really shouldn't happen AFAICS, you
 were actually seeing a bug. Either that or it's a corner case I have not
 identified.

It's on my TODO list to try to reproduce; I'll let you know if I
manage to.

  602f26714254f3c65389b7665d15d1d5d0e227a9
         mr is quite typically (I know, not by you) run
         inside the repository. Which would leave the user
         in an apparently empty directory after mr update if
         an mr update deleted and remade the whole repository.
 
         I don't like that; I don't think things in mr should be
         deleting repositories in general; mr doesn't even delete
         a repo that has deleted = true, it only warns the user about it.

 Hmm, that's a fair point, although the only alternative is to change
 the contents of the directory rather than the directory itself -
 similarly to how `git checkout' does, for instance.  I'll see if I can
 get around to doing that.  Perhaps some of the effort could be reused
 for implementing download_diff (diff against the archive file).

 I think you could just use rsync :)

Yeah, that sounds worth trying.

         If a machine-parseable list of repositories is needed,
         I think it'd be better to have a perl function that emits
         it in one go.

 I don't see how that's possible without ignoring the `skip',
 `deleted', and `include' parameters.

 The include parameter is not a big problem, it's unlikely to require
 more than one shell process, which will be relatively fast.

 It's not clear to me what should be done about skip and deleted.
 skip in particular can behave in weird ways, when something like
 hours_since is used. To handle that all the skips would need to be
 tested, which would be less work than mr list but still verging on
 expensive.

 Depending on the application, it might be better to just dump all the
 defined repositories including skipped and deleted ones; if the consumer
 then runs mr in a skipped/deleted repo, mr will do something sane after
 all.

Skipper functions like hours_since could (and probably should) decide
not to skip if MR_ACTION is set to a read-only action such as list -
arguably diff and status too, although that's a matter of personal
taste.
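
In other words something like this -- purely a sketch of the idea, not
how hours_since is actually implemented in mr; should_skip, %read_only
and the $last_run argument are made up for illustration:

    # never skip for read-only actions; otherwise fall back to the
    # usual "has the repo been touched within the last N hours?" test
    my %read_only = map { $_ => 1 } qw(list);   # arguably also diff, status
    sub should_skip {
        my ($hours, $last_run) = @_;            # $last_run in epoch seconds
        my $action = $ENV{MR_ACTION} || '';
        return 0 if $read_only{$action};
        return (time - $last_run) < $hours * 3600;
    }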

But maybe we should step back a bit.  Currently we know that a full
mr list is not particularly fast, but has anyone actually profiled
it to find out where most of the time is being spent?  If we're only
guessing then we might have it completely wrong ...


Re: New integration branch

2011-12-05 Thread Joey Hess
Joey Hess wrote:
 It could well not be. mr list -j 10 runs in the same time as mr list -j 1,
 suggesting the overhead is in something other than actually running the
 shell.

Whoops, bad benchmark: -j comes before the action.

Anyway, yes, without any calls to system(), mr list takes just 0.35 seconds.
Those calls are:

169 mr list: running set -e;  # actual list command
118 mr skip: running vcs test  # 
 55 mr list: running skip test set -e;
 50 mr deleted: running vcs test 

(Note that the vcs test is split between skip and deleted, but
if those features are removed, the actual list command would
trigger the same test, so those don't add overhead.)

Moving the git_test etc into perl code would be one way to speed it up
for the common case. Adding a special case optimisation to avoid the shell
for true and false brings mr list down from 8.50 to 1.81 seconds.
The remaining time here is spent running skip tests, of which I have a lot;
most people are probably looking at sub-1-second times.
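
For the curious, the special case amounts to roughly this (a simplified
sketch, not the actual code; run_command is a made-up name for
illustration):

    # avoid fork/exec of a shell for trivially constant commands
    sub run_command {
        my ($command) = @_;
        $command =~ s/^\s+|\s+$//g;             # trim whitespace
        return 0 if $command eq 'true';         # exit status 0
        return 1 if $command eq 'false';        # exit status 1
        return system('sh', '-c', $command) >> 8;
    }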

-- 
see shy jo



Re: New integration branch

2011-12-05 Thread Joey Hess
Joey Hess wrote:
 Moving the git_test etc into perl code would be one way to speed it up
 for the common case. Adding a special case optimisation to avoid the shell
 for true and false brings mr list down from 8.50 to 1.81 seconds.
 The remaining time here is spent running skip tests, of which I have a lot;
 most people are probably looking at sub-1-second times.

These optimisations are now in place.

joey@gnu:~/src/d-i$ time mr -q list
1.14user 2.17system 0:05.12elapsed 64%CPU (0avgtext+0avgdata 26368maxresident)k 
0inputs+0outputs (0major+269034minor)pagefaults 0swaps
joey@gnu:~/src/d-i$ time ~/src/mr/mr -q list
0.38user 0.02system 0:00.44elapsed 91%CPU (0avgtext+0avgdata 26640maxresident)k 
0inputs+0outputs (0major+6429minor)pagefaults 0swaps

joey@gnu:~$ time mr -q list
1.67user 3.86system 0:08.75elapsed 63%CPU (0avgtext+0avgdata 26720maxresident)k 
0inputs+0outputs (0major+464487minor)pagefaults 0swaps
joey@gnu:~$ time ~/src/mr/mr -q list
0.56user 0.60system 0:01.78elapsed 65%CPU (0avgtext+0avgdata 26800maxresident)k 
0inputs+0outputs (0major+84959minor)pagefaults 0swaps

-- 
see shy jo



Re: New integration branch

2011-12-04 Thread Adam Spiers
On Sun, Dec 4, 2011 at 9:11 PM, Joey Hess j...@kitenet.net wrote:
 37617a63ec993b128f77a945a2020ec894c58eb1
        loadconfig already uses %loaded to avoid reloading the same
        config twice, so this extra check is not necessary, I think.

Ah yes, I missed that.  Still, for the cost of an extra line of code,
isn't it worth being explicit?  If it confused me, then presumably it
could confuse other developers in the future.
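
For anyone else reading along: the guard in question is the usual
seen-it-already idiom, roughly (a sketch, not mr's code verbatim):

    our %loaded;
    sub loadconfig {
        my ($file) = @_;
        return if $loaded{$file}++;   # already loaded, nothing to do
        # ... parse $file ...
    }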

 a61c1450ff4b108af26e899a89a1d8ff49cab86c
        I picked the bugfix part.

        The warning message on missing chain files exposes an unclear
        thing in mr; it will try to chain to directories even when their
        repository has skip = true, which causes the warning to show
        up unexpectedly (ie, here). I think it needs to be changed to
        honor skip = true even if chain = true.

Ah OK, I'll look into that.

 b3b68137988e61be1a0f7d90caf05eabf7850f44
        I developed a different fix this morning that shows correct
        line numbers for both the mrconfig and the position in the
        include, it's in my tree.

Yep, I saw that - very nice :)

 135e0076c9a93cd0556b9b25ff355ad25546a78c
        This makes mr fetch do a git fetch, but nothing for
        the other DVCSes which can also do things like fetch, and
        no documentation of it

I can add documentation.  However, although ISTR that `hg pull' is
akin to `git fetch', I don't know how it's done for the other
decentralised VCSes.

 9c87f2352214175de307efedb8fd93889a26afbc
        Can you give an example of when this is needed?

I can't remember but I definitely saw it happen at least once :-/

 602f26714254f3c65389b7665d15d1d5d0e227a9
        mr is quite typically (I know, not by you) run
        inside the repository. Which would leave the user
        in an apparently empty directory after mr update if
        an mr update deleted and remade the whole repository.

        I don't like that; I don't think things in mr should be
        deleting repositories in general; mr doesn't even delete
        a repo that has deleted = true, it only warns the user about it.

Hmm, that's a fair point, although the only alternative is to change
the contents of the directory rather than the directory itself -
similarly to how `git checkout' does, for instance.  I'll see if I can
get around to doing that.  Perhaps some of the effort could be reused
for implementing download_diff (diff against the archive file).

 650620d7b6661f9cc59b4adfb6a7d945240fe5c7
 f16e51cea8595afc92f3ab9230e3c5a44baed904
        I've held off on these plugins since I think they
        depend on 602f26714254f3c65389b7665d15d1d5d0e227a9

No, only the download plugin.  The stow plugin *never* writes to the
repository tree; it manages symlinks in an entirely different place
(typically $HOME).  Having said that, I just remembered that the stow
plugin depends on MR_NAME so that will have to wait too.

 cf3388f443b9d7afe6dc7d8a2159b45fb01ab4e4
        This is a slow way to make machine-parsable info available --
        the similar mr list takes 8 seconds here, since it has to run
        169 shells. That's ok when you're just running mr, but I would
        not like to use a command that depended on that information.

Sure, that's why I used a simple on-disk cache:

  
https://github.com/aspiers/kitenet-mr/commit/b60acb2e767b91ca6d406198d7eea1b3f73ad2bf

It works fine.  I could get more sophisticated and allow per-user
configuration of the cache invalidation strategy, e.g. so that it
would automatically rebuild the cache when ~/.mrconfig et al. are
changed, but manual rebuilds aren't a great hardship.  In fact I could
even rebuild the cache every time mr runs!
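
The mtime-based invalidation would be cheap enough anyway; something
along these lines (sketch only -- cache_is_fresh, the cache path and the
list of config files to watch are my assumptions, not mr behaviour):

    use File::stat;

    # the cache is stale if any watched config file is newer than it
    sub cache_is_fresh {
        my ($cache, @configs) = @_;
        return 0 unless -e $cache;
        my $cache_mtime = stat($cache)->mtime;
        foreach my $config (grep { -e $_ } @configs) {
            return 0 if stat($config)->mtime > $cache_mtime;
        }
        return 1;
    }

    # e.g. cache_is_fresh("$ENV{HOME}/.mrcache", "$ENV{HOME}/.mrconfig")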

        If a machine-parseable list of repositories is needed,
        I think it'd be better to have a perl function that emits
        it in one go.

I don't see how that's possible without ignoring the `skip',
`deleted', and `include' parameters.

        (Also, the patch references an MR_NAME that is not present in my
        mr tree.)

Yeah :-(  I had to rebase about 20 times to separate out the patches
you are currently willing to consider, so mistakes were inevitable.

 4cd2b59d0c66d71316dfc1d411a3e3da439643bc
        I'm not quite sure of the point of this refactoring,

Legibility and modularity.  Longer functions make unfamiliar code
harder to understand, and often give variables an unnecessarily wide
scope.  If I see this line of code within bootstrap():

my $tmpconfig = download($url);

then

  (a) it's immediately obvious that $url contains config which is
  being downloaded and stored in a temporary file, so there's no
  need for the existing comment `Download the config file to a
  temporary location',

  (b) it makes it easy _and_optional_ for me to view how the download
  is implemented, and

  (c) it means that the scopes of @curlargs and $curlstatus are clearly
  limited to the download, so I don't need to visually grep for
  them in