On 01.07.2011 13:42, Greg Stein wrote:
> On Wed, Jun 29, 2011 at 05:04, Michael Stahl <[email protected]> wrote:
>> ...
>> in principle the size of a CWS is on the same order as the master,
>> because it's just another HG repository.
>> but HG supports hardlinks between repositories (in newer versions even
>> on win32), so you can "hg clone" the master on the same filesystem and
>> then pull in the CWS, and it will be _much_ faster and take _much_ less
>> additional space.
>
> This is the approach that I took. Please look at
> tools/dev/fetch-all-cws.sh. Each of these CWS repositories (on Mac OS)
> is consuming 600 MB *minimum*. I've fetched a dozen, and a couple are
> over 2 GB each, and another over 1 GB. And this is with the clone/pull
> technique.

indeed, i get similar numbers.
a clone with hardlinks is 34 MB on my filesystem.
a CWS with a single changeset takes 670 MB.
the reason is that for every commit, 2 files that store the changelog and
the manifest are modified, and together these are >600 MB in our repo.
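
roughly, the clone + pull dance looks like this (the paths, the CWS name
and the server URL are placeholders i'm making up here, so don't take
them literally):

  MASTER=/data/hg/DEV300    # made-up path to a local clone of the master
  CWS=somecws               # made-up CWS name

  # clone on the same filesystem: hg hardlinks the files under .hg/store
  # instead of copying them, so the new repo starts out at a few dozen MB
  hg clone --noupdate "$MASTER" "$CWS"

  # pull the CWS changesets from the server (made-up URL)
  hg pull -R "$CWS" "http://hg.services.openoffice.org/cws/$CWS"

(--noupdate just skips the working copy checkout, which we don't need for
archiving anyway.)
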
> I don't have enough space on my laptop to do a complete trial run. I'm
> hoping that somebody can figure out how to reduce the disk footprint,
> or determine that we just have to suck it up. And it would be nice to
> understand what that target size will be, for all 250 CWS
> repositories.

i think i wrote that all CWSes as HG repos take ~100 GB, but now i think
i remembered wrong and the number was more like ~150 GB.
(i did this originally in 2 steps, and i remembered only the second step...)
(and if it weren't so late now i'd even dig out my external hd and run
du...)
of course the filesystem used could make a difference here.
but actually i think that a lot of these 250 CWSes will not contain any
changeset that is not in the master already; a lot of developers create a
new CWS and then (have to) work on something else for some weeks...
so i have adapted the fetch script to skip empty CWSes.
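
the check for "empty" could look something like this (just a sketch, not
necessarily what the script actually does; URL and paths are made up again):

  CWS=somecws                                           # made-up CWS name
  CWS_URL="http://hg.services.openoffice.org/cws/$CWS"  # made-up server URL
  MASTER=/data/hg/DEV300                                # made-up local master

  # "hg incoming" exits with status 1 when there is nothing to pull, so an
  # empty CWS can be skipped without cloning or pulling anything
  if hg incoming --quiet -R "$MASTER" "$CWS_URL" > /dev/null; then
      echo "$CWS: has changesets, fetching"
      # ... hardlink clone + pull as above ...
  else
      echo "$CWS: empty, skipping"
  fi

for an empty CWS the discovery finds nothing to transfer, so this check is
cheap.
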
> A possible alternative to pulling N repositories, then combining them
> in a second step, is to attempt to bring them all into a single
> repository, one at a time. This is a little scarier for me since, not
> knowing Hg, I don't understand how restartable and repeatable this
> process will be in the face of errors. Either starting from scratch,
> or (I believe an important feature) if it needs to be resumed after
> some minor failure (e.g. a network failure).

this would of course take much less space, but then it would be
necessary to mark the newly pulled head immediately to know which CWS it
corresponds to.
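
something like this could work for pulling everything into one repository
and marking the new head right away (the combined repo path, the URL and
the handling of the list file are made up, and it assumes a CWS adds at
most one new head):

  COMBINED=/data/hg/combined    # made-up path to the combined repository

  while read CWS; do
      case "$CWS" in '#'*|'') continue ;; esac   # skip comments/blank lines

      # remember the current heads, then pull; rerunning after a failure is
      # safe, because pull only adds changesets that are still missing
      OLD_HEADS=$(hg -R "$COMBINED" heads --template '{node} ')
      hg -R "$COMBINED" pull "http://hg.services.openoffice.org/cws/$CWS" \
          || continue

      # any head that was not there before the pull came from this CWS;
      # a local tag records that before the next pull obscures it
      for NODE in $(hg -R "$COMBINED" heads --template '{node} '); do
          case " $OLD_HEADS " in
              *" $NODE "*) ;;   # pre-existing head
              *) hg -R "$COMBINED" tag --local -r "$NODE" "cws/$CWS" ;;
          esac
      done
  done < tools/dev/cws-list.txt

the pull itself is restartable (rerunning only adds what is still missing);
the tagging step would need a bit more care to be fully resumable and to
cope with a CWS that adds more than one head, but that's the general idea.
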
> We have a script. It is time to make it work.
>
> Michael: you say that some CWS repositories are useless. If so, then
> please update tools/dev/cws-list.txt to comment out those CWSes with
> some explanation for future readers. No need for us to attempt to
> process them if they're bogus.

i have checked the status in EIS, and it seems like the repos for almost
all integrated/deleted CWSes have already been automatically removed
from the server.
i found a couple that were in a state "cancelled", which i didn't even
know existed; it sounds like we don't need those, so i've commented them out.
of course some CWSes contain stuff that's not useful, but i don't know
which these are :)

--
"Fools ignore complexity. Pragmatists suffer it. Some can avoid it.
Geniuses remove it." -- Alan J. Perlis