Re: One Big Repo

2009-07-10 Thread Joey Hess
chombee wrote:
> On Fri, Feb 27, 2009 at 01:55:16PM -0500, Joey Hess wrote:
> > So, some of the specific problems include:
> > 
> > 
> > * Wanting to check large data files into a repo, but not having space
> >   to put that repo on some machines.
> 
> I think a good idea might be to have a special repo for big files 
> only. So you would have two general catch-all repos, one for really 
> big files and one just for small files. Right now I put every file 
> that doesn't belong somewhere else into one catch-all repo, whether 
> the file is big or small. But there's no reason why committing a big
> bunch of PNG images should stop me from checking out some text files
> and documents.

I set this up myself recently. I have a git repo to which I commit things
like every photo I pull off my camera, scans, and videos. I think
of it as my raw data repository.

The bare repo is on my file server; my laptop clones it as follows:

git clone --shared /media/server/path/raw.git

This way the laptop does not have the overhead of the full .git repository;
it can just access that from the file server (over NFS or sshfs). But
I can still commit locally and push to the server later.
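A minimal sketch of what --shared sets up, using throwaway directories from mktemp in place of /media/server/path/raw.git (all paths and file names here are made up for the demo). The clone gets no copies of the server's objects, only the path to them in .git/objects/info/alternates:

```sh
#!/bin/sh
# Sketch: what "git clone --shared" records. All paths are hypothetical;
# "$tmp/server/raw.git" stands in for the repo on the file server.
set -e
tmp=$(mktemp -d)
server="$tmp/server/raw.git"
mkdir -p "$tmp/server"

# A bare "server" repo with one commit in it.
git init -q --bare "$server"
git clone -q "$server" "$tmp/seed" 2>/dev/null
echo "raw photo" > "$tmp/seed/photo.txt"
git -C "$tmp/seed" add photo.txt
git -C "$tmp/seed" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first import"
git -C "$tmp/seed" push -q origin HEAD

# The shared clone copies no objects; instead it lists the server's
# object directory in .git/objects/info/alternates.
git clone -q --shared "$server" "$tmp/laptop"
cat "$tmp/laptop/.git/objects/info/alternates"

# History is still readable, through the alternates path.
git -C "$tmp/laptop" log --oneline
```

New local commits still write objects into the laptop's own .git/objects until they are pushed, which is what the zap function below cleans up.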

If I commit a lot of big stuff and my local .git repo gets too big, this
dangerous command tries to ensure it's all been pushed to the file
server, and then cleans it out locally:

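zap () {
    # Only makes sense in a clone made with --shared, where the server's
    # object store is listed in .git/objects/info/alternates.
    if [ -e .git/objects/info/alternates ]; then
        # Delete the local loose objects only if the push succeeded.
        git push && rm -vf .git/objects/??/*
    else
        echo "not a --shared repo!" >&2
    fi
}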
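Because zap deletes objects, it can pay to be stricter than "git push succeeded". The sketch below is a hypothetical variant (zap_safe is a made-up name, not part of git or mr) that also refuses to run when something would exist only in the objects about to be deleted: staged but uncommitted changes, a stash, or commits on any local branch that no remote-tracking branch contains. It assumes the clone has a default remote to push to and fetch from.

```sh
# Hypothetical stricter zap: delete local loose objects only when nothing
# still in use depends on them alone.
zap_safe () {
    if [ ! -e .git/objects/info/alternates ]; then
        echo "not a --shared repo!" >&2
        return 1
    fi
    # Staged changes live only in local blob objects.
    if ! git diff --cached --quiet; then
        echo "staged changes not committed; not zapping" >&2
        return 1
    fi
    # A stash is a local-only ref whose objects would be lost.
    if git rev-parse -q --verify refs/stash >/dev/null; then
        echo "stash present; not zapping" >&2
        return 1
    fi
    git push || return 1
    git fetch -q || return 1
    # Commits on any local branch that no remote-tracking branch contains.
    ahead=$(git rev-list --count --branches --not --remotes) || return 1
    if [ "$ahead" -ne 0 ]; then
        echo "$ahead commit(s) not on the server; not zapping" >&2
        return 1
    fi
    rm -vf .git/objects/??/*
}
```

Like the original, this only removes loose objects under .git/objects/??/ and leaves any pack files alone.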

The only remaining problem is that checking really enormous files, such
as videos I am working on, into git makes git allocate memory for the whole
file. Needing to set up swap just to git commit a 700 MB DV file on my netbook
is a trifle annoying. :-P

I also use branches a lot in this repo, so that my netbook only keeps the
currently used files checked out. I figure that when this repo gets too big,
I'll just archive it off elsewhere, and start a new one.

> > * Having automated commits to some files (of archived mail, for example),
> >   and not wanting to see that in your general history, or deal with
> >   the merging/up-to-dateness issues it can entail.
> 
> Has anyone got this working (automated commit of archived mail)? 
> Currently I use offlineimap run by cron to sync my mail to a local 
> directory, then another cron job uses rsync to backup this directory, 
> just in case something goes wrong with the live copy. It'd be cool to 
> backup the mail directory by committing to a git repo.

Sure, I use the attached trimail script, which in turn uses archivemail
to move the read mail from the offlineimap maildirs into archival mailboxes,
and is run from cron nightly.

-- 
see shy jo
#!/bin/sh -e
# Archive old mail and commit the archives to git. Run nightly from cron.

cd ~/mail/archive

# Move read mail that is more than 1 day old out of inbox folders and into
# the archive.
for folder in $(find ~/Maildir ~/Maildir/.* -maxdepth 0 -type d \
        -not -name .. -not -name .Drafts); do
    # Archive folder name: the maildir folder name without its leading
    # dot; the top-level Maildir itself becomes "inbox".
    dest=$(basename "$folder" | sed 's/^\.//')
    if [ "$dest" = "" ] || [ "$dest" = "Maildir" ]; then
        dest=inbox
    fi
    date=$(date +%Y-%m)
    install -d "$dest"
    if [ "$dest" = spam ] || [ "$dest" = virii ]; then
        # Keep for 7 days, then delete.
        archivemail -d7 --delete "$folder"
    elif [ "$dest" = postmaster ]; then
        # Keep for 1 day, then delete, while I'm getting flooded
        # anyhow.
        archivemail -d1 --delete "$folder"
    else
        archivemail -u -d2 -o "$dest" \
            --archive-name="$date" "$folder"
    fi
done

for dir in $(find . -maxdepth 1 -mindepth 1 -type d -not -name .git); do
    # Compress mail not compressed by archivemail.
    find "$dir" -maxdepth 1 -type f -regex '.*/[0-9]*-[0-9]*$' \
        -exec gzip -9 {} \;

    # Either check old archives in (if this folder is already tracked in
    # git), or delete them after a month.
    if [ -n "$(git log -n 1 -- "$dir")" ]; then
        git add $(find "$dir" -maxdepth 1 -type f \
            -regex '.*/[0-9]*-[0-9]*\.gz') 2>/dev/null || true
    else
        find "$dir" -maxdepth 1 -type f -mtime +31 -exec rm -f {} \;
    fi
done

if ! git commit -q -a -m "autocommit" 2>/dev/null; then
    echo "git commit failed" >&2
    exit 1
elif ! git push 2>/dev/null; then
    echo "git push failed" >&2
    exit 1
fi
___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home

Re: One Big Repo

2009-07-10 Thread chombee
On Fri, Feb 27, 2009 at 01:55:16PM -0500, Joey Hess wrote:
> So, some of the specific problems include:
> 
> 
> * Wanting to check large data files into a repo, but not having space
>   to put that repo on some machines.

I think a good idea might be to have a special repo for big files 
only. So you would have two general catch-all repos, one for really 
big files and one just for small files. Right now I put every file 
that doesn't belong somewhere else into one catch-all repo, whether 
the file is big or small. But there's no reason why committing a big
bunch of PNG images should stop me from checking out some text files
and documents.

> * Having automated commits to some files (of archived mail, for example),
>   and not wanting to see that in your general history, or deal with
>   the merging/up-to-dateness issues it can entail.

Has anyone got this working (automated commit of archived mail)? 
Currently I use offlineimap run by cron to sync my mail to a local 
directory, then another cron job uses rsync to backup this directory, 
just in case something goes wrong with the live copy. It'd be cool to 
backup the mail directory by committing to a git repo.


Re: One Big Repo

2009-03-02 Thread Martin Fick

--- On Mon, 3/2/09, tchomby  wrote:

> I think you have the right balance. One single repo for
> everything might be taking it too far, a small, finite 
> number of repos that doesn't change very often at all 
> seems the right balance.

Yes.
 
> My mistake was that I did begin creating new repositories
> whenever I started work on anything that could be called a new
> 'project', and then things started to become a problem.

That can get out of hand I am sure, but as you hinted at above, it does not 
imply that multiple repos are a waste.

I like the idea of splitting repos up by the types of things that I like to
manage.  If you are only managing one thing, say office documents, then one repo
seems appropriate.  But if you manage your music, your videos, your mail, your
source code, your photos, your office documents and your system software, it
would seem very bad to put it all in one repo.  As mentioned by others: you may
have to manage different user groups per repo.  You may have different backup
policies per repo.  You may want to destroy the history of certain projects
eventually.  You may want certain things hosted by repos on different machines
depending upon connectivity.  All of these things might be handled more easily
(depending on your VCS) with multiple repos.

Some people may scoff at the idea of version controlling photos, music, videos.
I believe everything should be version controlled and backups should only be
for your repos!  The reason most people scoff at version controlling certain
things is probably exactly the limits imposed by having one repo.
The minute you say to yourself "I will not version control this (project A)
because it might take up too much space in my repo, it is only temporary, it
might have sensitive info in it...", whatever the excuse is that prevents you
from version controlling something, it is a bad excuse!!!  As soon as you make
it easy to make multiple repos, repos on a whim, you will find fewer excuses to
not put things under version control.

I even make a repo called volatile, for stuff whose history I really don't
care about.  If it gets too big, I can easily trim it, or simply delete the
repo and create a new one, without being concerned about what I will lose.  But
it gives me peace of mind to back up my "volatile" stuff into it.  Disk space
keeps getting cheaper, and your time spent thinking/worrying about things more
and more expensive.  At the very least, make a new repo any time you might
otherwise not version control something simply because you don't want it in
the repo with other stuff.
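One way to do the "trim" for a volatile repo (a technique assumed here, not one Martin describes) is to collapse the current branch to a single commit holding its current files and then let git gc drop the old objects. trim_history is a hypothetical helper name; the sketch assumes a clean working tree and no other branches or tags still pointing at the old history.

```sh
# Hypothetical helper: replace the current branch's history with a single
# commit holding its current tree, then garbage-collect the old objects.
trim_history () {
    branch=$(git symbolic-ref --short HEAD) || return 1
    # An orphan branch starts with no history but keeps the index, so the
    # next commit contains exactly the currently tracked files.
    git checkout -q --orphan trimmed-tmp || return 1
    git commit -q -m "snapshot of $branch" || return 1
    # Move the branch name over to the new single-commit history.
    git branch -q -D "$branch" || return 1
    git branch -m "$branch" || return 1
    # Forget reflog entries so gc can actually drop the old commits.
    git reflog expire --expire=now --all
    git gc -q --prune=now
}
```

Any central copy would then need a forced push, and other clones a fresh clone or a hard reset, which is why deleting and recreating the repo is often just as easy.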

-Martin



  


Re: One Big Repo

2009-03-02 Thread tchomby
Hi Joey,

I think you have the right balance. One single repo for everything might be 
taking it too far, a small, finite number of repos that doesn't change very 
often at all seems the right balance.

My mistake was that I did begin creating new repositories whenever I started 
work on anything that could be called a new 'project', and then things started 
to become a problem.

On Fri, Feb 27, 2009 at 01:55:16PM -0500, Joey Hess wrote:
> tchomby wrote:
> > *   You are less likely to lose files. With many small repos, it becomes almost
> > as easy to lose an entire repo as it was to lose a file before you started
> > versioning your homedir.
> 
> I have worried about this too. If you're making new small repos on a
> daily basis, then it would be easy to forget to push one out of your
> laptop, and lose it in one of the disasters laptops seem to make so
> common.
> 
> Also, old repos that are no longer used, and that you even stop
> checking out, become one server failure and backup oops away from being
> lost forever.
> 
> > *   With one big repo git log gives you a global history of all your files, a
> > sort of log of what you've been doing on a day-to-day basis. This can be really
> > handy. For example I have to meet with my supervisors every few weeks. Instead
> > of using my memory I can just use git log to help me construct a progress
> > report.
> 
> Yeah, I sometimes wish I could make mr construct an interleaved log of
> all the repos it runs on.
> 
> > All in all I don't understand why many small repos is the recommended approach,
> > sounds like making something simple into something complex. What disadvantages
> > does one big repo have?
> 
> I think that most of the disadvantages of using one big repo can be
> ignored until you have to share (part of) that repo with others.
> Note that wanting to check things out onto multiple machines
> eventually will tend toward the same set of problems that sharing
> the repo with others will present.
> 
> So, some of the specific problems include:
> 
> * Participating in typical free software development, which really
>   demands one repo per project. Or working for an employer, who probably
>   doesn't want their files in your personal repo.
> * Needing to keep some set of files private (not letting others see
>   them), and some other set *very* private (only on one or two machines).
> * Wanting to check large data files into a repo, but not having space
>   to put that repo on some machines.
> * Having automated commits to some files (of archived mail, for example),
>   and not wanting to see that in your general history, or deal with
>   the merging/up-to-dateness issues it can entail.
> * Wanting to host some files on one server (perhaps one that is
>   well-connected to the world), and others on another (perhaps one
>   at home, or at work).
> 
> I use a mixed approach:
> 
> * I have separate repos for files of well-defined types, like mail,
>   sound files, personal docs, personal programs, and my web site.
>   Basically, one for each top-level directory of my home directory.
> * I have separate repos for each free software (or work) project I am
>   involved with, and if I start a new project, I start a new repo for it.
>   For me, this means only a few new repos each year, hopefully.
> * I have a (over?)complicated set of several repos for my dotfiles, so
>   that I can have one repo with a minimal set that doesn't take much
>   space, another that adds in the larger stuff, and another that adds
>   private dotfiles.
> 
> Occasionally, something will start out in one place and have to move to
> another (i.e., mr started out in my personal programs and moved to a
> standalone package). But most of the time, there's one obvious place to
> put any given file, with an existing repo that replicates it in a way
> that's appropriate for that type of file.
> 
> -- 
> see shy jo





Re: One Big Repo

2009-03-02 Thread tchomby
On Mon, Mar 02, 2009 at 10:17:52AM +0100, Aristotle Pagaltzis wrote:
> 
> The point is that in git, if you change a single file somewhere
> in a subdirectory, then all of the tree objects from the root
> down to the one containing that file object will change.
> (Unchanged subtrees, in contrast, will be shared by commits.)
> 
> Now, when you ask for the history of a file, git has to work
> through *all* of the commits and narrow down the list to those
> commits in which that file changed. That’s what they mean when
> they say that git tracks entire trees only, not files.
> 
> If you have a huge repository, and you only make single unrelated
> changes, then looking at the history of a single file will be
> rather inefficient, because there will be lots of commits to
> examine in which that file remains unchanged.
> 
> That is why it’s considered a good idea to break things down
> along units of stuff that gets changed together rather than
> throwing tons of unrelated crap into a single repository.

Hmm. Okay, understood, but I wonder just how big a repository would have to get 
before this started to be a problem? Git seems pretty fast. And like I said 
before if your repo really did get that big you could always archive it and 
start again. So I'm tempted to ignore this potential problem.

Re: One Big Repo

2009-03-02 Thread Aristotle Pagaltzis
* Ben Finney  [2009-02-27 12:15]:
> tchomby  writes:
> > * You only have to create the repo once. Creating new repos
> > is a PITA. After a simple git init; git add .; git commit;
> > you have to make a bare clone of the repo, scp that to your
> > central server, then update the original repo to track the
> > central clone, _and_ clone the repo onto your other machines,
> > add it to your mrconfig file... It's complicated enough that
> > things are likely to go wrong.
>
> Wow, that does sound like a PITA. I'm glad I'm using Bazaar;
> replication is simply a matter of ‘bzr push’ or ‘bzr pull’,
> depending on the direction.

He’s talking about the steps involved in setting up repos as part
of a system that manages many repos.

Pushing and pulling individual repos is just as easy in git as
you describe it in bzr.
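For comparison, the one-time setup tchomby lists can also be wrapped in a few lines, after which replication really is just git push and git pull. In this sketch the helper name newrepo and the CENTRAL variable are hypothetical, and it assumes the central server is reachable as a local path (for instance over NFS or sshfs), which removes the scp step; with an ssh URL the bare repo would have to be created on the server side first.

```sh
# Hypothetical helper: turn the current directory into a git repo and
# publish it as a bare repo under $CENTRAL (assumed to be a mounted path).
newrepo () {
    name=$(basename "$PWD")
    central_repo="$CENTRAL/$name.git"
    git init -q
    git add .
    git commit -q -m "initial import"
    # Create the empty bare repo on the server side.
    git init -q --bare "$central_repo"
    git remote add origin "$central_repo"
    # Push the current branch and make it track the central copy.
    git push -q -u origin HEAD
}
```

Other machines then only need git clone "$CENTRAL/<name>.git", and from then on git push and git pull work without arguments. If you use mr, running mr register inside each checkout adds it to your mrconfig.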

Regards,
-- 
Aristotle Pagaltzis // 

Re: One Big Repo

2009-03-02 Thread Aristotle Pagaltzis
* tchomby  [2009-02-27 11:35]:
> It's generally said that when using git it's best to break
> things up into many small repos, e.g. one per project, module,
> etc., rather than dumping it all into one big repo, and the
> same advice has been given about using git for your homedir. I
> can see why this is a good idea for source code, but for
> something like versioning your homedir I don't see it.

The point is that in git, if you change a single file somewhere
in a subdirectory, then all of the tree objects from the root
down to the one containing that file object will change.
(Unchanged subtrees, in contrast, will be shared by commits.)

Now, when you ask for the history of a file, git has to work
through *all* of the commits and narrow down the list to those
commits in which that file changed. That’s what they mean when
they say that git tracks entire trees only, not files.

If you have a huge repository, and you only make single unrelated
changes, then looking at the history of a single file will be
rather inefficient, because there will be lots of commits to
examine in which that file remains unchanged.

That is why it’s considered a good idea to break things down
along units of stuff that gets changed together rather than
throwing tons of unrelated crap into a single repository.
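A quick way to see the effect described above: build a throwaway repo (the file names and the 100/4 split are arbitrary) where most commits touch unrelated files and only every 25th commit touches notes.txt. To report the 4 commits that touch notes.txt, git rev-list still has to walk all 100 commits and compare trees, and that is the work that grows as unrelated history piles up.

```sh
#!/bin/sh
# Sketch: path-limited history still walks every commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
i=1
while [ "$i" -le 100 ]; do
    if [ $((i % 25)) -eq 0 ]; then
        echo "$i" >> notes.txt        # the file whose history we want
    else
        echo "$i" > "photo-$i.txt"    # an unrelated change
    fi
    git add -A
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "change $i"
    i=$((i + 1))
done
total=$(git rev-list --count HEAD)
touching=$(git rev-list --count HEAD -- notes.txt)
# prints: commits in history: 100, commits touching notes.txt: 4
echo "commits in history: $total, commits touching notes.txt: $touching"
```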

I’m not sure yet where I stand on the one repo vs many repos
question when it comes to homedirs.

Regards,
-- 
Aristotle Pagaltzis // 

Re: One Big Repo

2009-02-27 Thread Ben Finney
tchomby  writes:

> On Fri, Feb 27, 2009 at 10:11:59PM +1100, Ben Finney wrote:
> > 
> > > * With many repos you have to somehow keep track of them all
> > 
> > I don't know what “keep track of them all” would mean beyond
> > simply navigating the filesystem.
> 
> If you've been working feverishly (maybe you've had too much coffee)
> and it comes to the end of the day, you have to commit what you've
> changed.

This implies that you commit at most once per day.

If one has been switching around doing lots of unrelated things, why
on earth have you not been committing between each one? If not, why
encourage that poor organisation?

-- 
 \  “An expert is a man who has made all the mistakes which can be |
  `\ made in a very narrow field.” —Niels Bohr |
_o__)  |
Ben Finney


Re: One Big Repo

2009-02-27 Thread Joey Hess
tchomby wrote:
> *   You are less likely to lose files. With many small repos, it becomes almost
> as easy to lose an entire repo as it was to lose a file before you started
> versioning your homedir.

I have worried about this too. If you're making new small repos on a
daily basis, then it would be easy to forget to push one out of your
laptop, and lose it in one of the disasters laptops seem to make so
common.

Also, old repos that are no longer used, and that you even stop
checking out, become one server failure and backup oops away from being
lost forever.

> *   With one big repo git log gives you a global history of all your files, a
> sort of log of what you've been doing on a day-to-day basis. This can be really
> handy. For example I have to meet with my supervisors every few weeks. Instead
> of using my memory I can just use git log to help me construct a progress
> report.

Yeah, I sometimes wish I could make mr construct an interleaved log of
all the repos it runs on.
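Such an interleaved log can be approximated outside mr. The sketch below (interleaved_log is a hypothetical name, not an mr command) asks each repo for its log with the committer timestamp first, tags each line with the repo's directory name, and sorts all the lines together newest-first.

```sh
# Hypothetical helper (not an mr command): merge the logs of several git
# checkouts into one newest-first list, each line tagged with its repo.
interleaved_log () {
    for repo in "$@"; do
        name=$(basename "$repo")
        # %ct is the committer timestamp in seconds; %x09 is a tab.
        git -C "$repo" log --format="%ct%x09$name%x09%s"
    done | sort -rn | cut -f2-
}
```

For example, interleaved_log ~/doc ~/src/mr ~/mail | head -n 20 shows the twenty most recent commits across those three repos.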

> All in all I don't understand why many small repos is the recommended approach,
> sounds like making something simple into something complex. What disadvantages
> does one big repo have?

I think that most of the disadvantages of using one big repo can be
ignored until you have to share (part of) that repo with others.
Note that wanting to check things out onto multiple machines
eventually will tend toward the same set of problems that sharing
the repo with others will present.

So, some of the specific problems include:

* Participating in typical free software development, which really
  demands one repo per project. Or working for an employer, who probably
  doesn't want their files in your personal repo.
* Needing to keep some set of files private (not letting others see
  them), and some other set *very* private (only on one or two machines).
* Wanting to check large data files into a repo, but not having space
  to put that repo on some machines.
* Having automated commits to some files (of archived mail, for example),
  and not wanting to see that in your general history, or deal with
  the merging/up-to-dateness issues it can entail.
* Wanting to host some files on one server (perhaps one that is
  well-connected to the world), and others on another (perhaps one
  at home, or at work).

I use a mixed approach:

* I have separate repos for files of well-defined types, like mail,
  sound files, personal docs, personal programs, and my web site.
  Basically, one for each top-level directory of my home directory.
* I have separate repos for each free software (or work) project I am
  involved with, and if I start a new project, I start a new repo for it.
  For me, this means only a few new repos each year, hopefully.
* I have a (over?)complicated set of several repos for my dotfiles, so
  that I can have one repo with a minimal set that doesn't take much
  space, another that adds in the larger stuff, and another that adds
  private dotfiles.

Occasionally, something will start out in one place and have to move to
another (i.e., mr started out in my personal programs and moved to a
standalone package). But most of the time, there's one obvious place to
put any given file, with an existing repo that replicates it in a way
that's appropriate for that type of file.

-- 
see shy jo



Re: One Big Repo

2009-02-27 Thread David Bremner
tchomby wrote:

>On Fri, Feb 27, 2009 at 10:11:59PM +1100, Ben Finney wrote:
>If you've been working feverishly (maybe you've had too much coffee) and it
>comes to the end of the day, you have to commit what you've changed. With one
>git repo a simple git status will show you what you need to do. With many repos,
>you would have to do git status many times, probably you'd forget to commit
>some changes. Same with committing, pushing and pulling and checking out new
>repos. mr lets you do these commands on multiple repos at once but it adds the
>trouble of managing mr and its mrconfig file.

I have done both. One big repo with SVN, and now mr+git.  I think one
big repo is fine, as long as the history is not something you care
about very much.  By now I am too dependent on having a reasonably
clean history in each project (for example to generate patches to send
to colleagues) to go back to the one big repo approach.  For the kind
of work-log stuff you mention, I use org-mode in emacs.

I think the conclusions are roughly the same as when we discussed on
this list whether svn might actually be better at maintaining a home
directory: it depends whether you want to make snapshots, or to really
use version control in your work.  Both points of view are defensible;
unfortunately, once one's history is a mess, there may be no
reasonable way to disentangle it.

I use both approaches myself: some repos (like my .org files) I just
make snapshots, because I am pretty confident that I will never need
to "work with" that history. For other projects, if there is something to
commit when I run "mr commit", that probably means I should abort, go back,
and look at the situation more carefully.

David



Re: One Big Repo

2009-02-27 Thread tchomby
On Fri, Feb 27, 2009 at 10:11:59PM +1100, Ben Finney wrote:
> 
> > * With many repos you have to somehow keep track of them all so you
> > need a tool like mr, one more tool to learn, and that means you need
> > to manage a mrconfig file.
> 
> I don't know what “keep track of them all” would mean beyond simply
> navigating the filesystem.

If you've been working feverishly (maybe you've had too much coffee) and it comes to
the end of the day, you have to commit what you've changed. With one git 
repo a simple git status will show you what you need to do. With many repos, 
you would have to do git status many times, probably you'd forget to commit 
some changes. Same with committing, pushing and pulling and checking out new 
repos. mr lets you do these commands on multiple repos at once but it adds the 
trouble of managing mr and its mrconfig file.
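The "run git status in every repo" chore is the kind of thing mr automates. For illustration only, here is a hypothetical minimal loop (repo_report is a made-up name, not how mr is implemented) that lists repos with uncommitted changes or with commits that no remote-tracking branch has, which also catches a repo that was never pushed off the laptop.

```sh
# Hypothetical stand-in for a multi-repo status check: report repos with
# uncommitted changes or with commits not pushed anywhere.
repo_report () {
    for repo in "$@"; do
        if [ -n "$(git -C "$repo" status --porcelain)" ]; then
            echo "$repo: uncommitted changes"
        fi
        # Commits on any local branch that no remote-tracking branch has.
        ahead=$(git -C "$repo" rev-list --count --branches --not --remotes)
        if [ "$ahead" -gt 0 ]; then
            echo "$repo: $ahead commit(s) not pushed anywhere"
        fi
    done
}
```

Run it as, say, repo_report ~/doc ~/src/* ~/mail. mr's own status and push actions do this job across everything listed in the mrconfig file.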
 
> > * With one big repo git log gives you a global history of all your
> > files, a sort of log of what you've been doing on a day-to-day
> > basis. This can be really handy. For example I have to meet with my
> > supervisors every few weeks. Instead of using my memory I can just
> > use git log to help me construct a progress report.
> 
> It also sounds like an awful mess. I want the log to show what I've
> been doing in the context of the specific working tree, without the
> dozens of other things going on in the same home directory.

You can easily do that with one repo though. If I want a log of what I changed 
in the my_new_project dir in my big git repo, then I do `git log 
my_new_project`. But if I want a log of everything I do just `git log`.
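The same path-limited log extends to the progress-report use case by adding a date range. progress_report is a hypothetical helper name; --since, --reverse, --date=short and the %ad and %s format placeholders are standard git log options, and the "--" keeps git from mistaking the directory for a branch name.

```sh
# Hypothetical helper: list what changed under one directory since a date,
# oldest first, one line per commit ("YYYY-MM-DD  subject").
progress_report () {
    git log --since="$2" --reverse --date=short --format='%ad  %s' -- "$1"
}
```

For a supervisor meeting every few weeks that would be progress_report my_new_project "3 weeks ago", or git log --since="3 weeks ago" with no path for everything.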

Re: One Big Repo

2009-02-27 Thread Ben Finney
tchomby  writes:

> Using one big git repo for your homedir has many advantages over
> using many small repos:
> 
> *   It's simpler.

Yes, though that simplicity can be limiting: treating your entire
directory as a single big working tree may not be the best fit.

> * You only have to create the repo once. Creating new repos is a
> PITA. After a simple git init; git add .; git commit; you have to
> make a bare clone of the repo, scp that to your central server, then
> update the original repo to track the central clone, _and_ clone the
> repo onto your other machines, add it to your mrconfig file... It's
> complicated enough that things are likely to go wrong.

Wow, that does sound like a PITA. I'm glad I'm using Bazaar;
replication is simply a matter of ‘bzr push’ or ‘bzr pull’, depending
on the direction.

> * You are less likely to lose files. With many small repos, it
> becomes almost as easy to lose an entire repo as it was to lose a
> file before you started versioning your homedir. It sort of defeats
> the point. With one big repo I just commit a new file to my repo and
> forget about it, then I know I'll never lose that file, the point is
> to avoid me having to think about it. With many repos I have to
> consider which repo a new file should belong to, even whether I
> should create a whole new repo for it.

If one aligns repositories with whatever working tree is being used
for the project, that seems to be a non-issue.

> * With many repos you have to somehow keep track of them all so you
> need a tool like mr, one more tool to learn, and that means you need
> to manage a mrconfig file.

I don't know what “keep track of them all” would mean beyond simply
navigating the filesystem.

> * With one big repo git log gives you a global history of all your
> files, a sort of log of what you've been doing on a day-to-day
> basis. This can be really handy. For example I have to meet with my
> supervisors every few weeks. Instead of using my memory I can just
> use git log to help me construct a progress report.

It also sounds like an awful mess. I want the log to show what I've
been doing in the context of the specific working tree, without the
dozens of other things going on in the same home directory.

-- 
 \  “When I wake up in the morning, I just can't get started until |
  `\ I've had that first, piping hot pot of coffee. Oh, I've tried |
_o__)other enemas...” —Emo Philips |
Ben Finney
