Re: [gentoo-user] File synchronisation utility (searching for/about to program it)
On Sat, 25 Jul 2009 13:10:41 -0400 Simon <turne...@gmail.com> wrote:

> I have tried using git in the past and found that it doesn't work in my
> 'space constrained' scenario. The need for a repository is a problem. The
> use of the USB key however is nice, since it allows git to work without
> having each computer maintain its own repository... but still, I don't
> currently have a USB key that's large enough to hold all my data, and even
> if I could compress it I doubt it would fit. Another thing: I wonder if it
> retains the attributes of the file (creation date, modification date,
> owner/group, permissions)? This can be important for some aspects of my
> synchronisation needs.

Vanilla git doesn't, apart from the executable bit. Due to the highly
modular structure of git, one can easily implement this as a wrapper or a
replacement binary at some level, storing the metadata in some form (a
plain list, a mirror tree, or just alongside each file) when pushing
changes to the repo and applying it on each pull. Then there are also git
hooks, which in theory should be a better way than a wrapper, but I found
them much harder to use in practice.

> Still, git is a very good solution that works incrementally in a
> differential manner (makes patches from previous versions). But when I
> tried it, I found that to suit my needs it would require programming a
> big wrapper that would interface git to make some daily quick actions
> simpler than a few git commands.

That's another advantage of a wrapper, but note that git commands
themselves can be made quite a bit shorter via aliases, configurable in
gitconfig at any level (repo, home, system-wide):

    [alias]
        ci = commit -a
        co = checkout
        st = status -a
        br = branch
        ru = remote update
        ui = update-index --refresh
        cp = cherry-pick

Still, sequences such as "git ui; git cp X" are quite common, so a wrapper,
or at least a set of shell aliases, is quite handy.

> > I apologize if the existence of a bare repo as an intermediary is a
> > problem. This can be done on a server as well.
>
> It is...
> it makes all my computers dependent on that repo. Syncing computers at
> home can be done all right, but will still require walking around
> plugging/unplugging. That makes this practically impossible to do over
> the network (or to sync my host on the internet; not all my PCs are
> connected to the internet, so the repo can't just be on the server, I
> would have to maintain several repositories to work this out...). It may
> be possible to adapt it to my scenario, but I think it will require a lot
> of design in advance... but I'll check it out... at worst it will
> convince me I should program my own, better it will give me some good
> ideas or fortify some of my own, and at best it will be the thing I've
> been looking for!

Why keep a bare repo at all? That's certainly not a prerequisite with a
distributed VCS like git. You can fetch / merge / rebase / cherry-pick
commits with git via ssh just as easily as with rsync, using intermediate
media only if the machines aren't connected at all, but then there's just
no way around it. And even here, knowing the approximate date of the last
sync, you can use commands like git-bundle to create a single pack of new
objects, which remote(s) can easily import, transferring it via any
applicable method or protocol between / to any number of hosts.

As you've noted already, git is quite efficient when it comes to storage,
keeping only the changes. When this becomes a problem due to a long history
of long-obsolete changes, you can drop them all, effectively 'squashing'
all the commits in one of the repos and rebasing the rest against it. So
that should cover requirement one.

Cherry-picking commits or checking out individual files / dirs on top of
any base from any other repo/revision is pretty much what is stated in the
next three requirements. One gotcha here is that you should get used to
making individual commits consistent and atomic, so that each set of
changes serves one purpose and you never end up needing only half of a
commit somewhere.
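The git-bundle approach mentioned above can be sketched as a pair of small
helpers; the function names and the /mnt/usb path are hypothetical, but
the git commands themselves are standard:

```shell
#!/bin/bash
# Sketch of syncing two disconnected hosts via git-bundle.
# make_bundle runs on the source host, import_bundle on the destination;
# function names and media paths are made up for illustration.
set -e

make_bundle() {   # make_bundle <repo> <bundle-file>
    # Pack all refs and their objects into a single transferable file.
    git -C "$1" bundle create "$2" --all
}

import_bundle() { # import_bundle <repo> <bundle-file> <branch>
    # A bundle can be fetched from exactly like a remote repository.
    git -C "$1" bundle verify "$2"
    git -C "$1" fetch "$2" "$3:refs/remotes/usb/$3"
    git -C "$1" merge "refs/remotes/usb/$3"
}

# Typical use, assuming a USB key mounted at /mnt/usb:
#   make_bundle ~/work /mnt/usb/work.bundle           (on host A)
#   import_bundle ~/work /mnt/usb/work.bundle master  (on host B)
```

With `--all` the bundle carries full history; passing rev-list limits such
as `--since` instead produces a smaller, incremental bundle, at the cost of
the receiving repo needing the prerequisite commits already.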
Conflict resolution is what you get with merge / rebase (just look at the
fine git-merge man page), but due to the absence of an ultimate AI, these
are best used repeatedly against the same tree.

About the last point of the original post... I don't think git is intuitive
until you understand exactly how it works; that's when it becomes so, with
all the high-level and intermediate interfaces having great manpages and a
sole, clear purpose.

That said, I don't think git is the best way to sync everything. I don't
mix binary files with configuration, because the latter alone suffices with
Gentoo: you have a git-synced portage tree (emerge can sync via VCS
out-of-the-box), a git-controlled overlay on top of it, and you pull the
world/sets/flags/etc changes... just run emerge and you're set, without
having to worry about architectural incompatibilities of binaries or
missing misc libs against which they're linked here and there. That's what
portage is made for, after all. Just think of
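The wrapper idea floated earlier in this message (vanilla git keeps only
the executable bit, so ownership, permissions and mtimes have to be stored
separately) could look something like the sketch below. The `.metadata`
file, the function names, and the plain-list format are all hypothetical;
it also relies on GNU stat/touch, so it is Linux-only as written:

```shell
#!/bin/bash
# Minimal sketch: save and restore file metadata that git discards.
# ".metadata", the list format and function names are invented here;
# requires GNU coreutils (stat -c, touch -d).
set -e

save_metadata() {
    # Record "mode owner:group mtime path" for every tracked file,
    # e.g. before committing/pushing.
    : > .metadata
    git ls-files | while read -r f; do
        stat -c '%a %U:%G %Y %n' "$f" >> .metadata
    done
}

restore_metadata() {
    # Re-apply mode and mtime after a pull/checkout.
    while read -r mode owner mtime f; do
        chmod "$mode" "$f"
        touch -d "@$mtime" "$f"
        # chown "${owner%%:*}" "$f"   # needs root, usually skipped
    done < .metadata
}
```

Wiring these into pre-commit and post-merge/post-checkout hooks would give
the hook-based variant; the list-file approach has the nice property that
the metadata itself travels through git like any other tracked file.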
Re: [gentoo-user] File synchronisation utility (searching for/about to program it)
> I'm the last you would want to give advice about this question, but even
> though I am not a programmer, I have been using git to sync on three
> different systems. I am using a flash drive as a cache, so to speak. I
> followed some tips from the Emacs org-mode mailing list to get this
> going. It wasn't simple for me to recover when some files got out of sync
> on one of the machines, but it was simple enough that even I could figure
> it out. I used a bare repo on the flash drive and push from each machine
> to this, a very simple procedure that can be automated through cron, and
> pull to each machine also from the bare repository. I am not syncing a
> programming project, but my various work.

Your reply is more than welcome! I have tried using git in the past and
found that it doesn't work in my 'space constrained' scenario. The need for
a repository is a problem. The use of the USB key however is nice, since it
allows git to work without having each computer maintain its own
repository... but still, I don't currently have a USB key that's large
enough to hold all my data, and even if I could compress it I doubt it
would fit. Another thing: I wonder if it retains the attributes of the file
(creation date, modification date, owner/group, permissions)? This can be
important for some aspects of my synchronisation needs.

Still, git is a very good solution that works incrementally in a
differential manner (makes patches from previous versions). But when I
tried it, I found that to suit my needs it would require programming a big
wrapper that would interface git to make some daily quick actions simpler
than a few git commands.

> Again, I am the least clueful you will find on this list, but if you wish
> for me to tell you the steps I followed, that is possible. One of the
> mailing list threads that got me up to speed relatively quickly was at
> this link. (Hope it's ok to link another mailing list from this one.)
> http://www.mail-archive.com/emacs-orgm...@gnu.org/msg11647.html

I'll check it out...
since I have my own solution all thought out and designed, I'll be able to
compare it and re-evaluate git from a new angle. As far as I can tell,
there is no rule against links, but I think there might be one against
publicity (i.e. if the link was to your business product that fulfills my
need).

> I apologize if the existence of a bare repo as an intermediary is a
> problem. This can be done on a server as well.

It is... it makes all my computers dependent on that repo. Syncing
computers at home can be done all right, but will still require walking
around plugging/unplugging. That makes this practically impossible to do
over the network (or to sync my host on the internet; not all my PCs are
connected to the internet, so the repo can't just be on the server, I would
have to maintain several repositories to work this out...). It may be
possible to adapt it to my scenario, but I think it will require a lot of
design in advance... but I'll check it out... at worst it will convince me
I should program my own, better it will give me some good ideas or fortify
some of my own, and at best it will be the thing I've been looking for!

Thanks again!
Simon
Re: [gentoo-user] File synchronisation utility (searching for/about to program it)
Hello, Simon:

I'm the last you would want to give advice about this question, but even
though I am not a programmer, I have been using git to sync on three
different systems. I am using a flash drive as a cache, so to speak. I
followed some tips from the Emacs org-mode mailing list to get this going.
It wasn't simple for me to recover when some files got out of sync on one
of the machines, but it was simple enough that even I could figure it out.
I used a bare repo on the flash drive and push from each machine to this, a
very simple procedure that can be automated through cron, and pull to each
machine also from the bare repository. I am not syncing a programming
project, but my various work.

Again, I am the least clueful you will find on this list, but if you wish
for me to tell you the steps I followed, that is possible. One of the
mailing list threads that got me up to speed relatively quickly was at this
link. (Hope it's ok to link another mailing list from this one.)
http://www.mail-archive.com/emacs-orgm...@gnu.org/msg11647.html

I apologize if the existence of a bare repo as an intermediary is a
problem. This can be done on a server as well.

Alan Davis

  You can know the name of a bird in all the languages of the world, but
  when you're finished, you'll know absolutely nothing whatever about the
  bird... So let's look at the bird and see what it's doing: that's what
  counts.  -- Richard Feynman
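For the record, a plausible shape of the flash-drive setup Alan describes
(bare repo on the drive, push from each machine, pull everywhere, cron-able)
is sketched below; the function names and the flash-drive path are
hypothetical, not something from Alan's actual setup:

```shell
#!/bin/bash
# Plausible sketch of the "bare repo on a flash drive" workflow.
# Function names and paths are invented for illustration.
set -e

setup() {       # setup <flash-repo-path> <work-dir>   -- one-time, per machine
    git init --bare "$1"                 # bare repo lives on the flash drive
    git -C "$2" remote add flash "$1"
}

sync_up() {     # sync_up <work-dir>   -- commit everything and push
    git -C "$1" add -A
    git -C "$1" commit -m "sync from $(uname -n)" || true  # nothing new is fine
    git -C "$1" push flash HEAD
}

sync_down() {   # sync_down <work-dir> -- pull what other machines pushed
    git -C "$1" pull flash "$(git -C "$1" symbolic-ref --short HEAD)"
}

# e.g. run "sync_up ~/work && sync_down ~/work" from cron on each machine
# while the drive is mounted.
```

The usual caveat applies: if two machines edit the same file between syncs,
the pull step stops on a merge conflict that has to be resolved by hand,
which matches Alan's "not simple to recover" experience.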
[gentoo-user] File synchronisation utility (searching for/about to program it)
Hi there!

I was about to jump into programming my own sync utility when I thought:
maybe I should ask if it exists first! Also, this is not really
Gentoo-related: it doesn't deal with the OS or portage... I'm rather asking
the venerable community at large; excuse me if you find this post
inappropriate (but can you suggest a more appropriate audience?).

There are lots of sync utilities out there, but my search hasn't found the
one utility that has all the features I require. Most lack some of these
features, some have undesirable limitations... I'm currently using unison
for all my sync needs; it's the best I've found so far, but it is very
limited in some aspects and a bit painful on my setup.

Let me be clear that I refuse to even consider network filesystems: I need
each computer to be fully independent of the others. I sync my important
files so as to have a working backup on all my PCs (my laptop breaks? Fine,
I just start my desktop and continue working transparently, well, with the
last synced files). Any kind of NFS could be considered for doing the file
transfers, but I don't think any of them can compete with rsync, so they're
out of the question.

Now, I know some of you will have the reflex to say: "try such-and-such a
tool, it supports 4 out of your 5 requirements", or "try such-and-such a
tool, it supports them all, but you'll have to bend things a bit to make it
work like you want". I'm looking for the perfect solution, and if it
doesn't exist, well, I'm about to code it in C or C++; I have the design
ready, and the concept is very simple yet provides all my features. I wish
to publish the result as open software (probably with a license like BSD or
maybe LGPL, maybe but hopefully not GPL). What I'm about to code will be
compatible with Linux and MacOSX for sure; a port to Windows will require
some dumb extensions (such as Windows-path to Unix-path conversion, and
file transfer support), and it will use very few deps.
My project intends to use rsync for the transfer, so it will basically
extend rsync with all my required features. Rsync does the transfer; I
can't compete with how good rsync is at transferring (works through ssh,
rsh, or its own daemon; does differential transfers; transfers
attributes/ownership...), but my project will be better at finding what
needs to be transferred and what needs to be deleted, on as many computers
as you want and in one shot.

Here are the features that I seek/require (and that I will be programming
if no utility can provide them all; the list is actually longer, but I can
live without the items not written here):

- Little space requirements: I could use rsync to make an incremental
  backup using hardlinks, and basically just copy whatever is new onto each
  replica, but this takes way too much space and still doesn't deal with
  deletes properly. (E.g. a file is on A and B, gets deleted on A and on B,
  then is recreated on B. In reality we have a new file on B, but rsync
  might want to delete this new file on B, thinking it's the file that got
  deleted on A. Unison works admirably here: it finds that the first file
  effectively got deleted on both, so there is nothing to do, and that a
  new file appeared on B which needs to be transferred to A. The space
  unison uses to cache its data is about 100 MB now, and I haven't cleaned
  it since I started using it; I believe more than half of it could be
  removed, but even 100 MB still represents about 1% of what is synced.)

- Server-less: I don't want to maintain a server on even a single computer.
  I like unison since it executes the server through ssh only when used;
  it's never listening and never started at boot time. This is excellent
  behavior and simplifies maintenance.

- Bidirectional pair-wise sync: Meaning I can start the sync from host A or
  from host B; the process should be the same, take the same amount of
  time, and give the same result. I should never have to care where the
  sync is initiated. (Unison doesn't support this, but it's OK to sync from
  both directions; it's just not optimised.)

- Star topology: Or any topology that allows syncing multiple computers at
  once. I'm tired of doing several pairwise syncs, since to do a full sync
  of my 3 computers (called A, B and C), I first have to sync A-B and A-C;
  at this point A contains all the diffs and is synced, but I have to do it
  once more, A-B and A-C, to sync the others (i.e. so B gets C's modifs).

- Anarchic mode: hehe, whatever you call it; using the same 3 hosts, I'd
  like to be able to do a pairwise sync between A-B, A-C and also B-C. To
  have the sync process decentralised... This is possible with unison, but
  of course I have to ssh to the remote host I want to sync with another
  remote host.

- Intelligent conflict resolution: Let's face it, the sync utility wasn't
  gifted with artificial intelligence, so why bother? It should depend on
  the user's intelligence, but it should depend on it intelligently.
  Meaning, it should remember (if the user wants it) the resolution of