Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
Thank you for taking the time to do this.

On Sun, Jun 12, 2011 at 11:52, Richard Hartmann wrote:
> * no use strict in validate_metadata?

How embarrassing! Thank you for pointing that out. I have no idea how that happened. I *always* start writing my perl scripts with 'use strict; use warnings;' ... excuses excuses

> * validate_metadata uses -e, not -f to check if .metafile exists

Good point, fixed.

> * .metafile seems to reside in $GIT_WORK_TREE/.git/hooks, not in
> $GIT_WORK_TREE?

I'm not sure where you're seeing this. build_metafile blindly opens a file relative to the directory 'git rev-parse --show-toplevel' reports. post-commit calls 'build_metafile .metafile', which will create .metafile in the git repo toplevel directory.

> * post-commit sets $OLDPWD, but does not use it. Why?

It's a relic of the original project I scavenged from. This has been fixed.

> * post-merge seems to be buggy and a no-op?

post-merge is, or should be, a symlink to post-commit.

> * post-commit will always clobber all local data. This seems to be unsafe.

I'm not sure what you mean. .metafile will be overwritten with the current information, but since it's being committed to the repository the changes will always be accessible. Nothing else is affected.

--
Harley J Pig
___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home
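The -e versus -f point and the question of where .metafile lives can be illustrated with a small sketch. It is written in Python rather than the Perl of the gitperms hooks, and the function names `repo_toplevel` and `metafile_status` are hypothetical, not taken from that project. It assumes only that `git rev-parse --show-toplevel` prints the work-tree root, as described in the reply above.

```python
import os
import subprocess


def repo_toplevel(cwd="."):
    """Return the work-tree root reported by `git rev-parse --show-toplevel`."""
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def metafile_status(toplevel, name=".metafile"):
    """Classify the metadata file the way Perl's -f and -e tests would.

    os.path.isfile corresponds to -f (a regular file);
    os.path.exists corresponds to -e (anything at that path).
    """
    path = os.path.join(toplevel, name)
    if os.path.isfile(path):
        return "file"
    if os.path.exists(path):
        # A directory or other non-regular entry passes -e but not -f.
        return "not-a-regular-file"
    return "missing"
```

A hook would presumably call `metafile_status(repo_toplevel())` so that the check does not depend on the directory the hook happens to run from.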
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, Apr 10, 2011 at 16:43, Harley J Pig wrote:
> https://github.com/harleypig/gitperms

I have had some time to actually have a quick look at your script, now. Disclaimer: I didn't run it, yet.

I have a few questions/comments:

* no use strict in validate_metadata?
* validate_metadata uses -e, not -f to check if .metafile exists
* .metafile seems to reside in $GIT_WORK_TREE/.git/hooks, not in $GIT_WORK_TREE?
* post-commit sets $OLDPWD, but does not use it. Why?
* post-merge seems to be buggy and a no-op?
* post-commit will always clobber all local data. This seems to be unsafe.

As I said, I only had time for a quick glance, not a full test; sorry.

Richard
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, Apr 10, 2011 at 18:12, Richard Hartmann wrote:
> Done.

Thanks.

--
Harley J Pig
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Mon, Apr 11, 2011 at 02:07, Harley J Pig wrote:
> I'm not subscribed to that list, go ahead and post it if you would. Thank
> you.

Done.

Richard
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, Apr 10, 2011 at 09:48, Richard Hartmann wrote:
> Are you willing to bounce that onto the git list or should I do so?

I'm not subscribed to that list, go ahead and post it if you would. Thank you.

--
Harley J Pig
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, Apr 10, 2011 at 16:43, Harley J Pig wrote:
> You can
> find it at https://github.com/harleypig/gitperms

Are you willing to bounce that onto the git list or should I do so?

Richard
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
I've written a metastore clone for a project where we need to store a linux distribution in version control (legacy code). I'm also using it for my personal vcs-home stuff.

It is a naive and bluntly straightforward way to do this, but it seems to be working. You can find it at https://github.com/harleypig/gitperms

I use git hooks and a central file to (re)store the metadata.

Maybe it can be of some use to someone else.

--
Harley J Pig
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sat, Apr 9, 2011 at 04:42, Christophe-Marie Duquesne wrote:
> git-annex does location tracking. Even if you delete the link, the file is
> still there and other repositories know which repositories have the file. If
> you want to be sure the file is always reachable, you have to force a
> repository to act central and to download every file. That is a mount
> option I have already added (-o getall).

FYI, git-annex gained the ability to use a bup remote. This will solve all problems in this regard if used correctly and will even give you indefinite and full history.

As an aside, please look here [1] for a current discussion on how to store metadata in git, enabling git-annex to do so, enabling any FUSE front-ends to act more in line with normal file systems. Smudge filters were mentioned so this must be good ;)

Richard

[1] http://marc.info/?l=git&m=130220380412726&w=4
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
I'll try to gather the things I can answer:

> I see you include fuse.py - http://code.google.com/p/fusepy/ - in your repo.
> how does it compare to fuse-python -
> http://pypi.python.org/pypi/fuse-python ?

fusepy is written with ctypes while fuse-python is a full-blown C extension. At first, I was using fuse-python, but I ended up thinking fusepy was less bloated and less painful (just a file to include versus a library to compile and install).

> where will you store this backup copy? introducing a node/repository which
> will hold backup copies can be considered going to a centralized model;
> which is something you (Christophe-Marie) try to explicitly avoid, but I
> think this is not necessarily a problem)

git-annex does location tracking. Even if you delete the link, the file is still there and other repositories know which repositories have the file. If you want to be sure the file is always reachable, you have to force a repository to act central and to download every file. That is a mount option I have already added (-o getall).

> This is also an area I hope to improve in git-annex, by using git smudge
> filters. So it might get a mode where files can be modified and git
> commit just annexes the new content.

That would be great. I am not sure using fuse would still be necessary, then.
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
Hi

I see there have been some good thoughts given about this. I am currently on vacation in a place where I do not have internet access. I'll get back to you in a week.

Regards,
Christophe-Marie
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, 3 Apr 2011 11:18:05 -0400 Joey Hess wrote:
> Dieter Plaetinck wrote:
> > I think having support for this in git-annex would be very useful,
> > even if it's not that efficient: if this can be dealt with in
> > git-annex, individual "higherlevel" projects like sharebox and
> > dvcs-autosync have fewer headaches. Not to mention
> > sharebox/dvcs-autosync would need to do really inefficient things to
> > deal with it anyway. (because they can't involve themselves into the
> > actual git/dvcs tricks, they work on a higher level of abstraction),
> > and it might also benefit some users who work with git-annex manually.
> > How do you see this? How hard/cumbersome is it to implement this in
> > git-annex? Why is it inefficient? It's not really clear to me after
> > reading the smudge information on
> > http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
>
> http://git-annex.branchable.com/todo/smudge/
>
> > > if toobig
> > >     then git_annex_add file
> > >     else git_add file
> > > git_commit file
> >
> > unfortunately I don't think so:
> > - with dvcs-autosync we often commit "early", as in, the file could still
> > be in the process of being written to, or it could be modified again after
> > we added it.
> > From what I understand, we would need to forbid our users from changing the
> > file after it is added to git-annex, and worse: if git-annex does its "move
> > file, replace file with symlink" trick, while the user is writing to it,
> > this might break things.
>
> You're right. However, you would also not want to commit many partial
> versions of a large file as it was being written.

Well, if it ever happens once, that's once too many. Since we're aiming for a dropbox-like near-instant-synchronisation system, the way of working is different than when using git for, say, version controlling source code. So it _will_ happen that we commit versions of files while they are in the process of being written.

Even if the user decides to store something like a continuously updated logfile in his dropbox-like system, I want to be able to support that.
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
Richard Hartmann wrote:
> I know Joey pondered this as well, you will find some references on
> git-annex' ikiwiki. This is needed for S3 in the medium term, anyway.
>
> Basically, the plan is to encrypt the files with a symmetric key and
> then allow access to that key via other keys. That way, you can share
> some files between machines/people and still make sure no one gets at
> stuff they shouldn't.
>
> The way to encrypt object files' names is still somewhat open to
> discussion, afaik.
>
>
> Classical dilemma: Where should this be discussed? On this list or
> within the ikiwiki? Maybe everyone interested should read through the
> ikiwiki and after some discussion here, we can dump use cases, design
> decisions etc back into ikiwiki as a TODO once Joey is happy with it?

I've put together my current thoughts at
http://git-annex.branchable.com/design/encryption/

Comments appreciated in any medium (except watercolors).

--
see shy jo
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
Dieter Plaetinck wrote:
> I think having support for this in git-annex would be very useful,
> even if it's not that efficient: if this can be dealt with in
> git-annex, individual "higherlevel" projects like sharebox and
> dvcs-autosync have fewer headaches. Not to mention
> sharebox/dvcs-autosync would need to do really inefficient things to
> deal with it anyway. (because they can't involve themselves into the
> actual git/dvcs tricks, they work on a higher level of abstraction),
> and it might also benefit some users who work with git-annex manually.
> How do you see this? How hard/cumbersome is it to implement this in
> git-annex? Why is it inefficient? It's not really clear to me after
> reading the smudge information on
> http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

http://git-annex.branchable.com/todo/smudge/

> > if toobig
> >     then git_annex_add file
> >     else git_add file
> > git_commit file
>
> unfortunately I don't think so:
> - with dvcs-autosync we often commit "early", as in, the file could still be
> in the process of being written to, or it could be modified again after we
> added it.
> From what I understand, we would need to forbid our users from changing the
> file after it is added to git-annex, and worse: if git-annex does its "move
> file, replace file with symlink" trick, while the user is writing to it, this
> might break things.

You're right. However, you would also not want to commit many partial versions of a large file as it was being written.

> - when a remote A pulls in the changes from remote B, for dropbox-like
> behavior it should also automatically:
> * run `git annex get`
> * git commit .git-annex/*/*.log
> Does this seem about right?

Yes.

> - deletes will also need to propagate automatically (see next paragraph),
> still need to figure out how to do that best.
> Note that dropbox-like behavior is different from the behavior you usually
> expect from git-annex users.
> * usual git-annex behavior: every remote stands on its own, there is no
> forced "being in sync", so that deletes must happen as initiated by the user,
> and this way you can prevent them from removing files if you expect it could
> be the last instance of the file.
> * dropbox-like : remote A removes a file -> *all other remotes* should remove
> the file, so that their "working copy" looks the same. BUT the file should
> still be available *somewhere* so that a restore can be initiated (preferably
> from any of these nodes)
>
> I see two solutions here:
> - centralized: have 1 (or more) remotes that always keep a copy of the files
> which are being removed on all other remotes, these would be backup-nodes,
> they don't follow the strict "always in sync" rule that applies to the
> regular nodes. (they follow the original git-annex idea more strictly)
> - decentralized: allow users to "remove files" by removing the symlink, but
> still keep the blob in .git-annex on at least one of the nodes, so that it
> can be restored from that.

Yes, that's the default behavior if the symlink is removed. There is then a git annex unused pass that can be used to find and remove unused content when space is needed. Given the size of modern drives, that could be run nightly or something.

--
see shy jo
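The two automatic steps confirmed in this exchange (fetching content after a pull, and a periodic `git annex unused` pass) could be driven by a small wrapper along these lines. This is only a sketch: `run_steps`, `AFTER_PULL` and `NIGHTLY` are hypothetical names, passing "." as the path to `git annex get` is an assumption, and the log-commit step from the quoted list is left out because the `.git-annex/*/*.log` layout is specific to the git-annex version discussed here.

```python
import subprocess

# Commands named in the thread; "." as the path argument is an assumption.
AFTER_PULL = [["git", "annex", "get", "."]]
NIGHTLY = [["git", "annex", "unused"]]


def run_steps(steps, repo, runner=subprocess.run):
    """Run each command inside `repo`, stopping at the first failure.

    Returns the list of commands that completed successfully.
    `runner` is injectable so the logic can be exercised without git.
    """
    done = []
    for cmd in steps:
        result = runner(cmd, cwd=repo)
        if result.returncode != 0:
            break
        done.append(cmd)
    return done
```

A synchronisation daemon might call `run_steps(AFTER_PULL, repo)` after each successful pull, and a cron job might call `run_steps(NIGHTLY, repo)`; actually dropping the unused content is a separate decision and is not sketched here.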
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, Apr 3, 2011 at 13:18, Rene Mayrhofer wrote:
> I've also been thinking about transparent encryption for git/git-annex/bup
> backends, but this is not even in a real design phase yet. If anybody is
> interested in discussing the issues involved with backing up to a
> potentially untrusted repository server, I'm more than happy to start with
> getting use cases together as a first step towards integrating encryption.

I know Joey pondered this as well, you will find some references on git-annex' ikiwiki. This is needed for S3 in the medium term, anyway.

Basically, the plan is to encrypt the files with a symmetric key and then allow access to that key via other keys. That way, you can share some files between machines/people and still make sure no one gets at stuff they shouldn't.

The way to encrypt object files' names is still somewhat open to discussion, afaik.

Classical dilemma: Where should this be discussed? On this list or within the ikiwiki? Maybe everyone interested should read through the ikiwiki and after some discussion here, we can dump use cases, design decisions etc back into ikiwiki as a TODO once Joey is happy with it?

Richard
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sun, Apr 3, 2011 at 11:35, Dieter Plaetinck wrote:
> - centralized: have 1 (or more) remotes that always keep a copy of the files
> which are being removed on all other remotes, these would be backup-nodes,
> they don't follow the strict "always in sync" rule that applies to the
> regular nodes. (they follow the original git-annex idea more strictly)

FWIW, there has been talk about using bup as a storage back-end for git-annex. That would allow you to keep full revision history and all files in one or two main locations and just use plain git-annex on the other ones.

> - decentralized: allow users to "remove files" by removing the symlink, but
> still keep the blob in .git-annex on at least one of the nodes, so that it
> can be restored from that.

Leaving a stale object in the store that no one really knows about seems like an extremely bad idea. And even if git-annex were able to track its existence internally while hiding the symlink from the user, I fear this would cause confusion.

I would prefer a way to properly delete a file from all repos, but the bup-backed one would obviously still keep everything around.

Of course, you wouldn't need the bup back-end for your podcasts, but for photos or other important personal data, it would be useful.

Richard
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Sat, 2 Apr 2011 23:19:52 -0400 Joey Hess wrote:
> Dieter Plaetinck wrote:
> > @Joey: you mentioned you think inotify might be a better
> > backend/paradigm for this than fuse, so do you think implementing
> > git-annex in something like dvcs-autosync is feasible? and/or
> > preferable?
>
> Feasible? Certainly. Preferable? I'm in the "let a thousand flowers
> bloom" phase. It's spring. :)
>
> As Christophe-Marie has pointed out, git-annex makes annexed files
> semi-immutable, and FUSE can hide that quirk, while inotify watching cannot.
> That could be confusing for certain users or use cases, if they are not
> aware of what is going on. Or it could be something quickly learned
> about how these special replicated directories work, that files have to
> be copied to be changed.
>
> This is also an area I hope to improve in git-annex, by using git smudge
> filters. So it might get a mode where files can be modified and git
> commit just annexes the new content. Last time I looked at this, git was
> not *quite* there to let it be done efficiently.

I think having support for this in git-annex would be very useful, even if it's not that efficient: if this can be dealt with in git-annex, individual "higherlevel" projects like sharebox and dvcs-autosync have fewer headaches. Not to mention sharebox/dvcs-autosync would need to do really inefficient things to deal with it anyway. (because they can't involve themselves into the actual git/dvcs tricks, they work on a higher level of abstraction), and it might also benefit some users who work with git-annex manually.

How do you see this? How hard/cumbersome is it to implement this in git-annex? Why is it inefficient? It's not really clear to me after reading the smudge information on http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

> > I quite like dvcs-autosync (partially because inotify is more simple
> > than fuse, partially because it currently works already quite well) and I'm
> > interested in making it support space efficient storage of big files;
> > from what I've read it should be possible to do this with git-annex
> > (which should not even change how we currently deal with small files,
> > they would still be in git) but I'm still doing my first baby steps
> > with git-annex so I wouldn't know. Advice very welcome..
>
> All it probably needs at its simplest is something like this
> (excuse the haskell):
>
> toobig <- checkFileSize file
> if toobig
>     then git_annex_add file
>     else git_add file
> git_commit file

unfortunately I don't think so:
- with dvcs-autosync we often commit "early", as in, the file could still be in the process of being written to, or it could be modified again after we added it. From what I understand, we would need to forbid our users from changing the file after it is added to git-annex, and worse: if git-annex does its "move file, replace file with symlink" trick, while the user is writing to it, this might break things.
- when a remote A pulls in the changes from remote B, for dropbox-like behavior it should also automatically:
  * run `git annex get`
  * git commit .git-annex/*/*.log
  Does this seem about right?
- deletes will also need to propagate automatically (see next paragraph), still need to figure out how to do that best.

> > Another note : files being tracked with git-annex through sharebox or
> > dvcs-autosync or whatever should always have at least 1 "backup copy",
> > so that if the file gets deleted everywhere, it still can be retrieved
> > from somewhere (which raises the interesting question: where will you
> > store this backup copy? introducing a node/repository which will hold
> > backup copies can be considered going to a centralized model; which is
> > something you (Christophe-Marie) try to explicitly avoid, but I think
> > this is not necessarily a problem)
>
> This is something git annex goes to large lengths to deal with.
> It will enforce N backup copies; it tracks which other repositories
> have which files; it can transfer wanted file contents from other
> repositories in either a decentralized or a centralized manner; the
> other repositories can be on other drives of the same computer, or
> accessible by ssh, or even, now, Amazon S3.

Note that dropbox-like behavior is different from the behavior you usually expect from git-annex users.
* usual git-annex behavior: every remote stands on its own, there is no forced "being in sync", so that deletes must happen as initiated by the user, and this way you can prevent them from removing files if you expect it could be the last instance of the file.
* dropbox-like : remote A removes a file -> *all other remotes* should remove the file, so that their "working copy" looks the same. BUT the file should still be available *somewhere* so that a restore can be initiated (preferably from any of these nodes)

I see two solutions here:
- centralized: have 1 (or
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
Dieter Plaetinck wrote:
> @Joey: you mentioned you think inotify might be a better
> backend/paradigm for this than fuse, so do you think implementing
> git-annex in something like dvcs-autosync is feasible? and/or
> preferable?

Feasible? Certainly. Preferable? I'm in the "let a thousand flowers bloom" phase. It's spring. :)

As Christophe-Marie has pointed out, git-annex makes annexed files semi-immutable, and FUSE can hide that quirk, while inotify watching cannot. That could be confusing for certain users or use cases, if they are not aware of what is going on. Or it could be something quickly learned about how these special replicated directories work, that files have to be copied to be changed.

This is also an area I hope to improve in git-annex, by using git smudge filters. So it might get a mode where files can be modified and git commit just annexes the new content. Last time I looked at this, git was not *quite* there to let it be done efficiently.

> I quite like dvcs-autosync (partially because inotify is more simple
> than fuse, partially because it currently works already quite well) and I'm
> interested in making it support space efficient storage of big files;
> from what I've read it should be possible to do this with git-annex
> (which should not even change how we currently deal with small files,
> they would still be in git) but I'm still doing my first baby steps
> with git-annex so I wouldn't know. Advice very welcome..

All it probably needs at its simplest is something like this (excuse the haskell):

    toobig <- checkFileSize file
    if toobig
        then git_annex_add file
        else git_add file
    git_commit file

> Another note : files being tracked with git-annex through sharebox or
> dvcs-autosync or whatever should always have at least 1 "backup copy",
> so that if the file gets deleted everywhere, it still can be retrieved
> from somewhere (which raises the interesting question: where will you
> store this backup copy? introducing a node/repository which will hold
> backup copies can be considered going to a centralized model; which is
> something you (Christophe-Marie) try to explicitly avoid, but I think
> this is not necessarily a problem)

This is something git annex goes to large lengths to deal with. It will enforce N backup copies; it tracks which other repositories have which files; it can transfer wanted file contents from other repositories in either a decentralized or a centralized manner; the other repositories can be on other drives of the same computer, or accessible by ssh, or even, now, Amazon S3.

--
see shy jo
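The size-based routing in the pseudocode above can be sketched in Python, the language sharebox itself is written in. The 10 MiB threshold, the commit message, and the names `add_command` and `commit_command` are assumptions made for illustration; the thread does not say what "too big" should mean. The functions only build the command lines, so the decision logic can be checked without a repository.

```python
import os

SIZE_THRESHOLD = 10 * 1024 * 1024  # assumption: 10 MiB; the thread gives no value


def add_command(path, threshold=SIZE_THRESHOLD):
    """Pick `git annex add` for large files and plain `git add` otherwise."""
    if os.path.getsize(path) > threshold:
        return ["git", "annex", "add", path]
    return ["git", "add", path]


def commit_command(path):
    """Build a commit command limited to the given path."""
    return ["git", "commit", "-m", f"autosync: {path}", "--", path]
```

A watcher would presumably pass the results to `subprocess.run(cmd, check=True)` from inside the repository. As the follow-up replies point out, this does not address files that are still being written when the add happens.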
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
I see you include fuse.py - http://code.google.com/p/fusepy/ - in your repo. How does it compare to fuse-python - http://pypi.python.org/pypi/fuse-python ?

@Joey: you mentioned you think inotify might be a better backend/paradigm for this than fuse, so do you think implementing git-annex in something like dvcs-autosync is feasible? and/or preferable?

Ultimately, I think we're all looking for the same thing: dropbox-like, foss, distributed, git-based (or support for git), elegant and suited for different use cases (whether a handful of text files, or a bunch of huge binary files, or a combination thereof).

I quite like dvcs-autosync (partially because inotify is simpler than fuse, partially because it already works quite well) and I'm interested in making it support space efficient storage of big files; from what I've read it should be possible to do this with git-annex (which should not even change how we currently deal with small files, they would still be in git) but I'm still doing my first baby steps with git-annex so I wouldn't know. Advice very welcome..

Another note: files being tracked with git-annex through sharebox or dvcs-autosync or whatever should always have at least 1 "backup copy", so that if the file gets deleted everywhere, it still can be retrieved from somewhere (which raises the interesting question: where will you store this backup copy? introducing a node/repository which will hold backup copies can be considered going to a centralized model; which is something you (Christophe-Marie) try to explicitly avoid, but I think this is not necessarily a problem)

Dieter
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Thu, Mar 31, 2011 at 8:04 PM, Dieter Plaetinck wrote:
> you also need to do various git/git-annex commands, or am I missing something?

Ideally, that would be only at set up time.

> I quite like dvcs-autosync, but it indeed lacks space-efficient storage of
> big files.
> I would like to try if we can use git-annex to support this in dvcs-autosync,
> although AFAIK git-annex is not transparent in the way regular git is
> transparent (i.e. it needs to explicitly copy files between locations), I
> assume this is the reason you need to go for a FUSE-based approach? or do you
> just prefer this over regular fs + inotify?

I don't really like FUSE, and I would actually prefer using inotify, but I think it would not be transparent enough. I think a filesystem is the right abstraction here.

> you actually tried coda? it's something I'm interested in, on paper it looks
> like an awesome, maybe-even-perfect open source dropbox-clone but the reality
> is probably different, I never tried it so I wouldn't know.

I did not try it, but I looked at the documentation. It is not purely decentralized: some machines are servers, others are clients and the roles stay the same (if I believe this page: http://www.coda.cs.cmu.edu/ljpaper/lj.html).

> hmm, writing files is i/o-bound, I doubt the language will have much effect
> here.
> check with top/vmstat if you get iowait, if so your storage medium is getting
> saturated and rewriting in C won't help. maybe a network/buffering/.. issue.

I'll have a look. Actually, to come to this conclusion, I used the loopback-fs provided by fusepy, which just mirrors another part of your file system, and I timed the copy of an iso. This copy was 10 times slower than on a real fs (60 seconds instead of 6). I concluded that this was due to python. I have about the same performance on my filesystem. I'll complete the experiment tomorrow with fuse_xmp, which is another fuse loopback-fs, but done in C.

> in your README you suggest to use a crontab for synchronisation; maybe you
> can reuse/be inspired by the xmpp system dvcs-autosync uses; it works quite
> well, it's quite robust and it's instant :)

Yes. I had a 'sync=xx' option, for specifying an interval time between synchronisations, but I removed it for this very reason.
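The timing experiment described above (copying the same file onto a normal directory and onto a FUSE mount, then comparing) could be reproduced with a helper like the following. It is a sketch under assumptions: `timed_copy` is a hypothetical name, the 1 MiB block size is arbitrary, and the fsync call is included so that the measurement covers the data reaching the underlying storage rather than only the page cache.

```python
import os
import time


def timed_copy(src, dst, block_size=1024 * 1024):
    """Copy src to dst in blocks and return (bytes_copied, seconds)."""
    start = time.perf_counter()
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block)
            copied += len(block)
        fout.flush()
        os.fsync(fout.fileno())  # include the time needed to reach the disk
    return copied, time.perf_counter() - start
```

Running it once with `dst` on an ordinary directory and once with `dst` inside the mount gives two figures to compare; watching top or vmstat for iowait during the run, as suggested in the quoted message, helps tell an I/O-bound copy from a CPU-bound one.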
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
On Thu, 31 Mar 2011 18:56:54 +0200 Christophe-Marie Duquesne wrote:
> Hi,
>
> I am currently writing a FUSE file system based on git-annex for
> replicating binary files on several machines. I thought I could share
> it here in order to get some ideas and contributors.
>
> What are your goals?
> Seamless synchronization "à la dropbox".
> Ability to use with big binary files such as mp3/movies.
> Entirely decentralized.
> Don't use unnecessary space
> Keep it simple: avoid special VCS commands and keep a filesystem
> interface as much as possible.

you also need to do various git/git-annex commands, or am I missing something?

> Why?
> Because sparkleshare and dvcs-autosync are bad at versioning binary files

I quite like dvcs-autosync, but it indeed lacks space-efficient storage of big files. I would like to try if we can use git-annex to support this in dvcs-autosync, although AFAIK git-annex is not transparent in the way regular git is transparent (i.e. it needs to explicitly copy files between locations), I assume this is the reason you need to go for a FUSE-based approach? or do you just prefer this over regular fs + inotify?

> Because Unison needs disk space for each couple of hosts it
> synchronizes and thus does not really scale for more than 2 hosts
> Because Coda is not completely decentralized and it bothers me

you actually tried coda? it's something I'm interested in, on paper it looks like an awesome, maybe-even-perfect open source dropbox-clone but the reality is probably different, I never tried it so I wouldn't know.

> What do you have?
> A python implementation. It is about 600 sloc, and you'll find it on
> https://github.com/chmduquesne/sharebox
> Be careful, it is very alpha and it still does not have a proper
> conflict handler.
>
> Hey, but copying is slow!
> On my machine, copying files to a sharebox fs is about 10 times slower
> than copying it on a normal fs. All the time is spent in python's
> os.write(): I guess the only way to work around this problem is to
> rewrite the whole thing in C, but I am keeping this for later.

hmm, writing files is i/o-bound, I doubt the language will have much effect here. check with top/vmstat if you get iowait, if so your storage medium is getting saturated and rewriting in C won't help. maybe a network/buffering/.. issue.

> I am interested in:
> - suggestions for the functional design (I have my ideas, but I'd love
> to be challenged).

in your README you suggest to use a crontab for synchronisation; maybe you can reuse/be inspired by the xmpp system dvcs-autosync uses; it works quite well, it's quite robust and it's instant :)

Dieter
Re: [announce] Sharebox, a FUSE filesystem relying on git-annex
Christophe-Marie Duquesne wrote:
> I am currently writing a FUSE file system based on git-annex for
> replicating binary files on several machines. I thought I could share
> it here in order to get some ideas and contributors.

Wow, you have completely anticipated a blog post I was gonna make in a few days that a) announces git-annex's support for using Amazon S3 as a git "remote", and b) suggests that a free, distributed dropbox-type thing could be built on this foundation.

My day, no, my week, is officially made. This is close enough to my birthday that you are in the running for best birthday present. :)

> What are your goals?
> Seamless synchronization "à la dropbox".
> Ability to use with big binary files such as mp3/movies.
> Entirely decentralized.
> Don't use unnecessary space
> Keep it simple: avoid special VCS commands and keep a filesystem
> interface as much as possible.

100% agree with this list, although I think that explicitly not mentioning what kind of large binary files a tool might be used to store is a wise thing. ;)

> Why?
> Because sparkleshare and dvcs-autosync are bad at versioning binary files

I have not looked at sparkleshare, but have been wondering if it could be adapted to be used as a GUI frontend for git annex.

> What do you have?
> A python implementation. It is about 600 sloc, and you'll find it on
> https://github.com/chmduquesne/sharebox
> Be careful, it is very alpha and it still does not have a proper
> conflict handler.
>
> Hey, but copying is slow!
> On my machine, copying files to a sharebox fs is about 10 times slower
> than copying it on a normal fs. All the time is spent in python's
> os.write(): I guess the only way to work around this problem is to
> rewrite the whole thing in C, but I am keeping this for later.

I do wonder if a FUSE filesystem is really the best approach. Even a tight C implementation will need to read/write entire file contents to put them into the filesystem. Notice that git-annex avoids doing any copying of large file content when adding a file (it even defaults to using a backend that doesn't checksum, in order to preserve maximum speed).

I had been thinking more along the lines of an inotify daemon that watches a directory (like dvcs-autosync), and drives git-annex.

One real benefit of a filesystem is that you can support modifying the files, and proxy that through to git-annex as a delete of the old object and an add of the new object. That certainly has value -- do you do it?

--
see shy jo
[announce] Sharebox, a FUSE filesystem relying on git-annex
Hi,

I am currently writing a FUSE file system based on git-annex for replicating binary files on several machines. I thought I could share it here in order to get some ideas and contributors.

What are your goals?
Seamless synchronization "à la dropbox".
Ability to use with big binary files such as mp3/movies.
Entirely decentralized.
Don't use unnecessary space
Keep it simple: avoid special VCS commands and keep a filesystem interface as much as possible.

Why?
Because sparkleshare and dvcs-autosync are bad at versioning binary files
Because Unison needs disk space for each couple of hosts it synchronizes and thus does not really scale for more than 2 hosts
Because Coda is not completely decentralized and it bothers me

What do you have?
A python implementation. It is about 600 sloc, and you'll find it on https://github.com/chmduquesne/sharebox
Be careful, it is very alpha and it still does not have a proper conflict handler.

Hey, but copying is slow!
On my machine, copying files to a sharebox fs is about 10 times slower than copying it on a normal fs. All the time is spent in python's os.write(): I guess the only way to work around this problem is to rewrite the whole thing in C, but I am keeping this for later.

What are your plans?
1) Finish the python implementation and make it stable enough for my everyday use.
2) Switch to C and rewrite the whole thing to make it fast, and backward compatible with the python version.

I am interested in:
- suggestions for the functional design (I have my ideas, but I'd love to be challenged).
- suggestions for the code design

Christophe-Marie Duquesne