Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-06-12 Thread Harley J Pig
Thank you for taking the time to do this.

On Sun, Jun 12, 2011 at 11:52, Richard Hartmann wrote:
> * no use strict in validate_metadata?

How embarrassing!  Thank you for pointing that out.  I have no idea how
that happened.  I *always* start writing my Perl scripts with 'use
strict; use warnings;' ... excuses, excuses.

> * validate_metadata uses -e, not -f to check if .metafile exists

Good point, fixed.
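
For illustration, the difference between the two tests can be sketched in Python (the script itself is Perl; the function name below is a hypothetical stand-in). Perl's -e corresponds roughly to os.path.exists() and -f to os.path.isfile(), so an existence test also accepts a directory that happens to be named .metafile:

```python
import os


def metafile_is_usable(path):
    """Return True only if path is a regular file (or a symlink to one).

    os.path.exists() plays the role of Perl's -e and is also true for
    directories; os.path.isfile() plays the role of Perl's -f.
    """
    return os.path.isfile(path)
```

With a directory called .metafile, os.path.exists() is true while metafile_is_usable() is false, which is the case the -f check guards against.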

> * .metafile seems to reside in $GIT_WORK_TREE/.git/hooks, not in 
> $GIT_WORK_TREE?

I'm not sure where you're seeing this.  build_metafile blindly opens a
file relative to the directory 'git rev-parse --show-toplevel'
reports.  post-commit calls 'build_metafile .metafile', which will
create .metafile in the git repo's top-level directory.
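
A minimal sketch of that lookup, in Python rather than the Perl of the actual script. The function names and the injectable `run` parameter are assumptions made for illustration; only the git command itself is real:

```python
import os
import subprocess


def repo_toplevel(run=subprocess.check_output):
    """Ask git for the top-level directory of the current work tree."""
    out = run(["git", "rev-parse", "--show-toplevel"])
    return out.decode().strip()


def metafile_path(name=".metafile", run=subprocess.check_output):
    """Resolve name relative to the repository top level, not .git/hooks."""
    return os.path.join(repo_toplevel(run), name)
```

Because the path is built from what git reports, it should not matter which directory the hook happens to be started in.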

> * post-commit sets $OLDPWD, but does not use it. Why?

It's a relic of the original project I scavenged from.  This has been fixed.

> * post-merge seems to be buggy and a no-op?

post-merge is, or should be, a symlink to post-commit.

> * post-commit will always clobber all local data. This seems to be unsafe.

I'm not sure what you mean.  .metafile will be overwritten with the
current information, but since it's being committed to the repository,
the changes will always be accessible.  Nothing else is affected.
-- 
Harley J Pig
___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-06-12 Thread Richard Hartmann
On Sun, Apr 10, 2011 at 16:43, Harley J Pig  wrote:

> https://github.com/harleypig/gitperms

I have had some time to actually have a quick look at your script,
now. Disclaimer: I didn't run it, yet.

I have a few questions/comments:

* no use strict in validate_metadata?

* validate_metadata uses -e, not -f to check if .metafile exists

* .metafile seems to reside in $GIT_WORK_TREE/.git/hooks, not in $GIT_WORK_TREE?

* post-commit sets $OLDPWD, but does not use it. Why?

* post-merge seems to be buggy and a no-op?

* post-commit will always clobber all local data. This seems to be unsafe.


As I said, I only had time for a quick glance, not a full test; sorry.



Richard


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Harley J Pig
On Sun, Apr 10, 2011 at 18:12, Richard Hartmann wrote:
> Done.

Thanks.
-- 
Harley J Pig


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Richard Hartmann
On Mon, Apr 11, 2011 at 02:07, Harley J Pig  wrote:

> I'm not subscribed to that list, go ahead and post it if you would.  Thank 
> you.

Done.


Richard

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Harley J Pig
On Sun, Apr 10, 2011 at 09:48, Richard Hartmann wrote:
> Are you willing to bounce that onto the git list or should I do so?

I'm not subscribed to that list, go ahead and post it if you would.  Thank you.
-- 
Harley J Pig


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Richard Hartmann
On Sun, Apr 10, 2011 at 16:43, Harley J Pig  wrote:

> You can
> find it at https://github.com/harleypig/gitperms

Are you willing to bounce that onto the git list or should I do so?


Richard

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Harley J Pig
I've written a metastore clone for a project where we need to store a
linux distribution in version control (legacy code).  I'm also using
it for my personal vcs-home stuff.  It is a naive and bluntly
straightforward way to do this, but it seems to be working.  You can
find it at https://github.com/harleypig/gitperms

I use git hooks and a central file to (re)store the metadata.  Maybe
it can be of some use to someone else.
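
The general idea can be sketched as follows. This is not the gitperms code (which is Perl); the file format, the function names and the choice to record only regular files are assumptions made for the example:

```python
import os
import stat


def save_metadata(top, metafile=".metafile"):
    """Record mode, uid and gid of every file below top, one per line."""
    lines = []
    for root, dirs, files in os.walk(top):
        # Skip git's own data and keep the traversal order stable.
        dirs[:] = sorted(d for d in dirs if d != ".git")
        for name in sorted(files):
            if root == top and name == metafile:
                continue  # do not record the metadata file itself
            path = os.path.join(root, name)
            st = os.lstat(path)
            rel = os.path.relpath(path, top)
            lines.append("%o %d %d %s" % (
                stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid, rel))
    with open(os.path.join(top, metafile), "w") as fh:
        fh.write("".join(line + "\n" for line in lines))


def restore_metadata(top, metafile=".metafile"):
    """Re-apply recorded modes; ownership is only restored when run as root."""
    with open(os.path.join(top, metafile)) as fh:
        for line in fh:
            mode, uid, gid, rel = line.rstrip("\n").split(" ", 3)
            path = os.path.join(top, rel)
            if os.path.islink(path) or not os.path.exists(path):
                continue  # leave symlinks and missing files alone
            os.chmod(path, int(mode, 8))
            if os.geteuid() == 0:
                os.chown(path, int(uid), int(gid))
```

In a hook-based setup, something like save_metadata() would run from a commit hook and restore_metadata() after a checkout or merge. File names containing newlines would break this simple line format.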
-- 
Harley J Pig


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-09 Thread Richard Hartmann
On Sat, Apr 9, 2011 at 04:42, Christophe-Marie Duquesne wrote:

> git-annex does location tracking. Even if you delete the link, the file is
> still there and other repositories know what repositories have the file. If
> you want to be sure the file is always reachable, you have to force a
> repository to act as a central one and to download every file. That is a
> mount option I have already added (-o getall).

FYI, git-annex gained the ability to use a bup remote. This will solve
all problems in this regard if used correctly and will even give you
indefinite and full history.

As an aside, please look here [1] for a current discussion on how to
store metadata in git, enabling git-annex to do so, enabling any FUSE
front-ends to act more in line with normal file systems. Smudge
filters were mentioned so this must be good ;)


Richard


[1] http://marc.info/?l=git&m=130220380412726&w=4


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-08 Thread Christophe-Marie Duquesne
I'll try to gather things I can answer to:

> I see you include fuse.py - http://code.google.com/p/fusepy/ - in your repo.
> how does it compare to fuse-python -
> http://pypi.python.org/pypi/fuse-python ?

fusepy is written with ctypes while fuse-python is a full-blown C extension.
At first, I was using fuse-python, but I ended up thinking fusepy was less
bloated and less painful (just a file to include versus a library to compile
and install).
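
To illustrate why a ctypes binding is only "a file to include": ctypes loads a shared library at run time and calls into it from plain Python, with no compile step. The sketch below does not use fusepy; it binds the C library's strlen() as a stand-in, and the wrapper name is hypothetical. It assumes a POSIX system, where CDLL(None) exposes the symbols already loaded into the process:

```python
import ctypes

# Load the symbols already linked into the running process (POSIX only);
# nothing is compiled or installed.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call the C library's strlen() on a bytes object."""
    return libc.strlen(data)
```

A C extension offers the same kind of access but has to be built against the Python and library headers on every machine, which is the installation burden mentioned above.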

> where will you store this backup copy? introducing a node/repository which
> will hold backup copies can be considered going to a centralized model;
> which is something you (Christophe-Marie) try to explicitly avoid, but I
> think this is not necessarily a problem)

git-annex does location tracking. Even if you delete the link, the file is
still there and other repositories know which repositories have the file. If
you want to be sure the file is always reachable, you have to force a
repository to act as a central one and to download every file. That is a
mount option I have already added (-o getall).

> This is also an area I hope to improve in git-annex, by using git smudge
> filters. So it might get a mode where files can be modified and git
> commit just annexes the new content.


That would be great. I am not sure using fuse would still be necessary,
then.


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-05 Thread Christophe-Marie Duquesne
Hi

I see there have been some good thoughts given about this. I am
currently on vacation in a place where I do not have internet access.
I'll come back to you in a week.

Regards,
Christophe-Marie


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Dieter Plaetinck
On Sun, 3 Apr 2011 11:18:05 -0400
Joey Hess  wrote:

> Dieter Plaetinck wrote:
> 
> > I think having support for this in git-annex would be very useful,
> > even if it's not that efficient: if this can be dealt with in
> > git-annex, individual "higherlevel" projects like sharebox and
> > dvcs-autosync have less headaches.  Not to mention
> > sharebox/dvcs-autosync would need to do really inefficient things to
> > deal with it anyway. (because they can't involve themselves into the
> > actual git/dvcs tricks, they work on a higher level of abstraction),
> > and it might also benefit some users who work with git-annex manually.
> > How do you see this? How hard/cumbersome is it to implement this in
> > git-annex? Why is it inefficient?  It's not really clear to me after
> > reading the smudge information on
> > http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
> 
> http://git-annex.branchable.com/todo/smudge/
> 
> > >   if toobig
> > >   then git_annex_add file
> > >   else git_add file
> > >   git_commit file
> > 
> > unfortunately I don't think so:
> > - with dvcs-autosync we often commit "early", as in, the file could still 
> > be in the process of being written to, or it could be modified again after 
> > we added it.
> > From what I understand, we would need to forbid our users from changing the 
> > file after it is added to git-annex, and worse: if git-annex does its "move 
> > file, replace file with symlink" trick, while the user is writing to it, 
> > this might break things.
> 
> You're right. However, you would also not want to commit many partial
> versions of a large file as it was being written.

Well, if it ever happens once, that's once too many.

Since we're aiming for a dropbox-like near-instant-synchronisation system, the
way of working is different than when using git for, say, version-controlling
source code. So it _will_ happen that we commit versions of files while they
are still being written.  Even if the user decides to store something like a
continuously updated logfile in his dropbox-like system, I want to be able to
support that.




Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Joey Hess
Richard Hartmann wrote:
> I know Joey pondered this as well, you will find some references on
> git-annex' ikiwiki. This is needed for S3 in the medium term, anyway.
> 
> Basically, the plan is to encrypt the files with a symmetric key and
> then allow access to that key via other keys. That way, you can share
> some files between machines/people and still make sure no one gets at
> stuff they shouldn't.
> 
> The way to encrypt object files' names is still somewhat open to
> discussion, afaik.
> 
> 
> Classical dilemma: Where should this be discussed? On this list or
> within the ikiwiki? Maybe everyone interested should read through the
> ikiwiki and after some discussion here, we can dump use cases, design
> decisions etc back into ikiwiki as a TODO once Joey is happy with it?

I've put together my current thoughts at
http://git-annex.branchable.com/design/encryption/
Comments appreciated in any medium (except watercolors).

-- 
see shy jo



Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Joey Hess
Dieter Plaetinck wrote:

> I think having support for this in git-annex would be very useful,
> even if it's not that efficient: if this can be dealt with in
> git-annex, individual "higherlevel" projects like sharebox and
> dvcs-autosync have less headaches.  Not to mention
> sharebox/dvcs-autosync would need to do really inefficient things to
> deal with it anyway. (because they can't involve themselves into the
> actual git/dvcs tricks, they work on a higher level of abstraction),
> and it might also benefit some users who work with git-annex manually.
> How do you see this? How hard/cumbersome is it to implement this in
> git-annex? Why is it inefficient?  It's not really clear to me after
> reading the smudge information on
> http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

http://git-annex.branchable.com/todo/smudge/

> > if toobig
> > then git_annex_add file
> > else git_add file
> > git_commit file
> 
> unfortunately I don't think so:
> - with dvcs-autosync we often commit "early", as in, the file could still be 
> in the process of being written to, or it could be modified again after we 
> added it.
> From what I understand, we would need to forbid our users from changing the 
> file after it is added to git-annex, and worse: if git-annex does its "move 
> file, replace file with symlink" trick, while the user is writing to it, this 
> might break things.

You're right. However, you would also not want to commit many partial
versions of a large file as it was being written.

> - when a remote A pulls in the changes from remote B, for dropbox-like 
> behavior it should also automatically:
>  * run `git annex get`
>  * git commit .git-annex/*/*.log
> Does this seem about right?

Yes.

> - deletes will also need to propagate automatically (see next paragraph), 
> still need to figure out how to do that best.
> Note that dropbox-like behavior is different from the behavior you usually 
> expect from git-annex users.
> * usual git-annex behavior: every remote stands on its own, there is no 
> forced "being in sync", so deletes must be initiated by the user, and this 
> way you can prevent them from removing a file if it could be the last 
> instance of that file.
> * dropbox-like: remote A removes a file -> *all other remotes* should remove 
> the file, so that their "working copy" looks the same. BUT the file should 
> still be available *somewhere* so that a restore can be initiated (preferably 
> from any of these nodes)
> 
> I see two solutions here:
> - centralized: have 1 (or more) remotes that always keep a copy of the files 
> which are being removed on all other remotes, these would be backup-nodes, 
> they don't follow the strict "always in sync" rule that applies to the 
> regular nodes. (they follow the original git-annex idea more strictly)
> - decentralized: allow users to "remove files" by removing the symlink, but 
> still keep the blob in .git-annex on at least one of the nodes, so that it 
> can be restored from that.

Yes, that's the default behavior if the symlink is removed. There is
then a git annex unused pass that can be used to find and remove unused
content when space is needed. Given the size of modern drives, that
could be run nightly or something.

-- 
see shy jo



Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Richard Hartmann
On Sun, Apr 3, 2011 at 13:18, Rene Mayrhofer  wrote:

> I've also been thinking about transparent encryption for git/git-annex/bup
> backends, but this is not even in a real design phase yet. If anybody is
> interested in discussing the issues involved with backing up to a
> potentially untrusted repository server, I'm more than happy to start with
> getting use cases together as a first step towards integrating encryption.

I know Joey pondered this as well, you will find some references on
git-annex' ikiwiki. This is needed for S3 in the medium term, anyway.

Basically, the plan is to encrypt the files with a symmetric key and
then allow access to that key via other keys. That way, you can share
some files between machines/people and still make sure no one gets at
stuff they shouldn't.

The way to encrypt object files' names is still somewhat open to
discussion, afaik.


Classical dilemma: Where should this be discussed? On this list or
within the ikiwiki? Maybe everyone interested should read through the
ikiwiki and after some discussion here, we can dump use cases, design
decisions etc back into ikiwiki as a TODO once Joey is happy with it?


Richard


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Richard Hartmann
On Sun, Apr 3, 2011 at 11:35, Dieter Plaetinck  wrote:

> - centralized: have 1 (or more) remotes that always keep a copy of the files 
> which are being removed on all other remotes, these would be backup-nodes, 
> they don't follow the strict "always in sync" rule that applies to the 
> regular nodes. (they follow the original git-annex idea more strictly)

FWIW, there has been talk about using bup as a storage back-end for
git-annex. That would allow you to keep full revision history and all
files in one or two main locations and just use plain git-annex on the
other ones.


> - decentralized: allow users to "remove files" by removing the symlink, but 
> still keep the blob in .git-annex on at least one of the nodes, so that it 
> can be restored from that.

Leaving a stale object in the store that no one really knows about
seems like an extremely bad idea. And even if git-annex were able to
track its existence internally while hiding the symlink from the user,
I fear this would cause confusion. I would prefer a way to properly
delete a file from all repos, but the bup-backed one would obviously
still keep everything around. Of course, you wouldn't need the bup
back-end for your podcasts, but for photos or other important personal
data, it would be useful.


Richard


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Dieter Plaetinck
On Sat, 2 Apr 2011 23:19:52 -0400
Joey Hess  wrote:

> Dieter Plaetinck wrote:
> > @Joey: you mentioned you think inotify might be a better
> > backend/paradigm for this than fuse, so do you think implementing
> > git-annex in something like dvcs-autosync is feasible? and/or
> > preferable?
> 
> Feasible? Certainly. Preferable? I'm in the "let a thousand flowers
> bloom" phase. It's spring. :)
> 
> As Christophe-Marie has pointed out, git-annex makes annexed files
> semi-immutable, and FUSE can hide that quirk, while inotify watching cannot.
> That could be confusing for certain users or use cases, if they are not
> aware of what is going on. Or it could be something quickly learned
> about how these special replicated directories work, that files have to
> be copied to be changed.
> 
> This is also an area I hope to improve in git-annex, by using git smudge
> filters. So it might get a mode where files can be modified and git
> commit just annexes the new content. Last time I looked at this, git was
> not *quite* there to let it be done efficiently.

I think having support for this in git-annex would be very useful, even if it's 
not that efficient: if this can be dealt with in git-annex, individual 
"higherlevel" projects like sharebox and dvcs-autosync have less headaches.  
Not to mention sharebox/dvcs-autosync would need to do really inefficient 
things to deal with it anyway. (because they can't involve themselves into the 
actual git/dvcs tricks, they work on a higher level of abstraction), and it 
might also benefit some users who work with git-annex manually.
How do you see this? How hard/cumbersome is it to implement this in git-annex?
Why is it inefficient?  It's not really clear to me after reading the smudge 
information on 
http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html


> > I quite like dvcs-autosync (partially because inotify is more simple
> > than fuse, partially because it currently works already quite well) and I'm
> > interested in making it support space efficient storage of big files;
> > from what I've read it should be possible to do this with git-annex
> > (which should not even change how we currently deal with small files,
> > they would still be in git) but I'm still doing my first baby steps
> > with git-annex so I wouldn't know. Advice very welcome..
> 
> All it probably needs at its simplest is something like this
> (excuse the haskell):
> 
>   toobig <- checkFileSize file
>   if toobig
>   then git_annex_add file
>   else git_add file
>   git_commit file

unfortunately I don't think so:
- with dvcs-autosync we often commit "early", as in, the file could still be in 
the process of being written to, or it could be modified again after we added 
it.
From what I understand, we would need to forbid our users from changing the 
file after it is added to git-annex, and worse: if git-annex does its "move 
file, replace file with symlink" trick while the user is writing to it, this 
might break things.
- when a remote A pulls in the changes from remote B, for dropbox-like behavior 
it should also automatically:
 * run `git annex get`
 * git commit .git-annex/*/*.log
Does this seem about right?
- deletes will also need to propagate automatically (see next paragraph), still 
need to figure out how to do that best.

> 
> > Another note : files being tracked with git-annex through sharebox or
> > dvcs-autosync or whatever should always have at least 1 "backup copy",
> > so that if the file gets deleted everywhere, it still can be retrieved
> > from somewhere (which raises the interesting question: where will you
> > store this backup copy? introducing a node/repository which will hold
> > backup copies can be considered going to a centralized model; which is
> > something you (Christophe-Marie) try to explicitly avoid, but I think
> > this is not necessarily a problem)
> 
> This is something git annex goes to large lengths to deal with.
> It will enforce N backup copies; it tracks which other repositories
> have which files; it can transfer wanted file contents from other
> repositories in either a decentralized or a centralized manner; the
> other repositories can be on other drives of the same computer, or
> accessible by ssh, or even, now, Amazon S3.
> 

Note that dropbox-like behavior is different from the behavior you usually 
expect from git-annex users.
* usual git-annex behavior: every remote stands on its own, there is no forced 
"being in sync", so deletes must be initiated by the user, and this way you can 
prevent them from removing a file if it could be the last instance of that 
file.
* dropbox-like: remote A removes a file -> *all other remotes* should remove 
the file, so that their "working copy" looks the same. BUT the file should 
still be available *somewhere* so that a restore can be initiated (preferably 
from any of these nodes)

I see two solutions here:
- centralized: have 1 (or 

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-02 Thread Joey Hess
Dieter Plaetinck wrote:
> @Joey: you mentioned you think inotify might be a better
> backend/paradigm for this than fuse, so do you think implementing
> git-annex in something like dvcs-autosync is feasible? and/or
> preferable?

Feasible? Certainly. Preferable? I'm in the "let a thousand flowers
bloom" phase. It's spring. :)

As Christophe-Marie has pointed out, git-annex makes annexed files
semi-immutable, and FUSE can hide that quirk, while inotify watching cannot.
That could be confusing for certain users or use cases, if they are not
aware of what is going on. Or it could be something quickly learned
about how these special replicated directories work, that files have to
be copied to be changed.
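
The "copy to change" step can be sketched as below. It assumes an annexed file shows up as a symlink to read-only content; the function name and the temporary file name are hypothetical. git-annex has its own `git annex unlock` command for this, so the sketch only illustrates the idea and is not how git-annex implements it:

```python
import os
import shutil


def make_editable(path):
    """Replace a symlink with a private, writable copy of its target."""
    if not os.path.islink(path):
        return False  # already a regular file; nothing to do
    target = os.path.realpath(path)
    tmp = path + ".tmp"  # hypothetical temporary name next to the link
    # copyfile() copies content only, so the copy gets default (writable)
    # permissions instead of the read-only mode of the stored content.
    shutil.copyfile(target, tmp)
    # rename() replaces the link itself; the stored content is untouched.
    os.replace(tmp, path)
    return True
```

A FUSE layer can perform this step behind the user's back on the first write, whereas an inotify watcher only learns about the attempted write after the fact.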

This is also an area I hope to improve in git-annex, by using git smudge
filters. So it might get a mode where files can be modified and git
commit just annexes the new content. Last time I looked at this, git was
not *quite* there to let it be done efficiently.

> I quite like dvcs-autosync (partially because inotify is more simple
> than fuse, partially because it currently works already quite well) and I'm
> interested in making it support space efficient storage of big files;
> from what I've read it should be possible to do this with git-annex
> (which should not even change how we currently deal with small files,
> they would still be in git) but I'm still doing my first baby steps
> with git-annex so I wouldn't know. Advice very welcome..

All it probably needs at its simplest is something like this
(excuse the haskell):

  toobig <- checkFileSize file
  if toobig
    then git_annex_add file
    else git_add file
  git_commit file
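
The same decision, sketched in Python since dvcs-autosync is not written in Haskell. The 10 MiB threshold, the function names and the commit message are assumptions for the example; the functions only build the command lines and leave running them to the caller:

```python
import os

SIZE_LIMIT = 10 * 1024 * 1024  # assumed threshold: 10 MiB


def add_command(path, limit=SIZE_LIMIT):
    """Pick the command that should track path, based on its size."""
    if os.path.getsize(path) > limit:
        return ["git", "annex", "add", path]  # content goes to the annex
    return ["git", "add", path]               # small files stay in plain git


def commands_for(path, limit=SIZE_LIMIT):
    """Return the add and commit commands for one changed file, in order."""
    return [
        add_command(path, limit),
        ["git", "commit", "-m", "autosync: " + path],
    ]
```

Each returned list could be handed to subprocess.call() from the directory-watching loop. The sketch does nothing about files that are still being written when the event fires.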

> Another note : files being tracked with git-annex through sharebox or
> dvcs-autosync or whatever should always have at least 1 "backup copy",
> so that if the file gets deleted everywhere, it still can be retrieved
> from somewhere (which raises the interesting question: where will you
> store this backup copy? introducing a node/repository which will hold
> backup copies can be considered going to a centralized model; which is
> something you (Christophe-Marie) try to explicitly avoid, but I think
> this is not necessarily a problem)

This is something git annex goes to large lengths to deal with.
It will enforce N backup copies; it tracks which other repositories
have which files; it can transfer wanted file contents from other
repositories in either a decentralized or a centralized manner; the
other repositories can be on other drives of the same computer, or
accessible by ssh, or even, now, Amazon S3.
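
As a toy model of the "N copies" rule (not git-annex's implementation, which checks with the other repositories before dropping anything): given the set of repositories believed to hold a file, a drop is allowed only if enough other copies remain. All names here are made up for the example:

```python
def can_drop(locations, repo, numcopies=1):
    """Return True if repo may drop its copy while numcopies others remain.

    locations is the set of repository names known to hold the content.
    """
    others = set(locations) - {repo}
    return repo in locations and len(others) >= numcopies


def drop(locations, repo, numcopies=1):
    """Return the new location set, or refuse if too few copies would remain."""
    if not can_drop(locations, repo, numcopies):
        raise RuntimeError("refusing to drop: not enough other copies")
    return set(locations) - {repo}
```

With locations {"laptop", "server"}, the laptop may drop its copy, but once only {"server"} is left, the server may not.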

-- 
see shy jo



Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-02 Thread Dieter Plaetinck
I see you include fuse.py - http://code.google.com/p/fusepy/ - in your repo.
how does it compare to fuse-python - http://pypi.python.org/pypi/fuse-python ?

@Joey: you mentioned you think inotify might be a better backend/paradigm for 
this than fuse, so do you think implementing git-annex in something like 
dvcs-autosync is feasible? and/or preferable?

Ultimately, I think we're all looking for the same thing: dropbox-like, foss, 
distributed, git-based (or with support for git), elegant and suited to 
different use cases (whether a handful of text files, or a bunch of huge binary 
files, or a combination thereof).
I quite like dvcs-autosync (partially because inotify is more simple than fuse, 
partially because it currently works already quite well) and I'm interested in 
making it support space efficient storage of big files; from what I've read it 
should be possible to do this with git-annex (which should not even change how 
we currently deal with small files, they would still be in git) but I'm still 
doing my first baby steps with git-annex so I wouldn't know. Advice very 
welcome..

Another note : files being tracked with git-annex through sharebox or 
dvcs-autosync or whatever should always have at least 1 "backup copy", so that 
if the file gets deleted everywhere, it still can be retrieved from somewhere 
(which raises the interesting question: where will you store this backup copy? 
introducing a node/repository which will hold backup copies can be considered 
going to a centralized model; which is something you (Christophe-Marie) try to 
explicitly avoid, but I think this is not necessarily a problem)

Dieter


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-03-31 Thread Christophe-Marie Duquesne
On Thu, Mar 31, 2011 at 8:04 PM, Dieter Plaetinck  wrote:
> you also need to do various git/git-annex commands, or am I missing something?

Ideally, that would be only at set up time.

> I quite like dvcs-autosync, but it indeed lacks space-efficient storage of 
> big files.
> I would like to try if we can use git-annex to support this in dvcs-autosync, 
> although AFAIK git-annex is not transparent in the way regular git is 
> transparent (i.e. it needs to explicitly copy files between locations), I 
> assume this is the reason you need to go for a FUSE-based approach? or do you 
> just prefer this over regular fs + inotify?

I don't really like FUSE, and I would actually prefer using inotify,
but I think it would not be transparent enough. I think a filesystem
is the right abstraction here.

> you actually tried coda? it's something I'm interested in, on paper it looks 
> like an awesome, maybe-even-perfect open source dropbox-clone but the reality 
> is probably different, I never tried it so I wouldn't know.

I did not try it, but I looked at the documentation. It is not purely
decentralized: some machines are servers, others are clients and the
roles stay the same (If I believe this page:
http://www.coda.cs.cmu.edu/ljpaper/lj.html).

> hmm, writing files is i/o-bound, I doubt the language will have much effect 
> here.
> check with top/vmstat if you get iowait, if so your storage medium is getting 
> saturated and rewriting in C won't help. maybe a network/buffering/.. issue.

I'll have a look. Actually to come to this conclusion, I used the
loopback-fs provided by fusepy, which just mirrors another part of
your file system, and I timed the copy of an iso. This copy was 10
times slower than on a real fs (60 seconds instead of 6). I concluded
that this was due to python. I have about the same performance on my
filesystem. I'll complete the experiment tomorrow with fuse_xmp, which
is another fuse loopback-fs, but done in C.
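
The measurement can be reproduced with a few lines of Python. The helper names are hypothetical, and the destination paths stand for a file on a plain directory and one on a FUSE mount:

```python
import os
import shutil
import time


def timed_copy(src, dst):
    """Copy src to dst and return (seconds, bytes_per_second)."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    size = os.path.getsize(dst)
    rate = size / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


def slowdown(src, plain_dst, fuse_dst):
    """Ratio of the copy time onto the FUSE mount to the plain copy time."""
    plain, _ = timed_copy(src, plain_dst)
    fuse, _ = timed_copy(src, fuse_dst)
    return fuse / plain if plain > 0 else float("inf")
```

For the numbers above (60 seconds against 6) the ratio would be 10. Running the comparison more than once helps, since the first copy also pays for reading the source from disk while later ones may be served from the page cache.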

> in your README you suggest using a crontab for synchronisation; maybe you 
> can reuse/be inspired by the xmpp system dvcs-autosync uses; it works quite 
> well, it's quite robust and it's instant :)

Yes. I had a 'sync=xx' option, for specifying an interval time between
synchronisations, but I removed it for this very reason.


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-03-31 Thread Dieter Plaetinck
On Thu, 31 Mar 2011 18:56:54 +0200
Christophe-Marie Duquesne  wrote:

> Hi,
> 
> I am currently writing a FUSE file system based on git-annex for
> replicating binary files on several machines. I thought I could share
> it here in order to get some ideas and contributors.
> 
> What are your goals?
> Seamless synchronization "à la dropbox".
> Ability to use with big binary files such as mp3/movies.
> Entirely decentralized.
> Don't use unnecessary space
> Keep it simple: avoid special VCS commands and keep a filesystem
> interface as much as possible.

you also need to do various git/git-annex commands, or am I missing something?
 
> Why?
> Because sparkleshare and dvcs-autosync are bad at versioning binary files

I quite like dvcs-autosync, but it indeed lacks space-efficient storage of big 
files.
I would like to try if we can use git-annex to support this in dvcs-autosync, 
although AFAIK git-annex is not transparent in the way regular git is 
transparent (i.e. it needs to explicitly copy files between locations), I 
assume this is the reason you need to go for a FUSE-based approach? or do you 
just prefer this over regular fs + inotify?

> Because Unison needs disk space for each pair of hosts it
> synchronizes and thus does not really scale to more than 2 hosts
> Because Coda is not completely decentralized and it bothers me

you actually tried coda? it's something I'm interested in, on paper it looks 
like an awesome, maybe-even-perfect open source dropbox-clone but the reality 
is probably different, I never tried it so I wouldn't know.
 
> What do you have?
> A python implementation. It is about 600 sloc, and you'll find it on
> https://github.com/chmduquesne/sharebox
> Be careful, it is very alpha and it still does not have a proper
> conflict handler.
> 
> Hey, but copying is slow!
> On my machine, copying files to a sharebox fs is about 10 times slower
> than copying it on a normal fs. All the time is spent in python's
> os.write(): I guess the only way to work around this problem is to
> rewrite the whole thing in C, but I am keeping this for later.

hmm, writing files is i/o-bound, I doubt the language will have much effect 
here.
check with top/vmstat if you get iowait, if so your storage medium is getting 
saturated and rewriting in C won't help. maybe a network/buffering/.. issue.
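
The same check can be scripted on Linux by sampling the aggregate "cpu" line of /proc/stat before and after the copy. The function names are made up for the example; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for /proc/stat:

```python
def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat samples."""
    a, b = cpu_times(before), cpu_times(after)
    # Only the first eight fields are summed; the guest fields that follow
    # are already included in user time.
    deltas = [y - x for x, y in zip(a[:8], b[:8])]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0
```

Reading /proc/stat once before and once after the copy and passing both texts to iowait_share() gives a rough figure: a large share would point at the storage being the bottleneck, a small one at time spent elsewhere.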

> I am interested in:
> - suggestions for the functional design (I have my ideas, but I'd love
> to be challenged).

in your README you suggest to use a crontab for synchronisation; maybe you can
reuse/be inspired by the xmpp system dvcs-autosync uses; it works quite well,
it's quite robust and it's instant :)


Dieter

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-03-31 Thread Joey Hess
Christophe-Marie Duquesne wrote:
> I am currently writing a FUSE file system based on git-annex for
> replicating binary files on several machines. I thought I could share
> it here in order to get some ideas and contributors.

Wow, you have completely anticipated a blog post I was gonna make in a
few days that a) announces git-annex's support for using Amazon S3 as a git
"remote", and b) suggests that a free, distributed dropbox-type thing
could be built on this foundation.

My day, no, my week, is officially made. This is close enough to my
birthday that you are in the running for best birthday present. :)

> What are your goals?
> Seamless synchronization "à la dropbox".
> Ability to use with big binary files such as mp3/movies.
> Entirely decentralized.
> Don't use unnecessary space
> Keep it simple: avoid special VCS commands and keep a filesystem
> interface as much as possible.

100% agree with this list, although I think that explicitly not
mentioning what kind of large binary files a tool might be used to
store is a wise thing. ;)

> Why?
> Because sparkleshare and dvcs-autosync are bad at versioning binary files

I have not looked at sparkleshare, but have been wondering if it could
be adapted to be used as a GUI frontend for git annex.

> What do you have?
> A python implementation. It is about 600 sloc, and you'll find it on
> https://github.com/chmduquesne/sharebox
> Be careful, it is very alpha and it still does not have a proper
> conflict handler.
> 
> Hey, but copying is slow!
> On my machine, copying files to a sharebox fs is about 10 times slower
> than copying them to a normal fs. All the time is spent in python's
> os.write(): I guess the only way to work around this problem is to
> rewrite the whole thing in C, but I am keeping this for later.

I do wonder if a FUSE filesystem is really the best approach. Even a tight
C implementation will need to read/write entire file contents to put
them into the filesystem. Notice that git-annex avoids doing any copying
of large file content when adding a file (it even defaults to using a
backend that doesn't checksum, in order to preserve maximum speed).

I had been thinking more along the lines of an inotify daemon
that watches a directory (like dvcs-autosync), and drives git-annex.
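
A minimal sketch of that idea, under stated assumptions: stdlib-only polling
stands in for inotify (Python's standard library has no inotify binding),
snapshot and plan_commands are hypothetical names, and the git/git-annex
commands are only built as argument lists, not executed. Only new and
removed files are handled; modifications are ignored.

```python
import os


def snapshot(root):
    """Map each non-directory entry under root (skipping .git) to
    (mtime_ns, size), keyed by its path relative to root."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def plan_commands(before, after, message="autosync"):
    """Turn the difference between two snapshots into command lists."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    commands = [["git", "annex", "add", path] for path in added]
    commands += [["git", "rm", "--cached", "--quiet", path] for path in removed]
    if commands:
        commands.append(["git", "commit", "-m", message])
    return commands
```

A daemon would call snapshot() periodically, pass each planned command to
subprocess.run(cmd, cwd=root, check=True), and keep the newest snapshot for
the next round.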

One real benefit of a filesystem is that you can support
modifying the files, and proxy that through to git-annex as a delete
of the old object and an add of the new object. That certainly has value
-- do you do it?
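
To make the "delete plus add" idea concrete, here is a toy in-memory model.
It is an illustration only: ToyAnnex and its methods are hypothetical, and
the keys are plain SHA-256 hex digests, not git-annex's real key format or
storage layout.

```python
import hashlib


class ToyAnnex:
    """Toy content-addressed store: names point at keys, keys at content."""

    def __init__(self):
        self.objects = {}  # key -> content bytes
        self.names = {}    # file name -> key

    @staticmethod
    def key_for(content):
        return hashlib.sha256(content).hexdigest()

    def add(self, name, content):
        key = self.key_for(content)
        self.objects[key] = content
        self.names[name] = key
        return key

    def delete(self, name):
        key = self.names.pop(name)
        # Drop the object only if no other name still refers to it.
        if key not in self.names.values():
            del self.objects[key]

    def modify(self, name, new_content):
        # A modification is proxied as delete-old followed by add-new.
        self.delete(name)
        return self.add(name, new_content)
```

Calling modify("song.mp3", new_bytes) leaves the name pointing at a new key
and removes the old object unless another name still uses it.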

-- 
see shy jo



[announce] Sharebox, a FUSE filesystem relying on git-annex

2011-03-31 Thread Christophe-Marie Duquesne
Hi,

I am currently writing a FUSE file system based on git-annex for
replicating binary files on several machines. I thought I could share
it here in order to get some ideas and contributors.

What are your goals?
Seamless synchronization "à la dropbox".
Ability to use with big binary files such as mp3/movies.
Entirely decentralized.
Don't use unnecessary space
Keep it simple: avoid special VCS commands and keep a filesystem
interface as much as possible.

Why?
Because sparkleshare and dvcs-autosync are bad at versioning binary files
Because Unison needs disk space for each couple of hosts it
synchronizes and thus does not really scale for more than 2 hosts
Because Coda is not completely decentralized and it bothers me

What do you have?
A python implementation. It is about 600 sloc, and you'll find it on
https://github.com/chmduquesne/sharebox
Be careful, it is very alpha and it still does not have a proper
conflict handler.

Hey, but copying is slow!
On my machine, copying files to a sharebox fs is about 10 times slower
than copying them to a normal fs. All the time is spent in python's
os.write(): I guess the only way to work around this problem is to
rewrite the whole thing in C, but I am keeping this for later.

What are your plans?
1) Finish the python implementation and make it stable enough for my
everyday use.
2) Switch to C and rewrite the whole thing to make it fast, and
backward compatible with the python version.

I am interested in:
- suggestions for the functional design (I have my ideas, but I'd love
to be challenged).
- suggestions for the code design

Christophe-Marie Duquesne