Re: status of sharebox-fs?

2012-06-05 Thread Joey Hess
Dieter Plaetinck wrote:
> very neat.  looking fwd to it.
> wrt. your notes about dvcs-autosync.  the most frustrating part about it that
> I've found is that there are some bad race conditions which are basically not
> solvable because of some assumptions git makes.  I did some research myself
> and tried to explain it in detail @
> http://comments.gmane.org/gmane.comp.version-control.home-dir/665

Yes, it's hard to get right, many many possible races to consider.
I'm already using git rm --cached and think I may have an ok rm.

Have been considering manually staging git-annex's symlinks using
git update-index, rather than using git add, to avoid the races
inherent in using it.
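
A minimal sketch of that kind of manual staging, assuming Python and
plain git plumbing. git add reads whatever is on disk at the moment it
runs; here the link target is written as a blob and the index entry is
created directly, so the working tree is never consulted. The names
cacheinfo_args and stage_symlink are hypothetical, and this is not
taken from git-annex's code.

```python
import subprocess

SYMLINK_MODE = "120000"  # git's file mode for a symbolic link


def cacheinfo_args(blob_sha, path):
    """Build the `git update-index` command that stages `path` as a symlink."""
    return ["git", "update-index", "--add", "--cacheinfo",
            SYMLINK_MODE, blob_sha, path]


def stage_symlink(repo_dir, path, target):
    """Stage a symlink entry for `path` without reading the working tree."""
    # git stores a symlink as a blob whose content is the link target.
    result = subprocess.run(
        ["git", "hash-object", "-w", "--stdin"],
        input=target.encode(), cwd=repo_dir, check=True, capture_output=True,
    )
    sha = result.stdout.decode().strip()
    subprocess.run(cacheinfo_args(sha, path), cwd=repo_dir, check=True)
    return sha
```

Calling stage_symlink(repo, "big.iso", ".git/annex/objects/...") would
need a real repository and a real target; the path shown is only a
placeholder.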

A trick I like to use when dealing with races is to intentionally make the
race windows very wide, so it's easy to try racy things. A side benefit
here is that it'll allow coalescing related changes into a single update
of the git index, so going slower will in fact speed things up.
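
The coalescing idea can be sketched roughly as below, assuming Python.
The Coalescer class and its fixed delay are hypothetical and not taken
from the "watch" branch; the caller passes the current time in, which
keeps the window easy to widen and easy to test.

```python
class Coalescer:
    """Collect changed paths and release them as one batch after a delay."""

    def __init__(self, delay):
        self.delay = delay      # width of the window, in seconds
        self.pending = set()    # paths seen since the last flush
        self.deadline = None    # time at which the current batch is due

    def add(self, path, now):
        """Record a change; the first change in a batch starts the window."""
        self.pending.add(path)
        if self.deadline is None:
            self.deadline = now + self.delay

    def poll(self, now):
        """Return the sorted batch if the window has closed, else None."""
        if self.deadline is None or now < self.deadline:
            return None
        batch = sorted(self.pending)
        self.pending.clear()
        self.deadline = None
        return batch
```

Each batch returned by poll() would then become a single index update
instead of one update per event.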

You can check out the "watch" branch in git-annex's git repo to
see what I have so far.

-- 
see shy jo


___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home

Re: status of sharebox-fs?

2012-06-05 Thread Dieter Plaetinck
On Wed, 23 May 2012 13:42:39 -0400
Joey Hess  wrote:

> I suppose it would not be out of place to mention the Kickstarter
> project I have just launched today. The first 1/3rd of it will be
> using inotify to automatically add files, and keep a directory in sync.
> http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own

very neat.  looking fwd to it.
wrt. your notes about dvcs-autosync.  the most frustrating part about it that
I've found is that there are some bad race conditions which are basically not
solvable because of some assumptions git makes.  I did some research myself
and tried to explain it in detail @
http://comments.gmane.org/gmane.comp.version-control.home-dir/665



Re: status of sharebox-fs?

2012-05-24 Thread Joey Hess
René Mayrhofer wrote:
> > No, I punted on it. The inotify managed directory will behave
> > differently/annoyingly when the user tries to modify files. This
> > certainly doesn't perfectly cover every use case, but I feel it's an
> > ok tradeoff, you can get used to that behavior.
> There is some event coalescing code towards making file changes less
> annoying in my current dvcs-autosync devel branch (on gitorious).
> However, due to time constraints, I haven't yet gotten around to
> implementing the corner cases that I wanted to for about 4 months
> (Hint: unpacking a big tar file is something I don't want to see being
> split into as many separate git commits as there are files, while I
> would like a commit for every change to a document I am currently
> editing - there's a trade-off in there when only passively monitoring
> the changes via inotify.)
> 
> If you'd like to talk about a few of the issues, I hope to find time to
> do so (life will remain crazy for about the next 4 months). I've been
> playing with the inotify-triggered commits idea for a while and already
> bumped my nose at a few ugly cases that I might assist you in avoiding.

Would absolutely be appreciated.

I have been planning on only committing when it syncs, or something like
that, rather than on every new file. It seems to make sense to commit
every time an existing file is changed, but bundle new files.
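
That policy could look something like the following sketch, assuming
Python. The event tuples, the tracked set and the plan_commits name are
all hypothetical; nothing here reflects how git-annex or dvcs-autosync
represent events.

```python
def plan_commits(events, tracked):
    """Split (kind, path) events into immediate commits and one bundle.

    events  -- iterable of ("modify" | "create", path) tuples
    tracked -- set of paths the repository already knows about
    """
    immediate = []  # one commit per change to an existing file
    bundled = []    # new files, held back until the next sync
    for kind, path in events:
        if kind == "modify" and path in tracked:
            immediate.append([path])
        elif path not in bundled:
            bundled.append(path)
    return immediate, bundled
```

With this split, unpacking a tar file full of new files lands in one
bundled commit, while each save of an existing document still gets its
own.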

As far as I can tell, inotify has an unavoidable race when a new
directory is created -- before the program gets a chance to add a watch
on the directory, files or directories can be added to it, with no
inotify events generated for them. There is a workaround:
manually recursively scan each new directory after adding the watch.
If you have other gotchas like these, do tell..
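
A sketch of that workaround, assuming Python. The standard library has
no inotify binding, so add_watch and on_missed are hypothetical
callbacks standing in for whatever inotify wrapper is in use. The order
is the point: the watch goes on first and the scan comes second.

```python
import os


def watch_tree(root, add_watch, on_missed):
    """Watch `root`, then scan it for entries created before the watch."""
    add_watch(root)                    # 1. watch first ...
    with os.scandir(root) as entries:  # 2. ... then scan what is already there
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Subdirectories need their own watch and their own scan.
                watch_tree(entry.path, add_watch, on_missed)
            else:
                on_missed(entry.path)
```

A file created between the two steps can be reported twice, once by
inotify and once by the scan, so whatever handles on_missed has to
tolerate duplicates.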

-- 
see shy jo



Re: status of sharebox-fs?

2012-05-24 Thread René Mayrhofer
On 2012-05-23 21:17, Joey Hess wrote:
> No, I punted on it. The inotify managed directory will behave
> differently/annoyingly when the user tries to modify files. This
> certainly doesn't perfectly cover every use case, but I feel it's an
> ok tradeoff, you can get used to that behavior.
There is some event coalescing code towards making file changes less
annoying in my current dvcs-autosync devel branch (on gitorious).
However, due to time constraints, I haven't yet gotten around to
implementing the corner cases that I wanted to for about 4 months
(Hint: unpacking a big tar file is something I don't want to see being
split into as many separate git commits as there are files, while I
would like a commit for every change to a document I am currently
editing - there's a trade-off in there when only passively monitoring
the changes via inotify.)

If you'd like to talk about a few of the issues, I hope to find time to
do so (life will remain crazy for about the next 4 months). I've been
playing with the inotify-triggered commits idea for a while and already
bumped my nose at a few ugly cases that I might assist you in avoiding.

best regards,
Rene



Re: status of sharebox-fs?

2012-05-23 Thread Joey Hess
Christophe-Marie Duquesne wrote:
> On Wed, May 23, 2012 at 9:17 PM, Joey Hess  wrote:
> > Well, imagine there are two remotes, and they're both on the other side
> > of a cable modem. Ideally it should avoid sending the data to both, if
> > one can talk to the other. OTOH, if they're not otherwise connected,
> > it needs to double the transfer. You need to map the network (which
> > git-annex can already do) and analyze the weighted graph.
> 
> I might be wrong, but it sounds like a minimum spanning tree problem.

Good observation. I wish I'd formally studied graph theory. :)
However, this is actually a directed graph.. so according to Wikipedia
the Arborescence problem is involved, and
http://en.wikipedia.org/wiki/Chu%E2%80%93Liu/Edmonds_algorithm could be
used, although that throws away edges that cause loops, which does not
seem ideal.
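
For reference, a compact sketch of the Chu-Liu/Edmonds approach,
assuming Python. It returns only the total cost of a minimum spanning
arborescence, not the chosen edges. Numbering the repos 0..n-1 and
using a single cost per directed link are assumptions made for the
example.

```python
def min_arborescence_cost(n, root, edges):
    """Cost of a minimum spanning arborescence rooted at `root`, or None.

    n     -- number of nodes, labelled 0..n-1
    edges -- list of (src, dst, cost) directed edges
    """
    total = 0
    while True:
        # Pick the cheapest incoming edge for every node.
        in_w = [float("inf")] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        if any(in_w[v] == float("inf") for v in range(n) if v != root):
            return None  # some node cannot be reached from the root
        in_w[root] = 0

        # Look for cycles among the chosen edges.
        count = 0
        comp = [-1] * n
        seen = [-1] * n
        for i in range(n):
            total += in_w[i]
            v = i
            while seen[v] != i and comp[v] == -1 and v != root:
                seen[v] = i
                v = pre[v]
            if v != root and comp[v] == -1:  # walked back into a new cycle
                u = pre[v]
                while u != v:
                    comp[u] = count
                    u = pre[u]
                comp[v] = count
                count += 1
        if count == 0:
            return total  # no cycles: the chosen edges form the answer

        # Contract each cycle into one node and adjust the edge costs.
        for i in range(n):
            if comp[i] == -1:
                comp[i] = count
                count += 1
        edges = [(comp[u], comp[v], w - in_w[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root, n = comp[root], count
```

With a laptop as node 0 and two remotes that can reach each other
cheaply, min_arborescence_cost(3, 0, [(0, 1, 10), (0, 2, 10),
(1, 2, 1), (2, 1, 1)]) gives 11: one expensive upload plus one cheap
relay.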

But I suspect this is still oversimplified when it gets down to the real
world... :)

-- 
see shy jo



Re: status of sharebox-fs?

2012-05-23 Thread Christophe-Marie Duquesne
On Wed, May 23, 2012 at 9:17 PM, Joey Hess  wrote:
> Well, imagine there are two remotes, and they're both on the other side
> of a cable modem. Ideally it should avoid sending the data to both, if
> one can talk to the other. OTOH, if they're not otherwise connected,
> it needs to double the transfer. You need to map the network (which
> git-annex can already do) and analyze the weighted graph.

I might be wrong, but it sounds like a minimum spanning tree problem.


Re: status of sharebox-fs?

2012-05-23 Thread Joey Hess
Christophe-Marie Duquesne wrote:
> No, I think you're referring to dvcs-autosync. Sharebox does not do
> that: instead, the plan is to offer a callback option for a program to
> be called whenever a git operation has been performed. The
> synchronisation itself should be called by touching a special file.
> This way, those who want instant updates might just plug for example a
> jabber bot that would advertise for modifications and touch this file
> whenever another peer advertises for modifications. Those who prefer
> to synchronise once a day in order to save battery should just use a
> crontask.

Ah indeed I'd forgotten about dvcs-autosync.

I've also been thinking about using special files to control various
operations.

> > NAT traversal is also a hard problem, when none of the repos have a public
> > IP address.
> 
> Yes, this is not an easy problem. I also thought about this issue. My
> conclusion: I think it should be easier to pretend all peers are on
> the same LAN, and build a vpn between them. Several possibilities:
> - Users could just self host their own vpn (requires knowledge).
> - Users could subscribe to a service that would provide the vpn (a way
> to make money with such a software?).
> - Some peer to peer vpn could be used. There are plenty of opensource
> programs for this job. gnunet provides a fine way to do that.
> My only problem with all the peer to peer vpn implementations is that,
> afaik, none of them works on every platform.

git-annex special remotes are one way to do it, if the user has an
account that can be used to interchange the data. It has to be made easy
enough to set up that a regular user can do it though.

> > Also interesting is finding the paths through the network
> > of repos that gets data transferred to each most efficiently.
> 
> What do you mean exactly?

Well, imagine there are two remotes, and they're both on the other side
of a cable modem. Ideally it should avoid sending the data to both, if
one can talk to the other. OTOH, if they're not otherwise connected,
it needs to double the transfer. You need to map the network (which
git-annex can already do) and analyze the weighted graph.
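
The two-remote case can be sketched as follows, assuming Python. This
deliberately ignores the weights and only groups remotes that can reach
each other, so it is a simplification of the graph analysis described
above. The names uploads_needed, remotes and links are hypothetical.

```python
def uploads_needed(remotes, links):
    """Return one representative remote per group of connected remotes.

    remotes -- iterable of remote names
    links   -- iterable of (a, b) pairs that can exchange data directly
    """
    remotes = list(remotes)
    parent = {r: r for r in remotes}

    def find(r):
        # Follow parent pointers to the group's representative.
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    return sorted({find(r) for r in remotes})
```

If the two remotes are linked, one upload across the cable modem is
enough and the other copy can be relayed; if they are not, the result
has two entries and the transfer doubles.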

> Btw, I did not look closely how git-annex has been modified recently,
> but the reason why I have chosen FUSE was the ability to present
> git-annexed files (ie symlinks to read only files) as regular files.
> IMHO, that would be the most user friendly way to manipulate these
> files without the user noticing git. However, from what I read on
> kickstarter, you might have found a better solution. What is it?

No, I punted on it. The inotify managed directory will behave
differently/annoyingly when the user tries to modify files.
This certainly doesn't perfectly cover every use case, but
I feel it's an ok tradeoff, you can get used to that behavior.

One potential use of special files is touching $file.edit to make
git-annex unlock $file for editing, or perhaps touching .edit to unlock a
whole directory of files.
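
A sketch of how such marker files might be interpreted, assuming
Python. The ".edit" suffix comes from the paragraph above; the function
names are hypothetical, and running "git annex unlock" on the result is
just one plausible way to act on it.

```python
import os

MARKER_SUFFIX = ".edit"


def unlock_target(marker_path):
    """Map a touched marker file to the path that should be unlocked."""
    directory, name = os.path.split(marker_path)
    if name == MARKER_SUFFIX:
        return directory or "."  # bare ".edit": the whole directory
    if name.endswith(MARKER_SUFFIX):
        return os.path.join(directory, name[:-len(MARKER_SUFFIX)])
    return None  # not a marker file


def unlock_command(marker_path):
    """Build the git-annex command for a marker file, or None."""
    target = unlock_target(marker_path)
    return None if target is None else ["git", "annex", "unlock", target]
```

The watcher would also need to delete the marker afterwards, which is
left out here.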

-- 
see shy jo



Re: status of sharebox-fs?

2012-05-23 Thread Christophe-Marie Duquesne
On Wed, May 23, 2012 at 7:42 PM, Joey Hess  wrote:
> Right now I am thinking about good ways to approach the distributed
> syncing. IIRC the original sharebox did something neat with XMPP to
> broadcast change notifications, avoiding polling, but adding complexity.

No, I think you're referring to dvcs-autosync. Sharebox does not do
that: instead, the plan is to offer a callback option for a program to
be called whenever a git operation has been performed. The
synchronisation itself should be called by touching a special file.
This way, those who want instant updates might just plug for example a
jabber bot that would advertise for modifications and touch this file
whenever another peer advertises for modifications. Those who prefer
to synchronise once a day in order to save battery should just use a
crontask.
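
A rough sketch of the "touch a special file to synchronise" idea,
assuming Python and simple polling of the file's modification time.
TriggerFile is a hypothetical name and is not how sharebox implements
it; a jabber bot or a cron job would only need to touch the file.

```python
import os


class TriggerFile:
    """Detect touches of a special file by watching its modification time."""

    def __init__(self, path):
        self.path = path
        self.last_seen = self._mtime()

    def _mtime(self):
        # A missing file simply means nobody has asked for a sync yet.
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None

    def poll(self):
        """Return True once for each observed change of the file's mtime."""
        current = self._mtime()
        changed = current is not None and current != self.last_seen
        self.last_seen = current
        return changed
```

The main loop would call poll() periodically and start a
synchronisation whenever it returns True.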

> NAT traversal is also a hard problem, when none of the repos have a public
> IP address.

Yes, this is not an easy problem. I also thought about this issue. My
conclusion: I think it should be easier to pretend all peers are on
the same LAN, and build a vpn between them. Several possibilities:
- Users could just self host their own vpn (requires knowledge).
- Users could subscribe to a service that would provide the vpn (a way
to make money with such a software?).
- Some peer to peer vpn could be used. There are plenty of opensource
programs for this job. gnunet provides a fine way to do that.
My only problem with all the peer to peer vpn implementations is that,
afaik, none of them works on every platform.

> Also interesting is finding the paths through the network
> of repos that gets data transferred to each most efficiently.

What do you mean exactly?

Btw, I did not look closely how git-annex has been modified recently,
but the reason why I have chosen FUSE was the ability to present
git-annexed files (ie symlinks to read only files) as regular files.
IMHO, that would be the most user friendly way to manipulate these
files without the user noticing git. However, from what I read on
kickstarter, you might have found a better solution. What is it?

Also, you might want to have a look at aerofs [1]. Anyone who sees
interest in a peer to peer filesystem might enjoy reading this.

[1]: http://www.aerofs.com/


Re: status of sharebox-fs?

2012-05-23 Thread Joey Hess
Christophe-Marie Duquesne wrote:
> It's not. It will be some day :)
> 
> Due to heavy workload, I had to stop working on sharebox-fs. Expect
> the development to go on after November (right now I have to focus on
> writing my thesis).
> 
> Don't worry, the development will go on: I absolutely want to finish
> writing this piece of software, I have spent too much time thinking
> about it :)

I had a look at it the other day. Got it basically working, but never
saw it actually git-annex add any files that I added.

I am doubtful about using FUSE though, it makes it harder to set up and
adds overhead whenever the large files are read, which is not ideal,
and means you really do need to use C to keep the overhead as small as
possible, but using C is not ideal for other reasons. :)

I suppose it would not be out of place to mention the Kickstarter
project I have just launched today. The first 1/3rd of it will be
using inotify to automatically add files, and keep a directory in sync.
http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own

Right now I am thinking about good ways to approach the distributed
syncing. IIRC the original sharebox did something neat with XMPP to
broadcast change notifications, avoiding polling, but adding complexity.
NAT traversal is also a hard problem, when none of the repos have a public
IP address. Also interesting is finding the paths through the network
of repos that gets data transferred to each most efficiently.

-- 
see shy jo



Re: status of sharebox-fs?

2012-05-23 Thread Christophe-Marie Duquesne
On Wed, May 16, 2012 at 8:14 PM, Dieter Plaetinck  wrote:
> Remember sharebox? 
> (http://lists.madduck.net/pipermail/vcs-home/2011-March/000351.html)
>
> Seems like a rewrite in C is going on : 
> https://github.com/chmduquesne/sharebox-fs
>
> Did anyone try it? What's your experience with it?
> It looks super awesome, but is it?

It's not. It will be some day :)

Due to heavy workload, I had to stop working on sharebox-fs. Expect
the development to go on after November (right now I have to focus on
writing my thesis).

Don't worry, the development will go on: I absolutely want to finish
writing this piece of software, I have spent too much time thinking
about it :)