Re: [RFC 1/5] GSOC: prepare svndump for branch detection

2012-08-20 Thread Florian Achleitner
On Monday 20 August 2012 09:45:30 Jonathan Nieder wrote:
> Florian Achleitner wrote:
> > Currently, the mark number is equal to the svn revision number the commit
> > corresponds to. I didn't want to break that, but it is not mandatory. We
> > could also split the mark namespace by reserving one or more of the most
> > significant bits as a type specifier.
> > I'll develop a marks-based version ...
> 
> Have we already exhausted possibilities that don't involve changing
> vcs-svn/ code quite so much?  One possibility mentioned before was to
> post-process the stream that svn-fe produces, which seemed appealing
> from a debuggability point of view.
> 

Do you mean another program in the pipe that translates the fast-import
stream produced by svn-fe into another fast-import stream?
svnrdump | svn-fe | svnbranchdetect | git-fast-import ?
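
A minimal sketch of what such an extra pipe stage could look like, in Python.
It is not taken from any existing tool: the names filter_stream and to_trunk
are hypothetical, and it assumes the stream only uses the exact-byte-count
form of the 'data' command (not the delimited "data <<" form). Command lines
go through a rewrite hook, while data payloads are copied through verbatim.

```python
import io


def filter_stream(src, dst, rewrite_line):
    """Copy a fast-import stream from src to dst.

    Command lines are passed through rewrite_line(); the payload that
    follows a 'data <count>' line is copied without being inspected.
    Assumption: the delimited 'data <<DELIM' form is not used.
    """
    while True:
        line = src.readline()
        if not line:
            break  # end of stream
        if line.startswith(b"data ") and not line.startswith(b"data <<"):
            count = int(line[5:])
            dst.write(line)
            dst.write(src.read(count))  # raw payload, never rewritten
        else:
            dst.write(rewrite_line(line))


def to_trunk(line):
    """Hypothetical hook: send commits on master to a 'trunk' ref instead."""
    if line == b"commit refs/heads/master\n":
        return b"commit refs/heads/trunk\n"
    return line


# The blob payload happens to look like a commit command; it must survive.
payload = b"commit refs/heads/master\n"
stream = (b"blob\nmark :1\n" + b"data %d\n" % len(payload) + payload +
          b"\ncommit refs/heads/master\n")
out = io.BytesIO()
filter_stream(io.BytesIO(stream), out, to_trunk)
```

A real branch detector would have to decide the target ref from the paths
touched by each commit, which this sketch does not attempt.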

My two previous ideas were meant like this:
1. Import everything into git and detect branches on the stuff in git, or
2. detect branches as it imports.

Both require creating commits. So the idea behind these patches is to split
the creation of commits from the creation of data, so that the data can be
sent immediately as it comes in from svnrdump, which saves memory by not
buffering it.

The commits are created later: either all linear, with the split into
branches done afterwards (which requires creating commits but not data), or
as branched commits right away. Creating branched commits immediately
requires inspecting all node data before starting a commit.
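
The split between data and commits can be sketched as follows. This is only
an illustration in Python, not the code from the patches: the class
DeferredCommitWriter and its methods are hypothetical, and the committer line
in the usage example is a placeholder. Blob data is written to the stream as
soon as it arrives; only the (path, mark) pairs are kept until the commit is
written.

```python
import io


class DeferredCommitWriter:
    """Emit blobs immediately and the commit that uses them later."""

    def __init__(self, out):
        self.out = out
        self.next_mark = 1   # single mark counter; :0 is reserved
        self.pending = []    # (path, blob mark) pairs, no file data

    def _take_mark(self):
        mark = self.next_mark
        self.next_mark += 1
        return mark

    def add_file(self, path, data):
        """Send the blob right away; remember only its path and mark."""
        mark = self._take_mark()
        self.out.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)))
        self.out.write(data)
        self.out.write(b"\n")
        self.pending.append((path, mark))
        return mark

    def commit(self, ref, committer, message):
        """Write one commit that references all pending blobs by mark."""
        mark = self._take_mark()
        self.out.write(b"commit %s\nmark :%d\ncommitter %s\ndata %d\n"
                       % (ref, mark, committer, len(message)))
        self.out.write(message)
        self.out.write(b"\n")
        for path, blob in self.pending:
            self.out.write(b"M 100644 :%d %s\n" % (blob, path))
        self.out.write(b"\n")
        self.pending.clear()
        return mark


out = io.BytesIO()
writer = DeferredCommitWriter(out)
writer.add_file(b"README", b"hello\n")          # data leaves immediately
after_blob = out.getvalue()
commit_mark = writer.commit(b"refs/heads/master",
                            b"A U Thor <author@example.com> 0 +0000",
                            b"r1\n")
```

Because the ref is only chosen in commit(), the branch decision can be made
after all nodes of a revision have been seen.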

Anyway, it's just an idea.

> Curious,
> Jonathan

Hope that helps,
Florian
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/5] GSOC: prepare svndump for branch detection

2012-08-20 Thread Jonathan Nieder
Florian Achleitner wrote:

> Currently, the mark number is equal to the svn revision number the commit
> corresponds to. I didn't want to break that, but it is not mandatory. We
> could also split the mark namespace by reserving one or more of the most
> significant bits as a type specifier.
> I'll develop a marks-based version ...

Have we already exhausted possibilities that don't involve changing
vcs-svn/ code quite so much?  One possibility mentioned before was to
post-process the stream that svn-fe produces, which seemed appealing
from a debuggability point of view.

Curious,
Jonathan


Re: [RFC 1/5] GSOC: prepare svndump for branch detection

2012-08-20 Thread Florian Achleitner
On Sunday 19 August 2012 23:57:23 Junio C Hamano wrote:
> Florian Achleitner writes:
> >> This change makes me uncomfortable.
> >> We are doubling up on hashing with fast-import.
> >> This introduces git-specific logic into vcs-svn.
> 
> IIUC, vcs-svn/fast-export is meant to produce a stream in the
> fast-import format, and that format is meant to be VCS agnostic,
> so it would need careful thought to add anything Git-specific to
> it.  If you make other people's importers unable to read from you
> because you tell them the contents of a blob in Git's terms, that is
> not very good.

Good point.

> 
> > You have two choices for referencing those blobs later: by using a mark,
> > or by giving their sha1. Marks are already used for marking commits, and
> > there is only one "mark namespace". So I couldn't use marks to reference
> > the blobs in a nice way. This allows for referencing them by their sha1.
> 
> Surely you can, by using even and odd numbers (or modulo 4 if you
> may later want to mark trees and tags as well, but I doubt that is
> needed), no?

Currently, the mark number is equal to the svn revision number the commit
corresponds to. I didn't want to break that, but it is not mandatory. We
could also split the mark namespace by reserving one or more of the most
significant bits as a type specifier.
I'll develop a marks-based version ...
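
A minimal sketch of the most-significant-bits variant, in Python. The names
and the 32-bit width are assumptions made for illustration only; the point is
that commit marks keep their value (mark == svn revision number) while blob
marks live in the upper half of the number space.

```python
MARK_BITS = 32                       # assumed width of a mark number
TYPE_BITS = 1                        # one bit is enough for commit vs. blob
TYPE_SHIFT = MARK_BITS - TYPE_BITS
INDEX_MASK = (1 << TYPE_SHIFT) - 1

COMMIT, BLOB = 0, 1                  # hypothetical type specifiers


def tagged_mark(kind, index):
    """Put the type specifier into the top bit(s) of the mark."""
    if not 0 < index <= INDEX_MASK:
        raise ValueError("index out of range (mark :0 is reserved)")
    return (kind << TYPE_SHIFT) | index


def split_tagged_mark(mark):
    """Return (kind, index) for a mark built by tagged_mark()."""
    return mark >> TYPE_SHIFT, mark & INDEX_MASK


rev_mark = tagged_mark(COMMIT, 42)   # unchanged: still the revision number
blob_mark = tagged_mark(BLOB, 1)     # first blob, top bit set
```

With this layout existing code that treats a commit mark as a revision number
keeps working, at the cost of halving the usable range per extra type bit.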




Re: [RFC 1/5] GSOC: prepare svndump for branch detection

2012-08-19 Thread Junio C Hamano
Florian Achleitner writes:

>> This change makes me uncomfortable.
>> We are doubling up on hashing with fast-import.
>> This introduces git-specific logic into vcs-svn.

IIUC, vcs-svn/fast-export is meant to produce a stream in the
fast-import format, and that format is meant to be VCS agnostic,
so it would need careful thought to add anything Git-specific to
it.  If you make other people's importers unable to read from you
because you tell them the contents of a blob in Git's terms, that is
not very good.

> You have two choices for referencing those blobs later: by using a mark, or
> by giving their sha1. Marks are already used for marking commits, and there
> is only one "mark namespace". So I couldn't use marks to reference the blobs
> in a nice way. This allows for referencing them by their sha1.

Surely you can, by using even and odd numbers (or modulo 4 if you
may later want to mark trees and tags as well, but I doubt that is
needed), no?
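
The even/odd idea can be sketched like this in Python. The function names and
the order of the kinds are assumptions for illustration; with a stride of 2
commits get even marks and blobs odd ones, and a stride of 4 leaves room for
trees and tags as well.

```python
KINDS = ("commit", "blob", "tree", "tag")   # hypothetical ordering


def interleaved_mark(kind, n, stride=2):
    """Share one mark namespace: mark = n * stride + index of the kind."""
    index = KINDS.index(kind)
    if index >= stride:
        raise ValueError("stride too small for kind %r" % kind)
    if n < 1:
        raise ValueError("start counting at 1; mark :0 is reserved")
    return n * stride + index


def split_interleaved_mark(mark, stride=2):
    """Return (kind, n) for a mark built by interleaved_mark()."""
    n, index = divmod(mark, stride)
    return KINDS[index], n


commit_mark = interleaved_mark("commit", 7)       # even
blob_mark = interleaved_mark("blob", 7)           # odd
tag_mark = interleaved_mark("tag", 1, stride=4)   # modulo-4 variant
```

Unlike a scheme that leaves commit marks untouched, the commit mark here is
no longer equal to the revision number but a fixed multiple of it.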



Re: [RFC 1/5] GSOC: prepare svndump for branch detection

2012-08-19 Thread Florian Achleitner
On Sunday 19 August 2012 04:37:35 David Michael Barr wrote:
> On Sat, Aug 18, 2012 at 6:40 AM, Florian Achleitner wrote:
> > Hi!
> > 
> > This patch series should prepare vcs-svn/svndump.* for branch
> > detection. When starting with this feature I found that the existing
> > functions are not yet appropriate for that.
> > These patches rewrite the node handling part of svndump.c, which is very
> > invasive. The logic in handle_node is not simple; I hope that I
> > understood every case the existing code tries to address.
> > At least it doesn't break an existing testcase.
> > 
> > The series applies on top of:
> > [PATCH/RFC v4 16/16] Add a test script for remote-svn.
> > I could also rebase it onto master if you think it makes sense.
> > 
> > Florian
> > 
> >  [RFC 1/5] vcs-svn: Add sha1 calculaton to fast_export and
> 
> This change makes me uncomfortable.
> We are doubling up on hashing with fast-import.
> This introduces git-specific logic into vcs-svn.

You might need to read the rest of the series to see why I did this.
Short version: for fast-import, I separated sending the data from the
commits; the data is sent using the 'blob' command.
There are two ways to reference those blobs later: by using a mark, or by
giving their sha1. Marks are already used for marking commits, and there is
only one "mark namespace", so I couldn't use marks to reference the blobs in
a nice way. Calculating the sha1 allows referencing them by their sha1.
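
For illustration, this is how a blob's sha1 can be computed on the exporter
side and then used as the dataref of a file modification. It is a Python
sketch with hypothetical function names, not the code from the patch. The
"blob <size>\0" header is Git's object format, so this is the Git-specific
part.

```python
import hashlib


def git_blob_sha1(data):
    """Hash the data the way Git names a blob: header, NUL, contents."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def filemodify_by_sha1(path, data, mode=b"100644"):
    """Build an 'M' line that references an already sent blob by sha1."""
    sha1 = git_blob_sha1(data).encode("ascii")
    return b"M %s %s %s\n" % (mode, sha1, path)


empty = git_blob_sha1(b"")   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
line = filemodify_by_sha1(b"README", b"")
```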

> 
> >  [RFC 2/5] svndump: move struct definitions to .h.
> >  [RFC 3/5] vcs-svn/svndump: restructure node_ctx, rev_ctx handling
> >  [RFC 4/5] vcs-svn/svndump: rewrite handle_node(),
> >  [RFC 5/5] vcs-svn: remove repo_tree
> 
> I haven't read the rest of the series yet but I expect
> it is less controversial than the first patch.

Hm.. I'm not sure ;)
> 
> --
> David Michael Barr

Florian 


Re: [RFC 1/5] GSOC: prepare svndump for branch detection

2012-08-18 Thread David Michael Barr
On Sat, Aug 18, 2012 at 6:40 AM, Florian Achleitner wrote:
> Hi!
>
> This patch series should prepare vcs-svn/svndump.* for branch
> detection. When starting with this feature I found that the existing
> functions are not yet appropriate for that.
> These patches rewrite the node handling part of svndump.c, which is very
> invasive. The logic in handle_node is not simple; I hope that I
> understood every case the existing code tries to address.
> At least it doesn't break an existing testcase.
>
> The series applies on top of:
> [PATCH/RFC v4 16/16] Add a test script for remote-svn.
> I could also rebase it onto master if you think it makes sense.
>
> Florian
>
>  [RFC 1/5] vcs-svn: Add sha1 calculaton to fast_export and

This change makes me uncomfortable.
We are doubling up on hashing with fast-import.
This introduces git-specific logic into vcs-svn.

>  [RFC 2/5] svndump: move struct definitions to .h.
>  [RFC 3/5] vcs-svn/svndump: restructure node_ctx, rev_ctx handling
>  [RFC 4/5] vcs-svn/svndump: rewrite handle_node(),
>  [RFC 5/5] vcs-svn: remove repo_tree

I haven't read the rest of the series yet but I expect
it is less controversial than the first patch.

--
David Michael Barr