AW: reposurgeon now writes Subversion repositories

2012-11-29 Thread Markus Schaber
Hi,

Von: Eric S. Raymond [mailto:e...@thyrsus.com]
> > How does reposurgeon handle empty directories with (node) properties?
> 
> Currently by ignoring all of them except svn:ignore, which it turns
> into .gitignore content on the gitspace side.  And now vice-versa, too.
> 
> Not clear what else it *could* do.  I'd take suggestions.

AFAIR, SvnBridge (which bridges SVN to Team Foundation Server for CodePlex) 
creates a hidden .svnproperties file where all the properties of the directory 
and files are stored.

I'm not really sure, but maybe this could be used as some standard to bridge 
svn properties to non-svn VCSes.

Best regards

Markus Schaber

CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com
CODESYS internet forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reposurgeon now writes Subversion repositories

2012-11-29 Thread Eric S. Raymond
Daniel Shahaf :
> I don't see the kludge here --- git has a "author" != "committer"
> distinction, svn doesn't, so if you want to grow that distinction the
> most natural way is a new property.  Storing additional information in
> svn:author is a separate issue.

See my advocacy to Branko of going to Internet-scoped IDs. The kludge
would be maintaining the local and Internet-scoped identifications 
as different properties and having to decide which one to key on
ad-hoc.  Nothing to do with the author/committer distinction. 
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reposurgeon now writes Subversion repositories

2012-11-29 Thread Daniel Shahaf
(note, other half of the thread is on dev@svn only..)

Eric S. Raymond wrote on Thu, Nov 29, 2012 at 06:46:37 -0500:
> Daniel Shahaf :
> > You might also seek community consensus to reserve an svn:foo name for
> > the "original author" property --- perhaps svn:original-author --- so
> > that reposurgeon and other git->svn tools can interoperate in the way
> > they transfer the "original author" information.
> 
> OK.  But I like the idea of letting the users set their own author
> content string better.  Instead of another layer of kluges, why

I don't see the kludge here --- git has a "author" != "committer"
distinction, svn doesn't, so if you want to grow that distinction the
most natural way is a new property.  Storing additional information in
svn:author is a separate issue.

> > >   Subversion has a concept of "flows"; that is, named segments of
> > >   history corresponding to files or directories that are created when
> > >   the path is added, cloned when the path is copied, and deleted when
> > >   the path is deleted. This information is not preserved in import
> > >   streams or the internal representation that reposurgeon uses.  Thus,
> > >   after editing, the flow boundaries of a Subversion history may be
> > >   arbitrarily changed.
> > > 
> > > This is me being obsessive about documenting the details.  I think it
> > > is doubtful that most Subversion users even know flows exist.
> > 
> > I think you're saying that adds might turn into copies, and vice-versa.
> > That is something users would notice --- it is certainly exposed in the
> > UI --- even though node-id's are not exposed to clients.
> 
> I'm saying nobody thinks of flows when they do branch copies.  It's
> not just that users don't see node IDs, it's that no part of most users'
> mental model of how Subversion works resembles them.

I'm still not sure what you have in mind.  I note that 'svn log' and
'svn blame' cross both file copies and branch creation --- that's one
effect of "'svn cp foo bar; svn ci' causes bar to be related to foo".

> -- 
>   http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reposurgeon now writes Subversion repositories

2012-11-29 Thread Eric S. Raymond
Daniel Shahaf :
> > Subversion's metadata doesn't have separate author and committer
> > properties, and doesn't store anything but a Unix user ID as
> > attribution.  I don't see any way around this.
> 
> You're not fully informed, then.
> 
> 1) svn:author revprops can contain any UTF-8 string.  They are not
> restricted to Unix user id's.  (For example, they can contain full
> names, if the administrator so chooses.)

Right.  At one point during the development of this feature I was
accidentally storing the full email field in this property.  So I
already knew that this is allowed at some level.  

And, I have no trouble believing that svn log will cheerfully echo
anything that I choose to stuff in that field.  

But...

(1) How much work would it be it to set up a Subversion installation 
so that when I svn commit, the tool does the right thing, e.g. puts
a DVCS-style fullname/email string in there?  

(2) Have the tools been tested for bugs arising from having whitespace
in that data?

Really, if it's actually easy to set up DVCS-style globally unique IDs you
Subversion guys ought to be shouting it from the housetops.  The absence
of this capability is a serious PITA in several situations, including 
for example migrating projects between forges.

RFC: If I wrote a patch that let Subversion users set their own
content string for the author field in ~/.subversion/config, would
you merge it?  Because I'd totally write that.

> 2) You can define custom revision properties.  In your case, the easiest
> way would be to set an reposurgeon:author property, alongside the
> svn:author property.

Yeah, sure, I've assumed all along this wouldn't break if I tried it.
If I actually thought you guys were capable of designing a data model
with a perfectly general-looking store of key/value pairs and then
arbitrarily restricting the key set so I couldn't do that, I'd almost
have to find each and every one of you and kick your asses into next
Tuesday on account of blatant stupidity. I have no such plans :-).

But...what good does this capability do?  OK, it would assist
round-tripping back to gitspace, but while that's kind of cool I don't
see any help for a normal Subversion workflow here.
 
> You might also seek community consensus to reserve an svn:foo name for
> the "original author" property --- perhaps svn:original-author --- so
> that reposurgeon and other git->svn tools can interoperate in the way
> they transfer the "original author" information.

OK.  But I like the idea of letting the users set their own author
content string better.  Instead of another layer of kluges, why
shouldn't Subversion join the DVCSes in the happy land of
Internet-scoped attributions?

> How does reposurgeon handle empty directories with (node) properties?

Currently by ignoring all of them except svn:ignore, which it turns 
into .gitignore content on the gitspace side.  And now vice-versa, too.

Not clear what else it *could* do.  I'd take suggestions.

> >   Subversion has a concept of "flows"; that is, named segments of
> >   history corresponding to files or directories that are created when
> >   the path is added, cloned when the path is copied, and deleted when
> >   the path is deleted. This information is not preserved in import
> >   streams or the internal representation that reposurgeon uses.  Thus,
> >   after editing, the flow boundaries of a Subversion history may be
> >   arbitrarily changed.
> > 
> > This is me being obsessive about documenting the details.  I think it
> > is doubtful that most Subversion users even know flows exist.
> 
> I think you're saying that adds might turn into copies, and vice-versa.
> That is something users would notice --- it is certainly exposed in the
> UI --- even though node-id's are not exposed to clients.

I'm saying nobody thinks of flows when they do branch copies.  It's
not just that users don't see node IDs, it's that no part of most users'
mental model of how Subversion works resembles them.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reposurgeon now writes Subversion repositories

2012-11-29 Thread Branko Čibej
On 29.11.2012 08:58, Daniel Shahaf wrote:
> I think you're saying that adds might turn into copies, and
> vice-versa. That is something users would notice --- it is certainly
> exposed in the UI --- even though node-id's are not exposed to clients. 

... yet. But there are plans underway to expose them.

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reposurgeon now writes Subversion repositories

2012-11-29 Thread Daniel Shahaf
Eric S. Raymond wrote on Thu, Nov 29, 2012 at 00:59:45 -0500:
>   In summary, Subversion repository histories do not round-trip through
>   reposurgeon editing. File content changes are preserved but some
>   metadata is unavoidably lost.  Furthermore, writing out a DVCS history
>   in Subversion also loses significant portions of its metadata.
> 
>   Writing a Subversion repository or dump stream discards author
>   information, the committer's name, and the hostname part of the commit
>   address; only the commit timestamp and the local part of the
>   committer's email address are preserved, the latter becoming the
>   Subversion author field.  However, reading a Subversion repository and
>   writing it out again will preserve the author fields.
> 
> Subversion's metadata doesn't have separate author and committer
> properties, and doesn't store anything but a Unix user ID as
> attribution.  I don't see any way around this.

You're not fully informed, then.

1) svn:author revprops can contain any UTF-8 string.  They are not
restricted to Unix user id's.  (For example, they can contain full
names, if the administrator so chooses.)

2) You can define custom revision properties.  In your case, the easiest
way would be to set an reposurgeon:author property, alongside the
svn:author property.

You might also seek community consensus to reserve an svn:foo name for
the "original author" property --- perhaps svn:original-author --- so
that reposurgeon and other git->svn tools can interoperate in the way
they transfer the "original author" information.

I note that one can set revision properties at commit time:

svn commit -m logmsg --with-revprop svn:original-author="Patch Submitter 
"

>   Empty directories aren't represented in import streams. Consequently,
>   reading and writing Subversion repositories preserves file content,
>   but not empty directories.  It is also not guaranteed that after
>   editing a Subverson repository that the sequence of directory
>   creations and deletions relative to other operations will be
>   identical; the only guarantee is that enclosing directories will be
>   created before any files in them are.

How does reposurgeon handle empty directories with (node) properties?

% svnadmin create r
% svnmucc -mm -U file://$PWD/r mkdir foo propset k v foo

>   Subversion has a concept of "flows"; that is, named segments of
>   history corresponding to files or directories that are created when
>   the path is added, cloned when the path is copied, and deleted when
>   the path is deleted. This information is not preserved in import
>   streams or the internal representation that reposurgeon uses.  Thus,
>   after editing, the flow boundaries of a Subversion history may be
>   arbitrarily changed.
> 
> This is me being obsessive about documenting the details.  I think it
> is doubtful that most Subversion users even know flows exist.
> 

I think you're saying that adds might turn into copies, and vice-versa.
That is something users would notice --- it is certainly exposed in the
UI --- even though node-id's are not exposed to clients.

> 

Cheers

Daniel
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


reposurgeon now writes Subversion repositories

2012-11-28 Thread Eric S. Raymond
This is something that probably doesn't happen very often -
cross-posting to the Subversion and git dev lists that is on-topic for
both :-).

The repo head version of reposurgeon can now write Subversion
repositories from its common git-import-stream-based representation of
repository histories, as well as reading them in.  This joins full
support for git, hg, and bzr; it means that in theory reposurgeon
could now be used to move revision histories from these systems to
Subversion, as well as the other way around.

(For those of you who have been living under a rock, reposurgeon is a
multi-VCS surgery and conversion tool. Since 2.x it does a more
intelligent job of lifting from Subversion to anything else than any
other tool I know of. Much more at .)

Presently, writing (as opposed to reading) Subversion repos is more of
a stunt than a real production technique, and may always remain so.
It has serious limitations.  I am posting because I think the details
of those limitations will be of some technical interest to both
Subversion and git developers.

Indented paragraphs is the documentation from reposurgeon's manual
page.  I have added some further notes.

  In summary, Subversion repository histories do not round-trip through
  reposurgeon editing. File content changes are preserved but some
  metadata is unavoidably lost.  Furthermore, writing out a DVCS history
  in Subversion also loses significant portions of its metadata.

  Writing a Subversion repository or dump stream discards author
  information, the committer's name, and the hostname part of the commit
  address; only the commit timestamp and the local part of the
  committer's email address are preserved, the latter becoming the
  Subversion author field.  However, reading a Subversion repository and
  writing it out again will preserve the author fields.

Subversion's metadata doesn't have separate author and committer
properties, and doesn't store anything but a Unix user ID as
attribution.  I don't see any way around this.

  Import-stream timestamps have 1-second granularity. The subsecond
  parts of Subversion commit timestamps will be lost on their way through
  reposurgeon.

Unavoidable in moving from Subversion to git import streams, and one
of two places where git's data model requires us to throw away
information.  

However, I think I could preserve this information in a
Subversion-to-Subversion editing scenario by storing the incoming
timestamps as floats and only truncating them on import-stream output,
leaving the subseconds in place for Subversion output.

  Empty directories aren't represented in import streams. Consequently,
  reading and writing Subversion repositories preserves file content,
  but not empty directories.  It is also not guaranteed that after
  editing a Subverson repository that the sequence of directory
  creations and deletions relative to other operations will be
  identical; the only guarantee is that enclosing directories will be
  created before any files in them are.

  When reading a Subversion repository, reposurgeon discards the special
  directory-copy nodes associated with branch creations.  These can't be
  recreated if and when the repository is written back out to
  Subversion; rather, each branch copy node from the original translates
  into a branch creation plus the first set of file modifications on the
  branch.

In theory, I could relax the rules of reposurgeon's internal
representation so that empty directory-creation and deletion nodes are
not discarded at read time but only when outputting a git event stream.

That would bring Subversion repositories closer to round-tripping, but
not get all the way there.  One problem is botched branch copies -
directory copies with cp(1) followed by Subversion add operations.
This is not an uncommon malformation; reposurgeon takes it in stride,
treating these as though they had been real branch copies and
simplifying the backlinks appropriately.

  When reading a Subversion repository, reposurgeon also automatically
  breaks apart mixed-branch commits.

It has to.  These just can't be represented in the import-stream model of
branching.

  Because of the preceding two points, it is not guaranteed that 
  even revision numbers will be stable when a Subversion repository
  is read in and then written out!

So not only can Subversion repos fail to round-trip exactly, in the
presence of lots of branch copies and mixed-branch commits the
relationship between the read-in and written out revision numbers
could get pretty unpredictable.

  Subversion repositories are always written with a standard
  (trunk/tags/branches) layout. Thus, a repository with a nonstandard
  shape that has been analyzed by reposurgeon won't be written out with
  the same shape.

In particular, this means linear Subversion repositories with no trunk
(an organization some smaller projects used to use and might still)
will turn into branchy repos