Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-23 Thread Michael Haggerty
On 01/20/2013 09:17 PM, Chris Rorvick wrote:
 I probably won't be sending any more patches on this.  My hope was to
 get cvsimport-3 (w/ cvsps as the engine) in a state such that one
 could transition from the previous version seamlessly.  But the break
 in t9605 has convinced me this is not worth the effort--even in this
 trivial case cvsps is broken.  The fuzzing logic aggregates commits
 into patch sets that have timestamps within a specified window and
 otherwise matching attributes.  This aggregation causes file-level
 commit timestamps to be lost and we are left with a single timestamp
 for the patch set: the minimum for all contained CVS commits.  When
 all commits have been processed, the patch sets are ordered
 chronologically and printed.
 
 The problem is that is that a CVS commit is rolled into a patch set
 regardless of whether the patch set's timestamp falls within the
 adjacent CVS file-level commits.  Even worse, since the patch set
 timestamp changes as subsequent commits are added (i.e., it's always
 picking the earliest) it is potentially indeterminate at the time a
 commit is added.  The result is that file revisions can be reordered
 in resulting Git import (see t9605.)  I spent some time last week
 trying to solve this but I coudln't think of anything that wasn't a
 substantial re-work of the code.
 
 I have never used cvs2git, but I suspect Eric's efforts in making it a
 potential backend for cvsimport are a better use of time.

Thanks for your explanation of how cvsps works.

This is roughly how cvs2svn used to work years ago, prior to release
2.x.  In addition it did a number of things to try to tweak the
timestamp ordering to avoid committing file-level commits in the wrong
order.  It never worked 100%; each tweak that was made to fix one
problem created another problem in another scenario.

cvs2svn/cvs2git 2.x takes a very different approach.  It uses a
timestamp threshold along with author and commit-message matching to
find the biggest set of file-level commits that might constitute a
repository-level commit.  But then it checks the proto-commits to see if
they violate the ordering constraints imposed by the individual
file-level commits.  For example, if the initial grouping gives the
following proto-commits:

proto-commit 1: a.txt 1.1b.txt 1.2

proto-commit 2: a.txt 1.2b.txt 1.1

then it is apparent that something is wrong, because a.txt 1.1
necessarily comes before a.txt 1.2 whereas b.txt 1.1 necessarily comes
before b.txt 1.2 (CVS can at least be relied on to get this right!) and
therefore there is no consistent ordering of the two proto-commits.
More generally, the proto-commits have to form a directed acyclic graph,
whereas this graph has a cycle 1 - 2 - 1.  When cvs2svn/cvs2git finds
a cycle, it uses heuristics to break up one or more of the proto-commits
to break the cycle.  In this case it might break proto-commit 1 into two
commits:

proto-commit 1a: a.txt 1.1

proto-commit 2:  a.txt 1.2b.txt 1.1

proto-commit 1b:  b.txt 1.2

Now it is possible to commit them in the order 1a,2,1b.  (Exactly this
scenario is tested in t9603.)

Of course a typical proto-commit graph often contains far more
complicated cycles, but the approach remains the same: split
proto-commits up as necessary until the graph is acyclic.  One can
quibble about the heuristics that cvs2svn/cvs2git uses to break up
proto-commits.  But the final result of the algorithm is *guaranteed* to
be consistent with the file-level CVS history and also self-consistent.

I am skeptical that a simpler approach will ever work 100%.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-23 Thread John Keeping
On Wed, Jan 23, 2013 at 10:54:36AM +0100, Michael Haggerty wrote:
 On 01/20/2013 09:17 PM, Chris Rorvick wrote:
 I have never used cvs2git, but I suspect Eric's efforts in making it a
 potential backend for cvsimport are a better use of time.

Is it possible to perform an incremental import with cvs2git?  This
seems to be the one use case where the old cvsimport script (with cvsps
2.x) still performs the best.

I suppose that just re-running the full import will do the right thing
since the commits in Git should be identical, but would it be possible
to do better given the right information about a previous run?


John
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread John Keeping
Hi Chris,

On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

Did you post the fix for the revision map publicly anywhere?  I'm hoping
to publish some fixes to command handling but would like to have the
tests passing first - and if you've already done the work...

Sorry if you have and I've missed it,

John
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Chris Rorvick
On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote:
 Hi Chris,

 On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

 Did you post the fix for the revision map publicly anywhere?

It's in Eric's repo and included in version 3.8:

https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6

Chris
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread John Keeping
On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote:
 On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote:
 On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

 Did you post the fix for the revision map publicly anywhere?
 
 It's in Eric's repo and included in version 3.8:
 
 https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6

Thanks.  For some reason I thought the fix would be to
git-cvsimport-3.py.  Obviously I should have read more carefully.

Sorry for the noise.


John
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Junio C Hamano
John Keeping j...@keeping.me.uk writes:

 On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote:
 On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote:
 On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

 Did you post the fix for the revision map publicly anywhere?
 
 It's in Eric's repo and included in version 3.8:
 
 https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6

 Thanks.  For some reason I thought the fix would be to
 git-cvsimport-3.py.  Obviously I should have read more carefully.

 Sorry for the noise.

This is not a noise, though.

Chris, how would we want to proceed?  I'd prefer at some point to
see cvsimport-3 to be in sync when the one patched and tested in
Eric's repository is proven enough.  Will Eric be the gatekeeper, or
will you be sending patches this way as well?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread John Keeping
On Sun, Jan 20, 2013 at 10:57:50AM -0800, Junio C Hamano wrote:
 John Keeping j...@keeping.me.uk writes:
 On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote:
 On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote:
 On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

 Did you post the fix for the revision map publicly anywhere?
 
 It's in Eric's repo and included in version 3.8:
 
 https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6

 Thanks.  For some reason I thought the fix would be to
 git-cvsimport-3.py.  Obviously I should have read more carefully.

 Sorry for the noise.
 
 This is not a noise, though.
 
 Chris, how would we want to proceed?  I'd prefer at some point to
 see cvsimport-3 to be in sync when the one patched and tested in
 Eric's repository is proven enough.  Will Eric be the gatekeeper, or
 will you be sending patches this way as well?

In this case the patch was to the C portion of cvsps, not the Python
cvs-import, so not relevant for this particular case.

I currently have a set of patches on top of jc/cvsimport-upgrade, which
is slightly out-of-sync with git-cvsimport.py in Eric's cvsps
repository, because I hadn't realised that the latter existed until
about an hour ago.

I haven't decided yet whether to rebase those onto the git-cvsimport.py
in the cvsps repository or send them here to apply on top of
jc/cvsimport-upgrade.  Given that git-cvsimport is a command which has
been around for a while and (although this is a complete re-write) the
aim of these changes is to keep it working as the upstream project
changes, I have a slight preference for keeping git-cvsimport here and
recommending that the copy in the cvsps repository is removed.


John
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Chris Rorvick
On Sun, Jan 20, 2013 at 12:57 PM, Junio C Hamano gits...@pobox.com wrote:
 John Keeping j...@keeping.me.uk writes:

 On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote:
 On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote:
 On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

 Did you post the fix for the revision map publicly anywhere?

 It's in Eric's repo and included in version 3.8:

 https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6

 Thanks.  For some reason I thought the fix would be to
 git-cvsimport-3.py.  Obviously I should have read more carefully.

 Sorry for the noise.

 This is not a noise, though.

 Chris, how would we want to proceed?  I'd prefer at some point to
 see cvsimport-3 to be in sync when the one patched and tested in
 Eric's repository is proven enough.  Will Eric be the gatekeeper, or
 will you be sending patches this way as well?

I probably won't be sending any more patches on this.  My hope was to
get cvsimport-3 (w/ cvsps as the engine) in a state such that one
could transition from the previous version seamlessly.  But the break
in t9605 has convinced me this is not worth the effort--even in this
trivial case cvsps is broken.  The fuzzing logic aggregates commits
into patch sets that have timestamps within a specified window and
otherwise matching attributes.  This aggregation causes file-level
commit timestamps to be lost and we are left with a single timestamp
for the patch set: the minimum for all contained CVS commits.  When
all commits have been processed, the patch sets are ordered
chronologically and printed.

The problem is that is that a CVS commit is rolled into a patch set
regardless of whether the patch set's timestamp falls within the
adjacent CVS file-level commits.  Even worse, since the patch set
timestamp changes as subsequent commits are added (i.e., it's always
picking the earliest) it is potentially indeterminate at the time a
commit is added.  The result is that file revisions can be reordered
in resulting Git import (see t9605.)  I spent some time last week
trying to solve this but I coudln't think of anything that wasn't a
substantial re-work of the code.

I have never used cvs2git, but I suspect Eric's efforts in making it a
potential backend for cvsimport are a better use of time.

Chris
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Chris Rorvick
On Sun, Jan 20, 2013 at 1:24 PM, John Keeping j...@keeping.me.uk wrote:
 On Sun, Jan 20, 2013 at 10:57:50AM -0800, Junio C Hamano wrote:
 This is not a noise, though.

 Chris, how would we want to proceed?  I'd prefer at some point to
 see cvsimport-3 to be in sync when the one patched and tested in
 Eric's repository is proven enough.  Will Eric be the gatekeeper, or
 will you be sending patches this way as well?

 In this case the patch was to the C portion of cvsps, not the Python
 cvs-import, so not relevant for this particular case.

Oh, I think I misunderstood the question.  The only time I passed a
patch specifically for git-cvsimport.py directly to Eric was before
the his patch was in Junio's repository.  Unless I'm mistaken, only
the second patch Eric sent was actually imported.  Subsequent to this
I would have submitted any patches for git-cvsimport.py directly to
the git list.  I just didn't have any--cvsps had several problems that
needed to be worked out before it made sense to look at the importer.

In other words, I don't think Eric should be a gatekeeper of this code.

Chris
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Chris Rorvick
On Sun, Jan 20, 2013 at 2:17 PM, Chris Rorvick ch...@rorvick.com wrote:
 On Sun, Jan 20, 2013 at 12:57 PM, Junio C Hamano gits...@pobox.com wrote:
 John Keeping j...@keeping.me.uk writes:

 On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote:
 On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote:
 On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote:
 These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
 to Eric (fixes revision map.)

 Did you post the fix for the revision map publicly anywhere?

 It's in Eric's repo and included in version 3.8:

 https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6

 Thanks.  For some reason I thought the fix would be to
 git-cvsimport-3.py.  Obviously I should have read more carefully.

 Sorry for the noise.

 This is not a noise, though.

 Chris, how would we want to proceed?  I'd prefer at some point to
 see cvsimport-3 to be in sync when the one patched and tested in
 Eric's repository is proven enough.  Will Eric be the gatekeeper, or
 will you be sending patches this way as well?

 I probably won't be sending any more patches on this.  My hope was to
 get cvsimport-3 (w/ cvsps as the engine) in a state such that one
 could transition from the previous version seamlessly.  But the break
 in t9605 has convinced me this is not worth the effort--even in this
 trivial case cvsps is broken.  The fuzzing logic aggregates commits
 into patch sets that have timestamps within a specified window and
 otherwise matching attributes.  This aggregation causes file-level
 commit timestamps to be lost and we are left with a single timestamp
 for the patch set: the minimum for all contained CVS commits.  When
 all commits have been processed, the patch sets are ordered
 chronologically and printed.

 The problem is that is that a CVS commit is rolled into a patch set
 regardless of whether the patch set's timestamp falls within the
 adjacent CVS file-level commits.  Even worse, since the patch set
 timestamp changes as subsequent commits are added (i.e., it's always
 picking the earliest) it is potentially indeterminate at the time a
 commit is added.  The result is that file revisions can be reordered
 in resulting Git import (see t9605.)  I spent some time last week
 trying to solve this but I coudln't think of anything that wasn't a
 substantial re-work of the code.

 I have never used cvs2git, but I suspect Eric's efforts in making it a
 potential backend for cvsimport are a better use of time.

 Chris

Hi Eric,

I noticed you were taken off this thread.  As I mention above, I
looked into the bug tested in the t9605 patch Junio applied on top of
your cvsimport patch.  The test was actually written for master to
test the Perl/cvsps2 import, but with minor modification you can
verify the problem still exists in the 3.x versions of cvsps.

I think the email above explains the problem pretty well.  It's not
clear to me what all the nastiness is that you've resolved with cvsps
since taking over; I've been mostly concerned with importing an almost
branchless repository which I thought avoided the types of problems
you were addressing.  But this bug can actually cause Git's main
import branch to become inconsistent with CVS HEAD and you don't have
to do anything too weird to get hit by it.

Fixing this seemed like it would require splitting the processing out
into a couple phases and would be a fair amount of work, but maybe I'm
just not looking at the problem right.

Chris
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Eric S. Raymond
  I probably won't be sending any more patches on this.  My hope was to
  get cvsimport-3 (w/ cvsps as the engine) in a state such that one
  could transition from the previous version seamlessly.  But the break
  in t9605 has convinced me this is not worth the effort--even in this
  trivial case cvsps is broken.  The fuzzing logic aggregates commits
  into patch sets that have timestamps within a specified window and
  otherwise matching attributes.  This aggregation causes file-level
  commit timestamps to be lost and we are left with a single timestamp
  for the patch set: the minimum for all contained CVS commits.  When
  all commits have been processed, the patch sets are ordered
  chronologically and printed.
 
  The problem is that is that a CVS commit is rolled into a patch set
  regardless of whether the patch set's timestamp falls within the
  adjacent CVS file-level commits.  Even worse, since the patch set
  timestamp changes as subsequent commits are added (i.e., it's always
  picking the earliest) it is potentially indeterminate at the time a
  commit is added.  The result is that file revisions can be reordered
  in resulting Git import (see t9605.)  I spent some time last week
  trying to solve this but I coudln't think of anything that wasn't a
  substantial re-work of the code.

I've lost who was who in the comment thread, but I think it is rather likely
that the above diagnosis is correct in every respect.

I won't know for certain until I finish the test suite and apply it to
all three tools (cvsps, cvs2git, cvs-fast-export) but what I've seen
of their code indicates that cvsps has the weakest changeset analysis of
the three, even after my fixes.

  I have never used cvs2git, but I suspect Eric's efforts in making it a
  potential backend for cvsimport are a better use of time.

Agreed.  I didn't add multiengine support to csvsimport at random or
just because Heiko Vogt was bugging me about parsecvs.  I was
half-expecting cvsps to manifest a showstopper like this - hoping it
wouldn't, but hedging against the possibility by making alternate
engines easy to plug into git-cvsimport seemed like a *really good
idea* from the beginning of my work on it.  Sometimes being that kind
of right really sucks.

While I am going to have a try at modifying cvsps to make Chris's
t9605 case work, I'm going to strictly limit the amount of time I
spend on that effort since (as you imply) it is fairly likely this
would be throwing good money after bad.

 Fixing this seemed like it would require splitting the processing out
 into a couple phases and would be a fair amount of work, but maybe I'm
 just not looking at the problem right.

Actually I think you've called it *exactly* right.  The job has to be 
done in multiple clique-spitting phases - that's why cvs2git has 7 passes
(though a few of those, perhaps as many as 3, are artifactual).

This is why the next step in my current work plan for CVS-related stuff will
be unbundling my test suite from the cvsps tree and running it to see if
cvs-fast-export dominates cvsps.  

I'm expecting that it will, in which case my plan will be to salvage
the CVS client code out of cvsps (*that* part is quite good - fast,
clean, effective) gluing it to the better analysis stage in
cvs-fast-export, and then shooting cvsps through the head and burying
it behind the barn.
-- 
a href=http://www.catb.org/~esr/;Eric S. Raymond/a
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] fixup remaining cvsimport tests

2013-01-10 Thread Chris Rorvick
These patchs apply on top of of Eric Raymond's cvsimport patch.  7 of 15
tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent
to Eric (fixes revision map.)  It no longer uses origin as the default
branch which I suspect is a problem for at least some of the remaining
tests.  Both of the t9604 tests pass.

Chris

Chris Rorvick (3):
  t/lib-cvs.sh: allow cvsps version 3.x.
  t9600: fixup for new cvsimport
  t9604: fixup for new cvsimport

 t/lib-cvs.sh|  2 +-
 t/t9600-cvsimport.sh| 10 --
 t/t9604-cvsimport-timestamps.sh |  5 ++---
 3 files changed, 7 insertions(+), 10 deletions(-)

-- 
1.8.1.1.g220e17a

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html