Re: [PATCH 0/3] fixup remaining cvsimport tests
On 01/20/2013 09:17 PM, Chris Rorvick wrote: I probably won't be sending any more patches on this. My hope was to get cvsimport-3 (w/ cvsps as the engine) in a state such that one could transition from the previous version seamlessly. But the break in t9605 has convinced me this is not worth the effort--even in this trivial case cvsps is broken. The fuzzing logic aggregates commits into patch sets that have timestamps within a specified window and otherwise matching attributes. This aggregation causes file-level commit timestamps to be lost and we are left with a single timestamp for the patch set: the minimum for all contained CVS commits. When all commits have been processed, the patch sets are ordered chronologically and printed. The problem is that is that a CVS commit is rolled into a patch set regardless of whether the patch set's timestamp falls within the adjacent CVS file-level commits. Even worse, since the patch set timestamp changes as subsequent commits are added (i.e., it's always picking the earliest) it is potentially indeterminate at the time a commit is added. The result is that file revisions can be reordered in resulting Git import (see t9605.) I spent some time last week trying to solve this but I coudln't think of anything that wasn't a substantial re-work of the code. I have never used cvs2git, but I suspect Eric's efforts in making it a potential backend for cvsimport are a better use of time. Thanks for your explanation of how cvsps works. This is roughly how cvs2svn used to work years ago, prior to release 2.x. In addition it did a number of things to try to tweak the timestamp ordering to avoid committing file-level commits in the wrong order. It never worked 100%; each tweak that was made to fix one problem created another problem in another scenario. cvs2svn/cvs2git 2.x takes a very different approach. It uses a timestamp threshold along with author and commit-message matching to find the biggest set of file-level commits that might constitute a repository-level commit. But then it checks the proto-commits to see if they violate the ordering constraints imposed by the individual file-level commits. For example, if the initial grouping gives the following proto-commits: proto-commit 1: a.txt 1.1b.txt 1.2 proto-commit 2: a.txt 1.2b.txt 1.1 then it is apparent that something is wrong, because a.txt 1.1 necessarily comes before a.txt 1.2 whereas b.txt 1.1 necessarily comes before b.txt 1.2 (CVS can at least be relied on to get this right!) and therefore there is no consistent ordering of the two proto-commits. More generally, the proto-commits have to form a directed acyclic graph, whereas this graph has a cycle 1 - 2 - 1. When cvs2svn/cvs2git finds a cycle, it uses heuristics to break up one or more of the proto-commits to break the cycle. In this case it might break proto-commit 1 into two commits: proto-commit 1a: a.txt 1.1 proto-commit 2: a.txt 1.2b.txt 1.1 proto-commit 1b: b.txt 1.2 Now it is possible to commit them in the order 1a,2,1b. (Exactly this scenario is tested in t9603.) Of course a typical proto-commit graph often contains far more complicated cycles, but the approach remains the same: split proto-commits up as necessary until the graph is acyclic. One can quibble about the heuristics that cvs2svn/cvs2git uses to break up proto-commits. But the final result of the algorithm is *guaranteed* to be consistent with the file-level CVS history and also self-consistent. I am skeptical that a simpler approach will ever work 100%. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Wed, Jan 23, 2013 at 10:54:36AM +0100, Michael Haggerty wrote: On 01/20/2013 09:17 PM, Chris Rorvick wrote: I have never used cvs2git, but I suspect Eric's efforts in making it a potential backend for cvsimport are a better use of time. Is it possible to perform an incremental import with cvs2git? This seems to be the one use case where the old cvsimport script (with cvsps 2.x) still performs the best. I suppose that just re-running the full import will do the right thing since the commits in Git should be identical, but would it be possible to do better given the right information about a previous run? John -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
Hi Chris, On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? I'm hoping to publish some fixes to command handling but would like to have the tests passing first - and if you've already done the work... Sorry if you have and I've missed it, John -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote: Hi Chris, On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? It's in Eric's repo and included in version 3.8: https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6 Chris -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote: On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote: On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? It's in Eric's repo and included in version 3.8: https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6 Thanks. For some reason I thought the fix would be to git-cvsimport-3.py. Obviously I should have read more carefully. Sorry for the noise. John -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
John Keeping j...@keeping.me.uk writes: On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote: On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote: On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? It's in Eric's repo and included in version 3.8: https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6 Thanks. For some reason I thought the fix would be to git-cvsimport-3.py. Obviously I should have read more carefully. Sorry for the noise. This is not a noise, though. Chris, how would we want to proceed? I'd prefer at some point to see cvsimport-3 to be in sync when the one patched and tested in Eric's repository is proven enough. Will Eric be the gatekeeper, or will you be sending patches this way as well? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Sun, Jan 20, 2013 at 10:57:50AM -0800, Junio C Hamano wrote: John Keeping j...@keeping.me.uk writes: On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote: On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote: On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? It's in Eric's repo and included in version 3.8: https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6 Thanks. For some reason I thought the fix would be to git-cvsimport-3.py. Obviously I should have read more carefully. Sorry for the noise. This is not a noise, though. Chris, how would we want to proceed? I'd prefer at some point to see cvsimport-3 to be in sync when the one patched and tested in Eric's repository is proven enough. Will Eric be the gatekeeper, or will you be sending patches this way as well? In this case the patch was to the C portion of cvsps, not the Python cvs-import, so not relevant for this particular case. I currently have a set of patches on top of jc/cvsimport-upgrade, which is slightly out-of-sync with git-cvsimport.py in Eric's cvsps repository, because I hadn't realised that the latter existed until about an hour ago. I haven't decided yet whether to rebase those onto the git-cvsimport.py in the cvsps repository or send them here to apply on top of jc/cvsimport-upgrade. Given that git-cvsimport is a command which has been around for a while and (although this is a complete re-write) the aim of these changes is to keep it working as the upstream project changes, I have a slight preference for keeping git-cvsimport here and recommending that the copy in the cvsps repository is removed. John -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Sun, Jan 20, 2013 at 12:57 PM, Junio C Hamano gits...@pobox.com wrote: John Keeping j...@keeping.me.uk writes: On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote: On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote: On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? It's in Eric's repo and included in version 3.8: https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6 Thanks. For some reason I thought the fix would be to git-cvsimport-3.py. Obviously I should have read more carefully. Sorry for the noise. This is not a noise, though. Chris, how would we want to proceed? I'd prefer at some point to see cvsimport-3 to be in sync when the one patched and tested in Eric's repository is proven enough. Will Eric be the gatekeeper, or will you be sending patches this way as well? I probably won't be sending any more patches on this. My hope was to get cvsimport-3 (w/ cvsps as the engine) in a state such that one could transition from the previous version seamlessly. But the break in t9605 has convinced me this is not worth the effort--even in this trivial case cvsps is broken. The fuzzing logic aggregates commits into patch sets that have timestamps within a specified window and otherwise matching attributes. This aggregation causes file-level commit timestamps to be lost and we are left with a single timestamp for the patch set: the minimum for all contained CVS commits. When all commits have been processed, the patch sets are ordered chronologically and printed. The problem is that is that a CVS commit is rolled into a patch set regardless of whether the patch set's timestamp falls within the adjacent CVS file-level commits. Even worse, since the patch set timestamp changes as subsequent commits are added (i.e., it's always picking the earliest) it is potentially indeterminate at the time a commit is added. The result is that file revisions can be reordered in resulting Git import (see t9605.) I spent some time last week trying to solve this but I coudln't think of anything that wasn't a substantial re-work of the code. I have never used cvs2git, but I suspect Eric's efforts in making it a potential backend for cvsimport are a better use of time. Chris -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Sun, Jan 20, 2013 at 1:24 PM, John Keeping j...@keeping.me.uk wrote: On Sun, Jan 20, 2013 at 10:57:50AM -0800, Junio C Hamano wrote: This is not a noise, though. Chris, how would we want to proceed? I'd prefer at some point to see cvsimport-3 to be in sync when the one patched and tested in Eric's repository is proven enough. Will Eric be the gatekeeper, or will you be sending patches this way as well? In this case the patch was to the C portion of cvsps, not the Python cvs-import, so not relevant for this particular case. Oh, I think I misunderstood the question. The only time I passed a patch specifically for git-cvsimport.py directly to Eric was before the his patch was in Junio's repository. Unless I'm mistaken, only the second patch Eric sent was actually imported. Subsequent to this I would have submitted any patches for git-cvsimport.py directly to the git list. I just didn't have any--cvsps had several problems that needed to be worked out before it made sense to look at the importer. In other words, I don't think Eric should be a gatekeeper of this code. Chris -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
On Sun, Jan 20, 2013 at 2:17 PM, Chris Rorvick ch...@rorvick.com wrote: On Sun, Jan 20, 2013 at 12:57 PM, Junio C Hamano gits...@pobox.com wrote: John Keeping j...@keeping.me.uk writes: On Sun, Jan 20, 2013 at 09:22:03AM -0600, Chris Rorvick wrote: On Sun, Jan 20, 2013 at 6:58 AM, John Keeping j...@keeping.me.uk wrote: On Thu, Jan 10, 2013 at 10:27:16PM -0600, Chris Rorvick wrote: These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) Did you post the fix for the revision map publicly anywhere? It's in Eric's repo and included in version 3.8: https://gitorious.org/cvsps/cvsps/commit/abe81e1775a8959291f629029513d1b7160bbde6 Thanks. For some reason I thought the fix would be to git-cvsimport-3.py. Obviously I should have read more carefully. Sorry for the noise. This is not a noise, though. Chris, how would we want to proceed? I'd prefer at some point to see cvsimport-3 to be in sync when the one patched and tested in Eric's repository is proven enough. Will Eric be the gatekeeper, or will you be sending patches this way as well? I probably won't be sending any more patches on this. My hope was to get cvsimport-3 (w/ cvsps as the engine) in a state such that one could transition from the previous version seamlessly. But the break in t9605 has convinced me this is not worth the effort--even in this trivial case cvsps is broken. The fuzzing logic aggregates commits into patch sets that have timestamps within a specified window and otherwise matching attributes. This aggregation causes file-level commit timestamps to be lost and we are left with a single timestamp for the patch set: the minimum for all contained CVS commits. When all commits have been processed, the patch sets are ordered chronologically and printed. The problem is that is that a CVS commit is rolled into a patch set regardless of whether the patch set's timestamp falls within the adjacent CVS file-level commits. Even worse, since the patch set timestamp changes as subsequent commits are added (i.e., it's always picking the earliest) it is potentially indeterminate at the time a commit is added. The result is that file revisions can be reordered in resulting Git import (see t9605.) I spent some time last week trying to solve this but I coudln't think of anything that wasn't a substantial re-work of the code. I have never used cvs2git, but I suspect Eric's efforts in making it a potential backend for cvsimport are a better use of time. Chris Hi Eric, I noticed you were taken off this thread. As I mention above, I looked into the bug tested in the t9605 patch Junio applied on top of your cvsimport patch. The test was actually written for master to test the Perl/cvsps2 import, but with minor modification you can verify the problem still exists in the 3.x versions of cvsps. I think the email above explains the problem pretty well. It's not clear to me what all the nastiness is that you've resolved with cvsps since taking over; I've been mostly concerned with importing an almost branchless repository which I thought avoided the types of problems you were addressing. But this bug can actually cause Git's main import branch to become inconsistent with CVS HEAD and you don't have to do anything too weird to get hit by it. Fixing this seemed like it would require splitting the processing out into a couple phases and would be a fair amount of work, but maybe I'm just not looking at the problem right. Chris -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] fixup remaining cvsimport tests
I probably won't be sending any more patches on this. My hope was to get cvsimport-3 (w/ cvsps as the engine) in a state such that one could transition from the previous version seamlessly. But the break in t9605 has convinced me this is not worth the effort--even in this trivial case cvsps is broken. The fuzzing logic aggregates commits into patch sets that have timestamps within a specified window and otherwise matching attributes. This aggregation causes file-level commit timestamps to be lost and we are left with a single timestamp for the patch set: the minimum for all contained CVS commits. When all commits have been processed, the patch sets are ordered chronologically and printed. The problem is that is that a CVS commit is rolled into a patch set regardless of whether the patch set's timestamp falls within the adjacent CVS file-level commits. Even worse, since the patch set timestamp changes as subsequent commits are added (i.e., it's always picking the earliest) it is potentially indeterminate at the time a commit is added. The result is that file revisions can be reordered in resulting Git import (see t9605.) I spent some time last week trying to solve this but I coudln't think of anything that wasn't a substantial re-work of the code. I've lost who was who in the comment thread, but I think it is rather likely that the above diagnosis is correct in every respect. I won't know for certain until I finish the test suite and apply it to all three tools (cvsps, cvs2git, cvs-fast-export) but what I've seen of their code indicates that cvsps has the weakest changeset analysis of the three, even after my fixes. I have never used cvs2git, but I suspect Eric's efforts in making it a potential backend for cvsimport are a better use of time. Agreed. I didn't add multiengine support to csvsimport at random or just because Heiko Vogt was bugging me about parsecvs. I was half-expecting cvsps to manifest a showstopper like this - hoping it wouldn't, but hedging against the possibility by making alternate engines easy to plug into git-cvsimport seemed like a *really good idea* from the beginning of my work on it. Sometimes being that kind of right really sucks. While I am going to have a try at modifying cvsps to make Chris's t9605 case work, I'm going to strictly limit the amount of time I spend on that effort since (as you imply) it is fairly likely this would be throwing good money after bad. Fixing this seemed like it would require splitting the processing out into a couple phases and would be a fair amount of work, but maybe I'm just not looking at the problem right. Actually I think you've called it *exactly* right. The job has to be done in multiple clique-spitting phases - that's why cvs2git has 7 passes (though a few of those, perhaps as many as 3, are artifactual). This is why the next step in my current work plan for CVS-related stuff will be unbundling my test suite from the cvsps tree and running it to see if cvs-fast-export dominates cvsps. I'm expecting that it will, in which case my plan will be to salvage the CVS client code out of cvsps (*that* part is quite good - fast, clean, effective) gluing it to the better analysis stage in cvs-fast-export, and then shooting cvsps through the head and burying it behind the barn. -- a href=http://www.catb.org/~esr/;Eric S. Raymond/a -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] fixup remaining cvsimport tests
These patchs apply on top of of Eric Raymond's cvsimport patch. 7 of 15 tests in t9600 fail, one of which is fixed w/ a cvsps patch I've sent to Eric (fixes revision map.) It no longer uses origin as the default branch which I suspect is a problem for at least some of the remaining tests. Both of the t9604 tests pass. Chris Chris Rorvick (3): t/lib-cvs.sh: allow cvsps version 3.x. t9600: fixup for new cvsimport t9604: fixup for new cvsimport t/lib-cvs.sh| 2 +- t/t9600-cvsimport.sh| 10 -- t/t9604-cvsimport-timestamps.sh | 5 ++--- 3 files changed, 7 insertions(+), 10 deletions(-) -- 1.8.1.1.g220e17a -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html