On 13.04.2018 at 10:19, Stephen J. Turnbull wrote:
> Benjamin Franksen writes:
> > On 04/10/2018 08:34 AM, Stephen J. Turnbull wrote:
> > > Any user who understands what a ref is will say "a Darcs tag is too a ref!" I think.
> > Perhaps (but you won't, right?).
> I would, in the sense that it is a name that allows you to rebuild a version exactly, just as a git tag or branch does. It's not a ref into a DAG, of course.
That's what I meant.

> > > How do you identify "official"?
> > I can't, unless it's an "official" repo to start with (e.g. http://darcs.net/) and then I would assume that all branches are "official" (assuming darcs had branches).
> This is generally not true with git. In corporate situations, including large volunteer projects like Python or GHC, it probably is true. But in cases of smaller projects, or even projects with formal organizations that translated repos from centralized systems where public branches were an important form of communication, I would expect a lot of detritus.

But in principle it seems we agree that branches in a public, shared repo should not be used, nowadays, to publish e.g. some experimental development. There should be a better (clearer, more explicit) way to communicate/publish work that deviates from the shared baseline(s) of "officially accepted" branches in a project. The "fork" feature of GitHub is indeed a pretty good solution, except that it is tied to one central service. I think I need not reiterate here the reasons why depending on a single service is problematic, particularly if running such a service is subject to commercial interests. So I would like to have something more "distributed" in nature, similar to peer-to-peer file sharing, with one or more competing services that merely act as a directory for searching and discovering related repos (and, perhaps, communication, i.e. pull requests), and other services where repos and/or bug trackers are hosted. For a smooth user experience this would require a common protocol for all this higher-level information. Perhaps one day someone will develop something like that.

> Also, many projects make official "release branches". Python has several score by now. In the Mercurial days, each was a separate repo, but in git there's been substantial merging.
> I'm not sure if they've *all* been aggregated into one repo, but the backports policy suggests they might have, for convenience in cherry-picking.

Release branches are the prototype of what I meant by "official branches".

> > > I doubt they'd be willing to make "export all branches on clone" a default, and it's not clear to me that the "I just want to see the mainline" aren't the majority.
> > How do they identify "the mainline"?
> To the folks who just want the VCS to stay out of their way, it's "whatever $VCS clone scheme://project.org/official checks out."

This would not work for at least one project I am maintaining. I have several equally usable and maintained branches (versions). It is true that serious development happens only on the latest version, but due to its instability, potential contributors would most probably want to target a more stable release, because that is what they use. It depends on what kind of contributions are expected: occasional bug fixes and minor improvements, or substantial contributions of new features. So I maintain that, at least in /my/ experience, there are cases where this "one default mainline branch" is not appropriate.

> You mentioned "familiar and comfortable with Darcs". I don't think "comfortable" implies "familiar" (in the sense of how the internals work and how it differs from other VCSes the user may be comfortable with).

I meant "familiar" not with how the internals work but with the UI and the user-level idioms, i.e. the ability to use it effectively and with confidence (regarding the outcome of commands issued). (An exception for Darcs is the patch matching options, which are messy and interact in strange and sometimes unpredictable ways. Cleaning that part up has been on my TODO list for a long time.)

> I think it means (to most users) that the VCS stays out of their way.
I can't speak for "most users", but for me it is a way to structure my work, and thus I expect more than that it stays out of my way. There is no way around the fact that VC has a great influence on the way you work and how you share and co-operate with others, very much like the choice of programming language does. I have long since decided to embrace it and use it to improve what I am doing and the way I do it. I am using Darcs even for activities that have nothing to do with programming (e.g. I've written a short story once and as a matter of course I kept the text under Darcs control).

> > Indeed. At work we use Darcs for development of several medium-sized control systems for scientific instruments.
> Interesting to see that description. Sounds like what I would expect (notwithstanding the unfortunate experiment with git submodules, that kind of thing happens in the best-run organizations).

This was a different project (EPICS) that we use but of which we are not the main developers. It is another good example of maintaining a number of long-running branches (there are 3.14, 3.15, 3.16, and now 7), all of which may and do receive contributions, which, if appropriate, are forward- or backward-ported by the maintainers. This is not luxury but necessity: most users are pretty constrained w.r.t. manpower after the initial few years of building a facility. Switching to a new release is usually done only very carefully, step by step, if at all, because we have an experiment or machine to run 24/7, downtimes are scarce, and a lot of the equipment consists of prototypes that exist only this once, so you have to test and debug on the live machine (there is no other). This means users are normally interested in a particular branch and will target mostly that one branch when contributing. The maintainer's job is greatly simplified by Launchpad's pretty advanced bug tracker, which allows tracking the progress of a single ticket along several branches in parallel.
> > > This means that from an individual developer's point of view, the state of master is a triple: (1) what's actually in the official repo (unknown; another dev may have updated),
> > True. (Though it is easy to check if this is the case (hg incoming, darcs pull --dry-run, git <whatever>)).
> Your network is not run by the "MIT of Japan" (my employer, where the abbreviation *really* expands to "minimally informed technicians"), nor is it inside the Great Firewall (currently GitHub is blocked in China, I am informed). And it mostly matters in the last five minutes before a feature freeze. ;-)

You are right that comparing with the remote directly will fail if there is no connectivity. A problem if you are traveling a lot and work from planes etc., not something I am doing a lot, so I didn't think of that.

> > > I don't think this comparison is entirely accurate. All DAG-based systems permit cherry-picking and rebasing, although those like Mercurial and Bazaar do try to deprecate rebase. In git they are first-class operations.
> > Cherry-picking is an attempt to get the effect of patch commutation without paying the price. You get what you pay for: an ad-hoc solution that may or may not give you the results you expect.
> You are willing to say that in public after denying that Darcs has, or you even want, a semantic patch theory? ;-)

Oh, the idea of a semantic patch theory is certainly tempting, but I think it is not realistic today. Even if you succeed in finding the perfect formalism for a particular language, what about all the other languages that are used in a project, not to mention documentation in various formats, the build system, etc. etc. And then think about languages evolving, with various versions and/or option-enabled features.
> > Making sure the results are what you expect is tedious and error prone and I understand if people are nervous about it ("untested versions, gaah!").
> Every Darcs repo implies a number of untested versions which is potentially exponential in the number of patches. I have no idea in practice how many versions are typically generated by repeated obliteration respecting dependencies, but I imagine it's way larger than the number of versions actually subjected to formal testing. (I would guess properly tested versions are approximately linear in the number of patches).

You are absolutely right about that. There may be intermediate states that have never been tested, may not even compile, but so what? Software has bugs, test suites have holes, and implementing new features can be tricky and cause regressions. So you fix the bugs and add more regression tests. This is the same everywhere[1] regardless of VCS, and just because your VCS does not allow you to (easily, transparently) re-order changes and thus makes it more difficult to compose these other intermediate states (create a new branch that forks off in the past and so on) does not mean it is "better tested". If anything, I would claim the opposite: the fact that developers tend to work on slightly different histories actually improves test coverage, in that more intermediate states occur and thus more combinations are tested, reducing the likelihood of subtle interactions between different parts of a program causing bugs that are discovered only much later. (I have to admit that I can't provide evidence for this, not even anecdotal evidence.) It would be possible, theoretically at least, to design a special way to do integration tests with Darcs-managed projects by systematically generating and testing all possible intermediate states.
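As a toy illustration (my own model for this discussion, not Darcs code): enumerate all orderings of a handful of patches that respect a given dependency relation; every prefix of every valid ordering is a potential intermediate state that may never have been tested. With n mutually independent patches there are 2^n such states, while a full dependency chain collapses them to n+1:

```python
from itertools import permutations

def valid_orderings(patches, deps):
    # deps[p] = set of patches that must be recorded before p
    for order in permutations(patches):
        seen = set()
        ok = True
        for p in order:
            if not deps.get(p, set()) <= seen:
                ok = False
                break
            seen.add(p)
        if ok:
            yield order

def intermediate_states(patches, deps):
    # every prefix of every valid ordering is a version that some
    # repo could contain, whether or not anyone ever tested it
    states = set()
    for order in valid_orderings(patches, deps):
        for i in range(len(order) + 1):
            states.add(frozenset(order[:i]))
    return states

# four fully independent patches: 2**4 = 16 reachable states
print(len(intermediate_states("ABCD", {})))  # 16
# a linear chain A -> B -> C -> D: only the 5 prefixes exist
chain = {"B": {"A"}, "C": {"A", "B"}, "D": {"A", "B", "C"}}
print(len(intermediate_states("ABCD", chain)))  # 5
```

The brute-force enumeration itself already hints at the feasibility problem: the number of valid orderings grows factorially when dependencies are sparse.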
In principle, such systematic testing /could/ uncover "sleeping bugs", but I guess what it would uncover mostly is missing explicit ("semantic") dependencies that developers forgot (or couldn't be bothered) to add. Feasibility is another matter, though; as you noted, this will probably run into exponential blow-up.

[1] Even if you have a body of formally specified and verified code, there may be bugs in the specification, and new features mean you have to make changes to the spec etc.

[re-ordered from below:]

> > > [In scaling Darcs, s]torage blows up, the naming conflicts will be frequent unless you're willing to endure network outages and delays, and URLs for personal repos are often long and/or unintuitive.
> > Yes, storage blow-up is a problem, and another one is discoverability, which is why I want to add branches to Darcs. I don't understand what you mean with "naming conflicts will be frequent".
> Names like "test", "new", and in some cases feature names are likely to be used independently across personal repos.

Oh yes, I would expect lots of those in practice (I am always at a loss as to how to name my Darcs repos/branches and often resort to silly names like that). One solution could be to mark branches meant for public consumption explicitly and treat all others as private.

> > > does patch algebra allow you to avoid some conflicts that would occur in a DAG-based system?
> > Some, yes. It depends a lot on the foundation, i.e. the concrete implementation of your patch algebra. It also depends on how conflicts are detected in the DAG-based system.
> I don't know of any DAG-based systems with a substantial advance over patch(1).

Okay, noted.

> > But that is not the main point. The main point is that the patch algebra frees you from having to worry about history, /except/ when it is relevant, i.e. when patches have dependencies.
> But isn't that costly when you are trying to localize a bug by testing which versions exhibit it? When bisection works in a DAG-based system, you have a logarithmic upper bound on search time. (Also when it fails, you find out in logarithmic time.) It's not obvious to me that you get that result in Darcs since its "mainline" is fundamentally nonlinear.

As I noted at the beginning of our exchange, the repo contains the patches in a specific order, and this is the order that 'darcs test' uses. So for this we temporarily forget about possible re-orderings. This works because you are normally not interested in the exact set of versions that fail; instead, what matters is the (one) change that marks the place where behavior goes from good to bad, because it is likely this change where the error was made.

> > > If not, what is the great advantage of patch algebra from your point of view? Is it related to the ability to claim the same branch identity for two workspaces that "haven't diverged too much", where a git rebase in a published branch all too often results in an unusable mess of conflicts?
> > Well, my experience tells me that "an unusable mess of conflicts" can happen with Darcs in just the same way.
> I don't think it's "just the same way". My point is that a rebase changes the "identity" of a branch in a nonlinear way because it's version-based. In Darcs (at least in theory) you can walk forward applying the patches and fixing conflicts one patch at a time. (I guess this is exactly what "darcs rebase" implements.) True, in Darcs a megapatch can do you in, but *every* git rebase is a megapatch!

Hm, is that necessarily so? I would have thought this is only the case if you 'squash' all the intermediate commits into one; AFAIR git rebase -i allows you to proceed more subtly.
But perhaps you refer to a certain way of using rebase, as the Linux kernel devs seem to prefer, where new features are always squashed into a single big commit?

> > When I pull a patch from your repo and it doesn't conflict, I have enlarged the intersection and reduced the (symmetric) difference. When I repeat this, and also push, and everything merges cleanly, then our repos are semantically identical, period. I just don't have to care about the order, either one is fine.
> This is a useful explanation! Thanks.

This is indeed the essential motivation behind patch theory. I should perhaps add that this universal property is (and must be) maintained even in the presence of conflicting patches. Suppose our patch sets are equal except for one patch A in my repo and one patch B in yours, where A and B conflict. If I pull your patch (allowing conflicts) and push mine (also allowing conflicts), then we still have equivalent repos with the same resulting (pristine) tree, even though in your repo the order is [...,B,A] and in mine it is [...,A,B]. This tree is missing all primitive changes in A and B that conflict (the "automatic" resolution is to remove both changes). Trouble ensues when we both manually resolve the conflict (re-adding one or the other change in a suitably modified form, or a mixture of both), because these two resolution patches will inevitably conflict again; so conflict resolution requires co-ordination between developers. The standard work-flow (supported by the default that allows conflicts when pulling but not when pushing) is to push only after resolving conflicts. /How/ to maintain all the required patch properties in the presence of conflicting patches (including an efficient representation of conflicts) is a pretty complicated matter.
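To make the scenario above concrete, here is a small toy model (my own illustration, nothing like Darcs' actual internals): a repo is an ordered list of patch names, each patch carries a set of primitive changes, the resulting tree is the set of surviving changes, and the automatic resolution drops every change belonging to two conflicting patches that are both present. The orders [...,A,B] and [...,B,A] then yield the same tree:

```python
# Toy model of patch-set semantics. The names and change strings
# below are invented for illustration.
changes = {
    "base": {"create file f"},
    "A":    {"f line 5 -> 'foo'"},   # my patch
    "B":    {"f line 5 -> 'bar'"},   # your patch, conflicts with A
}
conflicts = [("A", "B")]  # pairs of patches that conflict

def tree(order):
    surviving = set().union(*(changes[p] for p in order))
    # automatic resolution: drop every primitive change of two
    # conflicting patches that are both present in the repo
    for p, q in conflicts:
        if p in order and q in order:
            surviving -= changes[p] | changes[q]
    return surviving

mine  = ["base", "A", "B"]   # I recorded A, then pulled your B
yours = ["base", "B", "A"]   # you recorded B, then pulled my A
assert tree(mine) == tree(yours) == {"create file f"}
```

The point of the model is only that the result is a function of the patch *set*, not of the order; the hard part Darcs actually solves (representing conflicts so that all patch properties still hold) is exactly what this sketch glosses over.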
Also, for fairness, I should mention that the darcs-2 patch format is unsound as it stands: there are situations where some of the properties are violated -- not the one about equal patch sets resulting in equal trees, AFAIK, but certain others, which can lead to observable strangeness in Darcs' behavior and even crashes. This is due to an improper handling of /duplicate/ patches, which do not necessarily conflict and instead are represented using a special patch type. Fixing this is possible (thanks to work by Ian Lynagh, which also fixes the remaining cases where we run into exponential blow-up). But any such fix will be incompatible with the current patch representation, so it will have to wait until we are ready to release darcs-3 with a new patch format.

> > > This is what happens in git now, except that you are able to set your own defaults in .git/config, and provide aliases for URLs (the remotes). You can argue that remotes provide more confusion than convenience if you like, but several years of experience have shown that for the vast majority of git users it's the other way around.
> > You sound so confident when you say that. As if the git we have today was the result of incorporating years of user feedback. OTOH you keep telling me that git is the way it is because the developers have made and still make it for their own good, primarily. And that the UI is more or less frozen because Mr. T. said so many years ago.
> There's no conflict between itch-scratching and Mr. T's decrees on one hand, and general user satisfaction with the remote feature on the other. Unless you're doing something tricky, the workflow in most projects is pretty simple: go to GitHub, fork the official repo to your account on GitHub, clone your fork to your workstation, make branches for each "piece of work" (defined by the project leadership), push them to your fork when done and submit a "pull request".
> Management of remotes in this scenario is completely transparent to the ordinary contributor: "clone" does all the work.

This is not how it works in practice for me. I first clone the public repo locally, build it and experiment with it. Perhaps later I find a bug, fix it, and want to contribute my patch. Oh noes, I made the patch without forking the repo first. Okay, back to GitHub, fork. (Then comes the point at which I can't remember what the ssh URL of my fork must look like. I don't want to know how often I googled for that particular strangeness; I think one has to say g...@github.com:/user/repo or something like that.) Now I have to re-configure my local repo so that it works with the right remote. Then I can finally push, and then I must make a pull request, using the crappy text editor in the web interface. You may find this easy and natural, but I guess that is because you did it often enough that you don't notice how unnecessarily complicated all this is.

The usual Darcs work-flow is like this: you clone the repo, play with it, record an improvement and 'darcs send' your patch, and that's it. (You can add a text to the description of the patch bundle you are sending in your favorite text editor.) On the other end, the maintainer applies the patch in a local clone and decides whether to accept it or not. If it turns out your patch conflicts (e.g. you forgot to pull before recording your change), or there are other things to criticize, she'll probably ask you to re-send an amended version; or perhaps does that herself.

BTW, I have read that Linux kernel development shuns GitHub because it doesn't scale; they use mailing lists instead. That would fit nicely with the Darcs work-flow I described here :-)

> > > This is not true for branches. "Colocated branches" (ie, the many branches per repo model) do seem to cause confusion. My guess is that a Darcs-with-branches would have the same problem.
> > I hope we can avoid that.
> Perhaps you can.
> It will depend on how many users with a "centralized VCS" mindset you attract. I'm not sure of whether that mindset is "organic", or whether it's a matter of experience with centralized systems. (The canonical example is Richard "I'm a genius hacker and I've always committed directly to the production repo" Stallman, who obviously had decades of experience with RCS and CVS before Emacs switched to Bazaar, and then git. As people who grew up with DVCS become the overwhelming majority, perhaps that mindset will just f-f-f-fade away, as Peter Townsend sang.)

I fear people won't understand DVCS any better because they got hooked by GitHub.

> > > In context, "short-lived deviation" is exactly the sense I meant: in case of a merge with way too many conflicts, you want to "rollback" to the pre-merge state.
> > But doesn't this lose the changes you made?
> Which changes? First, there should be no uncommitted changes in the workspace when the merge is started. If there are, commit them (perhaps to another branch).

Of course; I meant the changes you recorded/committed.

> Second, if you've fixed a few files before discovering the mess, you can commit them separately to an appropriate branch (usually your mainline).

How do I commit a file separately to another branch?

> You'll have to redo the merge, but for those files you always choose your existing fixed version.
>
> Perhaps it's not as good as it could be but you don't need to lose work. I grant that this is *not* the image you would get from "rollback to the premerge state", but in my experience it's usually pretty obvious when you've got a mess before trying to fix it, so that's the majority of cases anyway.

Okay.

> > In the situation where I have complicated conflicts, I usually use 'darcs rebase' to resolve them one patch at a time.
> > The work-flow is like this: you say 'darcs rebase pull', which suspends any local patches that conflict with remote ones. [....]
> > My experience is that it is much easier to resolve complicated conflicts in this step-wise fashion.
> This sounds like the optimization I obliquely referred to above.

> > If you had unrecorded changes you are out of luck:
> There's no good reason for having unrecorded changes in any of the DAG-based systems. They all provide stash or something like it. I can't see any reason for it in Darcs, either: a record followed by an immediate reversion patch is effectively a stash, if Darcs doesn't already have that feature.

Yes, of course. But Darcs doesn't force you to commit changes before you pull, so it is possible to forget to do it, which is why I recommend using --no-allow-conflicts as the default for pull, too.

> > > Sigh. This simply isn't true. *The DAG is immutable.*
> > Ah, I never doubted that the DAG remains consistent in itself. What I meant is the consistency of the changes to your tree. For instance, if you use cherry-picking to re-order changes, can you be sure that after picking all the commits in a branch the resulting tree will be the same as in the original? I don't think so.
> You can be sure in the same circumstances as in Darcs: when the cherry-picking involves no manual resolution of conflicts.

I am pretty certain that this is wrong. Relevant reading includes http://r6.ca/blog/20110416T204742Z.html with discussion on reddit: https://www.reddit.com/r/programming/comments/grqeu/git_is_inconsistent/ (It's been a while since I looked closely at these examples, so perhaps they are out-dated.)

> > Assuming a modernized version of Darcs with in-repo branches, better (guaranteed to be efficient, i.e.
> > polynomial, ideally linear) conflict handling, and a more efficient representation of binary hunks: yes, I think [managing the Linux kernel or GCC with Darcs] would be possible and would actually work better than git.
> Good luck! I hope you have the time and the help to get there. (I don't have time to learn enough Haskell for the foreseeable future.)

Too bad; I would probably enjoy working with you.

> > I still find it interesting that in Darcs I never missed remote tracking branches yet.
> I don't see why you would, since Darcs forces you to manage it manually anyway. That is, the only way you can keep a mirror of the "official" repository's state is by keeping a pristine repository, as you describe below.[1] Keeping "pristines" is the way I have historically managed my Mercurial and Bazaar projects, and still do for those projects still using Mercurial (all my remaining Bazaar projects are now sufficiently stable that I just work in the pristine for my decennial patches ;-).
>
> Otherwise, you just depend on network connectivity, and pull directly into the working copy, or diff against it.

I think I see how, with in-repo branches, it would become natural to keep one or two untouched copies of the remote for comparison when working offline, e.g. one for where you started and one for closely tracking the remote branch.

[re-ordered from below:]

> [1] Theoretically you could use tags, but that would be difficult in Darcs without cooperation from the official repo, AFAICS.

Not at all. You can record tags ad libitum for your own purposes. So instead of keeping the where-I-started branch you could just add a tag. I often do this before making complicated or far-reaching changes. Just make sure you don't accidentally push it...

> > I guess the work-flow with Darcs is just different enough that some concepts (or problems) simply do not transfer naturally.
> I think so.
> I think some of them will arise in a multibranch version of Darcs, though.

> > > No, that default is only for a clone, and it's whatever is checked out in the source repo, which is usually "master" for a public repo.
> > But this is horrible.
> Not in practice. :-)

Only because maintainers of public repos know about this and are careful not to check out other branches in such repos. You will have a hard time convincing me that this was a good design decision. It conflates a purely local setting (what is checked out, i.e. what I currently work on) with the publicly visible interface (the default for what you get with clone). Mercurial does this differently: what you get when cloning is independent of what is "checked out", and there is no need at all for bare repos. I did some googling and found this on Stack Overflow: https://stackoverflow.com/questions/8952865/mercurial-set-a-branch-as-the-new-default-branch which, if it tells the truth, means that it behaves a bit more like how I think it should. Which is, assuming that you really want to support a single 'default' branch, to make this an explicit setting independent of most of the other operations on the repo.

> > "Whatever is checked out in the source repo" is completely unpredictable (unless you make sure it is a bare repo so nobody would check out anything there).
> There still needs to be a HEAD (which is what determines what is checked out). In any formalized workflow, it will be a bare repo, so I'm not sure you would experience any problem.

And I thought git was supposed to shine for the "chaotic" kernel dev style where everyone clones and merges everyone else's repos?

> > > > What about the sharing with colleagues? [...] You really want a third repo in between upstream and local for that.
> > > Yes, as I describe above these days it's typically on GitHub.
> > Unacceptable in many companies. Also unnecessarily slow, etc. etc.
> Sure, but it's trivial to create one in-house: any git repo reachable by network will do. Maintaining and managing that is *non*-trivial; that's why GitHub is so successful, they're darn good at automating that stuff. But it's not *that* hard to create a reasonable workflow, easier to teach it, and only the gatekeepers need to know the necessary operations for acquiring and merging contributions.

Oh, I am sure it can be managed, but I would still prefer it to be simpler. I /am/ one of the "gatekeepers" at work (not in any formally recognized way, though), which is one reason why we still use Darcs ;-)

> > Let's drop [the discussion of what's a URI] and agree to disagree.
> OK, but remember you're also disagreeing with RFC 3986. :^)

I guess you won that round ;-)

> > > > You said earlier that git represents a submodule as a tree object that is itself a commit. But it cannot be the commit that represents the current (pristine) tree in the submodule, else I could not make a commit in the submodule (or pull there) without making a commit in the containing repo/branch.
> > > I'm not sure what you mean by this.
> > I am trying to understand how submodules work in git. So I have a subdir "bar". The tree referenced by the current commit (of the supermodule) has an entry for "bar" and its content object is not a file but another commit. So suppose I pull a different commit inside the submodule. Would that not mean that the supermodule needs to change, too, i.e. refer to this new commit instead of the old one? But that cannot be, since the commit of the supermodule is immutable.... ahh, I think I do understand: git will show me this update as an uncommitted change! I can commit it in the supermodule and then it "officially" refers to this new commit of the submodule. Correct?
> Exactly.

Good! It makes a lot of sense to do it like that.
You track the changes in a subrepo not at the level of files and directories, but at the much coarser level of commits.

Doing the same in Darcs is a bit more difficult. If we regard the (pristine) state of a subrepo as a set of patches, then the (primitive) patches that modify this state consist of a set of abstract patch hashes to remove and another such set to add, quite similar to a file hunk. This requires more storage space than in git, but I guess it's not an unreasonable amount. If we add an internal data structure where we can look up the meta data given its hash, this lets us display the difference that the subrepo-patch represents in a way similar to 'darcs pull --dry-run' plus 'darcs push --dry-run' (which is how you compare the pristine states of two Darcs repos). One shortcoming of this approach is that we cannot (efficiently) support either --verbose here (which, in Darcs, means to output not only the meta data but also the content of the patch), nor --summary (which summarizes the affected files and how they are changed, similar to 'git status'). These options require access to the patch content, and that depends on context, i.e. the order of patches. For the same reason, the set of abstract patch hashes is not sufficient to reconstruct a repo; you need a 'real' repo to actually pull the patches from. In fact, for the sketched approach to work efficiently when initializing a subrepo, we probably need an optimized representation of the pristine repo state. I have a few ideas for that, but the details are probably not of interest in the context of our discussion.

The UI would be less configurable and more automatic than the one git has. For instance, when I apply a patch that modifies a subrepo, I would want the subrepo to be updated automatically (obliterating patches to be removed, then pulling patches to be added). Similarly, when cloning a repo I want the subrepos to be initialized and updated, too.
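A minimal sketch of the hash-set representation described above (a hypothetical design; all names are invented for illustration): the pristine state of a subrepo is a set of abstract patch hashes, and a primitive subrepo-patch is a pair of hash sets (removals, additions), invertible by swapping the two sets, just like a hunk:

```python
from hashlib import sha256

def patch_hash(meta):
    # stand-in for Darcs' abstract patch identifier: a hash of the
    # patch meta data (here just a short hex digest of a string)
    return sha256(meta.encode()).hexdigest()[:12]

def apply_subrepo_patch(state, removals, additions):
    # a subrepo-patch applies only in a state that contains all the
    # hashes to remove and none of the hashes to add
    assert removals <= state, "patch does not apply: missing hashes"
    assert not (additions & state), "patch does not apply: already present"
    return (state - removals) | additions

h = patch_hash
old = {h("fix typo"), h("add feature X")}
# replace the feature patch by an amended version
new = apply_subrepo_patch(old, {h("add feature X")}, {h("add feature X v2")})
assert new == {h("fix typo"), h("add feature X v2")}
# inverting a subrepo-patch swaps the two sets, as with hunks
assert apply_subrepo_patch(new, {h("add feature X v2")}, {h("add feature X")}) == old
```

As noted in the text, the hash sets alone cannot reconstruct the subrepo's contents; applying such a patch for real would mean obliterating and pulling the corresponding patches from an actual repo.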
To associate subrepos with URLs we can use a file similar to .gitmodules (the actual name can be a pref setting, as for the boringfile), but with the difference that a subrepo is identified by a UUID, not a human-readable name. (I think I'll add a ticket to our tracker or an entry to the wiki with a copy of this design sketch.)

> It's really complicated. This is one of those features where "if you don't know (1) *why* you need it (what specific workflow issues it addresses) *and* (2) *how* you will modify your workflow to address those issues using this feature, YOU DO NOT NEED IT and YOU WILL BE SORRY if you try it anyway." :-)

Probably. Nevertheless, I learned that with git understanding how the machinery works is the best approach, and since I had my "aha" above, submodules have lost a lot of their scariness for me. Do you know the reason why 'git clone' does not initialize submodules? Is this for backward compatibility, or because 'git submodule update --init' is not deemed a good enough default?

Cheers
Ben

_______________________________________________
darcs-users mailing list
darcs-users@osuosl.org
https://lists.osuosl.org/mailman/listinfo/darcs-users