On May 5, 2009, at 6:42 AM, C. Titus Brown wrote:
> from what I can tell:
>
> github maintains a history of relationships between repositories.
>
> If I github-fork your pygr repository into a new 'pygr' repository in my
> account, github records that fork and stores only the differences
> between your repo and mine.

Are you sure about this? Can you point me to where you got this explanation from github? This sounds very un-git-like to me. After all, each git repository is supposed to be a complete, functional repository (that's the D in DVCS), and merely refers to other copies of the repository by defining "remotes" for push / pull purposes. As far as I know, there is no way in git to make a "delta repository" that merely stores the diffs vs. a full repository stored somewhere else. But of course, my knowledge of git only scratches the surface of everything it can do...

I thought github forks work as follows:

- each repository is a normal git repository, whether it is a fork or not.

- a fork differs from a non-fork repository in one minor way: it defines a remote called "upstream" that simply points to the parent repository. That enables the user to get updates from the parent repository by just saying "git pull upstream <branchname>". In all other respects, a fork repository is a full, normal git repository just like a non-fork repository.

  If someone were to delete the parent repository, I don't believe everyone's fork repositories would turn into pumpkins. I think the only effect would be that if a user of a fork repository tried to pull from the "upstream" remote, s/he'd get an error message saying the URL failed (because the parent repo no longer exists). The user would still be able to push to origin (i.e. the fork repo stored on github) just fine, I think, and others would be able to clone / pull from it just fine too.

- outside the git repositories, github keeps some metadata about "who forked from who" to provide nice web features like pull-requests, network diagrams, etc.
  As far as I know, this is just metadata, and quite independent of what's actually stored in the actual git repositories.

If you can point me to some information on github showing that this interpretation is wrong, please let me know...

> No matter what I rename my pygr repo to, github will only let me
> github-fork one copy of your pygr repo.

I think github is assuming two principles that make it unnecessary for one user to have multiple forks of one repo:

- git branches provide you with unlimited ability to create as many experimental versions and branches-of-branches ad nauseam. Why fork when you can branch?

- github assumes a clean separation of "collaborators" into trusted vs. untrusted. Ordinary collaboration will be exclusively by forking and pulling, i.e. each person has their own repository that they control, and changes flow from one person to another only when the latter decides to pull someone else's changes into his personal repository. That is a clean model of voluntary sharing with autonomy. In special circumstances where you trust someone *completely*, you give them push privileges to your repository. But if as you say you feel the need to keep a master repo that your collaborators cannot push to, then actually you *don't* trust them completely, and github would say that you should work with them using the normal fork-and-pull model. Since github provides easy and convenient tools for working together in that way, that doesn't seem like a problem.

My impression is github does not want multiple people using one account; multiple forks of one repo in one account would seem to encourage that unwanted behavior.

> How will people know that cjlee112 owns the official github repo for
> pygr? Will they have to look at the branching diagrams or read
> something on some Web page somewhere? (Evidence suggests that neither
> is likely to happen ;)

First, let's emphasize this issue only arises for developers using git, not regular users.
The vast majority of people are just going to download an installer package. Only developers who want the bleeding edge or want to contribute, and know how to use git, are going to face this question. Developers will know which repo is master in several ways:

- The Google Code source code link points to my cjlee112 github repo.

- Other github "pygr" repos will be forks of the cjlee112 pygr.git repo.

- we will tell them, consistently throughout our website materials etc.

But as you say, we can certainly maintain the repo.or.cz site as a replicate master (i.e. it will mirror the cjlee112 pygr.git github repo).

In my view the key question isn't "Is there a possibility of confusion?" but "Do the benefits of encouraging developers to create a fork and contribute changes back outweigh the potential disadvantages?" Over the last year we've seen a lot of benefits of opening up Pygr development through the use of git and online collaboration tools. That has been great. I think github represents a logical next step for creating a community of developers contributing code back and forth.

> This is, I think, why Marek and I keep on suggesting alternatives like
> 'continue using git.or.cz for the master' or 'use a new pygr account' or
> 'name it pygr-master'. It's a clean and fairly obvious way of
> specifying which repo is the master.

But the consequences of keeping both "pygr" and "pygr-master" repos on github may be less than clean and obvious. Say I create a repo called pygr and another called pygr-master as you recommended. Now it's "clear" that pygr-master is the master repo, whereas the pygr repo is... not the master? Now other people say, "I want to fork from the master repo, not the lame non-master repo". So they fork "pygr-master" and now we have a big group of people with "pygr-master" repos, and maybe another bunch of people with "pygr" repos.
When a new developer arrives and wants to join the fun, s/he has to figure out the following puzzles:

- which is the "real" pygr, "pygr" or "pygr-master"? "I see 7 people with pygr repos and 9 people with pygr-master repos. Some of them, like that cjlee112 guy, have both. And the tools that work within one of these groups (e.g. pull requests and network diagrams) will not work for collaborating with the other group, because there is *no* connection between them. Puzzling."

- after picking between those two groups, s/he still faces the question you raised -- "who should I fork from, cjlee112, ctb, istvan, etc.?"

No problem has been solved by this "solution"...

> Also, if our experience so far is any guide, you'll end up with a number
> of personal branches as well as "official" branches (0.7, 0.8, etc.) in
> the cjlee112 pygr repo on github, which could also be confusing to
> people looking to get the latest.

I think we're over-thinking this. After all, this is only for git-savvy developers, not ordinary users. Git universally follows the naming convention that "master" is the primary branch to keep in synch (i.e. what you called "the latest"), and anyone familiar with git will know that. Your question, as I read it, is just a restatement of the fact that git has a learning curve because it offers so much more power, and that git repositories customarily get used in more sophisticated ways than the average CVS / SVN repo did. I think the potential confusion for new developers is outweighed by the tools github offers for making it so darn easy for multiple people to work together on any branch they care to name.

-- Chris
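
P.S. A rough sketch of the fork model as I described it above, written as a toy Python model rather than real git or github code. The Host and Repo classes, the "owner/repo" keys and the commit labels are all made up for illustration. The only things taken from the discussion are that each fork is a complete repository, that it has a remote called "upstream" pointing at its parent, and that the who-forked-from-whom record is kept outside the repositories. Whether github really works this way is exactly the open question.

```python
class Repo:
    """Toy repository: holds its complete history, never a delta."""

    def __init__(self, commits=()):
        self.commits = list(commits)  # private copy of the full history
        self.remotes = {}             # remote name -> key of another repo


class Host:
    """Toy hosting site: repositories plus fork metadata kept outside them."""

    def __init__(self):
        self.repos = {}        # "owner/repo" -> Repo
        self.forked_from = {}  # site-level metadata only

    def create(self, key, commits=()):
        repo = Repo(commits)
        self.repos[key] = repo
        return repo

    def fork(self, parent_key, child_key):
        # A fork is a full copy that also records an "upstream" remote.
        child = self.create(child_key, self.repos[parent_key].commits)
        child.remotes["upstream"] = parent_key
        self.forked_from[child_key] = parent_key
        return child

    def delete(self, key):
        del self.repos[key]

    def pull(self, key, remote):
        repo = self.repos[key]
        source_key = repo.remotes[remote]
        if source_key not in self.repos:
            raise LookupError(f"remote {remote!r} is gone: {source_key}")
        # Append any commits the local repository does not have yet.
        for commit in self.repos[source_key].commits:
            if commit not in repo.commits:
                repo.commits.append(commit)


def demo():
    host = Host()
    parent = host.create("cjlee112/pygr", ["c1", "c2"])
    fork = host.fork("cjlee112/pygr", "ctb/pygr")
    parent.commits.append("c3")
    host.pull("ctb/pygr", "upstream")  # fork picks up c3
    host.delete("cjlee112/pygr")       # parent disappears
    try:
        host.pull("ctb/pygr", "upstream")
        pull_failed = False
    except LookupError:
        pull_failed = True
    return fork.commits, pull_failed


if __name__ == "__main__":
    print(demo())
```

In this model, deleting the parent leaves the fork's history intact and only the later pull from "upstream" fails, which is the behavior I would expect if forks are ordinary full repositories.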