Re: What exactly is a "initial checkout"
On 07/11/2018 08:50, Christian Halstrick wrote:
> Ok, I now understand the problems which are solved by this special behaviour of an "initial checkout". And, also important, I understand when exactly I should do an "initial checkout": when the index file does not exist. I'll share my new knowledge with JGit :-)

Given that the initial query was about the lack of documentation for the term "initial checkout", do you have any suggestion of how it might best be incorporated into the documentation to assist future readers?

--
Philip
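The condition mentioned above ("the index file does not exist") can be checked from a script. The following is only a minimal sketch, not something taken from the thread: the helper name `index_state` and the throwaway repository are assumptions made for illustration, and `git rev-parse --git-path` needs a reasonably recent Git.

```shell
#!/bin/sh
# Sketch: detect whether a repository still has no index file.
set -e

repo=$(mktemp -d)          # throwaway repository for the demonstration
git init -q "$repo"
cd "$repo"

# index_state is a hypothetical helper: it prints "present" or "missing".
index_state() {
    # `git rev-parse --git-path index` only prints the path of the index
    # file (normally .git/index); it does not create the file.
    if [ -f "$(git rev-parse --git-path index)" ]; then
        echo present
    else
        echo missing
    fi
}

before=$(index_state)      # fresh repository, nothing staged yet
echo hello > file.txt
git add file.txt           # staging a file writes the index
after=$(index_state)

echo "before=$before after=$after"
```

A tool that wants the "initial checkout" behaviour would presumably make this check before populating the working tree, but how JGit or Git itself decides is not shown here.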
Re: if YOU use a Windows GUI for Git, i would appreciate knowing which one and why
Hi Gerry,

I'll give my view, as someone approaching retirement, but who worked as an Engineer in a mainly Windows environment.

On 04/11/2018 17:48, _g e r r y _ _l o w r y _ wrote:
> PREAMBLE [START] - please feel free to skip this first section
>
> Forgive me for asking this question on a mailing list. stackoverflow would probably kill such a question before the bits were fully saved to a server drive.
>
> Let me explain why i am asking and why i am not being a troll.
>
> [a] i'm "old school", i.e., > 50% on my way to being age 72 [born 1947]

8 years behind..

> [b] when i started programming in 1967, most of my work input was via punched cards

'69, at school, post/compile/run/wait for post; 1 week (Maths club)

> [c] punching my own cards was cool

Pin punching individual chads ;-)

> [d] IBM System/360 mainframe assembler was cool and patching previously punched card encoded machine code output was a fun risky but at times necessary challenge.

Eventually the 370 at university.

> [e] using command windows and coding batch files for Gary Kildall's CP/M and the evil empire's PC/MS-DOS was how i accomplished many tasks for early non-GUI environments (i still continue this practice even in Windows 10 (a.k.a. please don't update my PC O/S behind my back again versions of MS Windows)).

Engineer in electronics; software was an interlinked part of electronics back then

> [f] my introduction to Git was via a command line based awesome video that has disappeared (i asked this community about that in a previous thread).

Discovered in 2011 via 'Code News' article - Spotted immediately that it solved the engineers' version control issue because it 'distributed' the control. I've tried a few of the GUIs.

> BOTTOM LINE: virtually 100% of my Git use has been via Git Bash command line [probably downloaded from https://git-scm.com/]
>
> For me, and i suspect even for most people who live with GUI platforms, [a well kept secret fact] using the keyboard is faster than using the mouse [especially when one's fingers are already over one's keyboard; for example, closing one or more "windows" via Alt+F4].
>
> Also for me, i am happy to change some code and/or write some new code, Alt+Tab to Git Bash frequently, ADD/COMMIT, then Alt+Tab back to whatever IDE i'm using [mostly LINQPad and vs2017]; i know that's quite a bit schizophrenic of me: command line Git but GUI IDE.
>
> PREAMBLE [END]
>
> QUESTION: if YOU use a Windows GUI for Git, i would appreciate knowing which one and why
>
> i have been asked to look at GUI versions of Git for Windows.

I presume that this is for a client who isn't sure what they want
http://www.abilitybusinesscomputerservices.com/home.html

> https://git-scm.com/download/gui/windows currently lists 22 options.

That's nearly as bad as choosing a Linux distro ;-)

> if i had more time left in my life and the option, because of my own nature, i'd likely download and evaluate all 22 - Mr.T would pity the fool that i often can be.
>
> CAUTION: i am not looking for anyone to disparage other Git Windows GUIs.
>
> Let me break down the question into 4 parts:
>
> [1a] Which do you prefer: Git GUI, Git command line?

I use the three parts provided as part of regular Git and Git for Windows, that is git-gui, gitk and git cli in a terminal (mintty)

> [1b] What is your reason for your [1a] preference?

I have been in a general Windows environment for decades. The GUI format with single buttons/drop downs that do one thing well, without finger trouble side effects, is good in such environments. One cannot be master of everything. The cli is good for specialists and special actions, especially precision surgery. The key is to avoid the "the surgery was a success but the patient died" results.

> [2a] if applicable, which Git GUI do you prefer?

git-gui and gitk are now the only two I use.

> [2b] What is your reason for your [2a] preference?

Many of the other GUIs hide the power of Git and its new abstraction of no longer actually being about "Control" (by 'management'). Now it is about veracity. If you have the right object ID (sha1/sha256) you have an identical original [there are no 'copies', all Mona Lisas with the hash are the same]. Management can choose which hash to accept upstream.

Most other GUIs try to hide behind the old school Master-copy view point that was developed in the 19th century for drawing office control. If you damaged the master drawing the ability to make things and do business was lost. Protecting the master drawing was everything. They were traced before they went to the blue print machine. Changes were batched up before the master could be touched (that risk again).

Too many GUIs (and their Managements!) still try to work the old way, losing all the potential benefits. They are still hammer wielders looking for nails, and only finding screws to smash.

I've heard reasonable things about SmartGit but that costs money so I
Re: Git Slowness on Windows w/o Internet
On 03/11/2018 16:44, brian m. carlson wrote:
> On Fri, Nov 02, 2018 at 11:10:51AM -0500, Peter Kostyukov wrote:
>> Wanted to bring to your attention an issue that we discovered on our Windows Jenkins nodes with git scm installed (git.exe). Our Jenkins servers don't have Internet access. It appears that git.exe is trying to connect to various Cloudflare and Akamai CDN instances over the Internet when it first runs and it keeps trying to connect to these CDNs every git.exe execution until it makes a successful attempt. See the screenshot attached with the details.
>>
>> Enabling Internet access via proxy fixes the issue and git.exe continues to work fast on the next attempts to run git.exe
>>
>> Is there any configuration setting that can disable this git's behavior or is there any other workaround without allowing Internet access? Otherwise, every git command run on a server without the Internet takes about 30 seconds to complete.
>
> Git itself doesn't make any attempt to access those systems unless it's configured to do so (e.g. a remote is set up to talk to those systems and fetch or pull is used).
>
> It's possible that you're using a distribution package that performs this behavior, say, to check for updates. I'd recommend that you contact the distributor, which in this case might be Git for Windows, and see if they can tell you more about what's going on. The URL for that project is at https://github.com/git-for-windows/git.

The normal Git for Windows install includes an option to check for updates at a suitable rate. Maybe you are hitting that. It can be switched off.

--
Philip
Re: git projects with submodules in different sites - in txt format (:+(
On 02/10/2018 06:47, Michele Hallak wrote:
> Hi,
>
> I am running out of ideas about how to change the methodology we are using in order to ease our integration process... Close to despair, I am throwing the question to you...
>
> We have 6 infrastructure repositories

[A, B, C, D, E, F ?].

> Each project

[W,X,Y,Z]

> is composed of 4 repositories

[1-4],

> each one using one or two infrastructure repositories as sub-modules. (Not the same)

e.g. W1-W4; with say B & D as submodules

> The infrastructure repositories are common to several projects and in the case we have to make change in the infrastructure for a specific project, we are doing it on a specific branch until properly merged.

Do you also have remotes set up that provide backup and central authority to the projects..?

> Everything is fine (more or less) and somehow working.

Good..

> Now, we have one project that will be developed in another site and with another git server physically separated from the main site.

Is it networked? Internal control, external internet, sneakernet?

> I copied the infrastructure repositories in the new site and removed and add the sub-modules in order for them to point to the url in the separated git server.
>
> Every 2 weeks, the remotely developed code has to be integrated back in the main site. My idea was to format GIT patches, integrate in the main site, tag the whole thing and ship back the integrated tagged code to the remote site.
>
> ... and now the nightmare starts:

yep, you have lost the validation & verification capability of Git's sha1/oid and DAG.

> Since the .gitmodules is different, I cannot have the same SHA and then same tag and I am never sure that the integrated code is proper.

Remotes, remotes...

> Maybe there is a simple solution that I don't know about to my problem? Is there something else than GIT patches? Should I simply ship to the remote site the code as is and change the submodules each time?

I think the solution you need is `git bundle` https://git-scm.com/docs/git-bundle.

This is designed for the case where you do not have the regular git transport infrastructure. Instead it records the expected data that would be 'on the wire', which is then read in at the far end. The bundle can contain excess data to ensure overlap between site transmissions.

You just run the projects in the same way but add the courier step for shipping the CD, or some password protected archive as per your security needs.

Everything should be just fine (more or less) and somehow it will just work. ;-)

--
Philip

https://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo
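The courier step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the directory names `main-site` and `remote-site`, the file name `project.bundle` and the inline identity are made up for the demonstration, and a real setup would ship the bundle file on physical media rather than share a temporary directory.

```shell
#!/bin/sh
# Sketch: move a repository between two sites via a bundle file.
set -e

work=$(mktemp -d)
cd "$work"

# "Main" site: a repository with one commit. The identity is passed
# inline so the sketch does not depend on any global configuration.
git init -q main-site
git -C main-site -c user.name=Example -c user.email=example@example.invalid \
    commit -q --allow-empty -m "first commit"

# Pack every ref into a single file that can travel by CD or archive.
git -C main-site bundle create "$work/project.bundle" --all

# "Remote" site: clone straight from the bundle file, then ask Git to
# verify the bundle against the receiving repository.
git clone -q "$work/project.bundle" remote-site
git -C remote-site bundle verify "$work/project.bundle"

main_head=$(git -C main-site rev-parse HEAD)
remote_head=$(git -C remote-site rev-parse HEAD)
echo "main=$main_head remote=$remote_head"
```

Because the bundle carries the real objects, the commit IDs at both sites stay identical, which is the validation property that a patch-based exchange loses. Later exchanges would normally use smaller bundles covering only the new commits, fetched into the existing clones.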
[PATCH 0/1] Re: git silently ignores include directive with single quotes
Rather than attacking the problem with code, I decided to simply update the config file documentation.

As the userbase expands the documentation will need to be more comprehensive about exclusions and omissions, along with better highlighting for core areas.

It would be useful if Stas could comment on whether these changes would have assisted in debugging the faulty config file.

Philip Oakley (1):
  config doc: highlight the name=value syntax

 Documentation/config.txt | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--
2.17.1.windows.2
[PATCH 1/1] config doc: highlight the name=value syntax
Stas Bekman reported [1] that Git config was not accepting single quotes around a filename as may have been expected by shell users.

Highlight the 'name = value' syntax with its own heading. Clarify that single quotes are not special here. Also point to this paragraph in the 'include' section regarding pathnames.

In addition clarify that missing include file paths are not an error, but rather an implicit 'if found' for include files.

[1] https://public-inbox.org/git/ca2b192e-1722-092e-2c54-d79d21a66...@stason.org/

Reported-by: Stas Bekman
Signed-off-by: Philip Oakley
---
 Documentation/config.txt | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1264d91fa3..b65fd6138d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -19,8 +19,8 @@ characters and `-`, and must start with an alphabetic character. Some
 variables may appear multiple times; we say then that the variable is
 multivalued.
 
-Syntax
-~~~~~~
+Config file Syntax
+~~~~~~~~~~~~~~~~~~
 
 The syntax is fairly flexible and permissive; whitespaces are mostly
 ignored. The '#' and ';' characters begin comments to the end of line,
@@ -56,6 +56,9 @@ syntax, the subsection name is converted to lower-case and is also
 compared case sensitively. These subsection names follow the same
 restrictions as section names.
 
+Variable name/value syntax
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 All the other lines (and the remainder of the line after the section
 header) are recognized as setting variables, in the form
 'name = value' (or just 'name', which is a short-hand to say that
@@ -69,7 +72,8 @@ stripped. Leading whitespaces after 'name =', the remainder of the
 line after the first comment character '#' or ';', and trailing
 whitespaces of the line are discarded unless they are enclosed in
 double quotes. Internal whitespaces within the value are retained
-verbatim.
+verbatim. Single quotes are not special and form part of the
+variable's value.
 
 Inside double quotes, double quote `"` and backslash `\` characters
 must be escaped: use `\"` for `"` and `\\` for `\`.
@@ -89,10 +93,14 @@ each other with the exception that `includeIf` sections may be ignored
 if their condition does not evaluate to true; see "Conditional includes"
 below.
 
+Both the `include` and `includeIf` sections implicitly apply an 'if found'
+condition to the given path names.
+
 You can include a config file from another by setting the special
 `include.path` (or `includeIf.*.path`) variable to the name of the file
 to be included. The variable takes a pathname as its value, and is
-subject to tilde expansion. These variables can be given multiple times.
+subject to tilde expansion and the value syntax detailed above.
+These variables can be given multiple times.
 
 The contents of the included file are inserted immediately, as if they
 had been found at the location of the include directive. If the value of the
--
2.17.1.windows.2
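The behaviour that the patch documents can be reproduced with a small experiment. This is a hedged sketch rather than part of the patch: the file names `extra.cfg`, `good.cfg` and `bad.cfg` are invented for the demonstration, and `--includes` is passed explicitly because include processing is off by default when a single file is named with `--file`.

```shell
#!/bin/sh
# Sketch: double quotes delimit a config value, single quotes are part of it.
set -e

tmp=$(mktemp -d)

# The file we would like to pull in via an include directive.
printf '[user]\n\tname = Included Name\n' > "$tmp/extra.cfg"

# Double quotes are config syntax, so this path resolves as intended.
printf '[include]\n\tpath = "%s"\n' "$tmp/extra.cfg" > "$tmp/good.cfg"

# Single quotes are not special: they become part of the path, the
# resulting file does not exist, and the include is silently skipped.
printf "[include]\n\tpath = '%s'\n" "$tmp/extra.cfg" > "$tmp/bad.cfg"

good=$(git config --file "$tmp/good.cfg" --includes --get user.name)
# `git config --get` exits non-zero when the key is absent, hence `|| true`.
bad=$(git config --file "$tmp/bad.cfg" --includes --get user.name || true)

echo "double quotes: '$good'"
echo "single quotes: '$bad'"
```

The empty second result with no error message is the combination of the two points clarified in the patch: the quotes form part of the value, and a missing include file is treated as 'if found' rather than as a failure.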
Re: Receiving console output from GIT 10mins after abort/termination?
From: "Frank Wolf"
Sent: Wednesday, July 18, 2018 7:38 AM
> Hi @ll,
>
> I hope I'm posting to the right group (not sure if it's Windows related) but I've got a weird problem using GIT:
>
> By accident I've tried to push a repository (containing an already committed but not yet pushed submodule reference). This fails immediately with an error of course BUT after 10 mins I get an output on the console though the command exited!?
>
> (... $Received disconnect from : User session has timed out idling after 600 ms)
>
> Does anyone have an explanation why I still get an output after the command was aborted?
>
> /Frank

I think this is a Windows environment issue. I have added a reply to the GitHub git-for-windows tracker.
https://github.com/git-for-windows/git/issues/1762#issuecomment-406851107

I think you may have found a special case, so I will need extra details from you about the setup and hopefully an MVCE.

Philip
Re: git-gui ignores core.hooksPath
From: "Johannes Schindelin"
> Hi Phillip,
>
> On Wed, 14 Jun 2017, Philipp Gortan wrote:
>> thanks for following up,
>>
>>> Indeed. Why don't you give it a try?
>>
>> Actually, I already did: https://github.com/patthoyts/git-gui/pull/12
>> You might want to post your analysis and patch there as well...
>
> I wonder what good posting my analysis did, if nothing changed as a consequence.
>
> FWIW I opened this PR with Git for Windows to fix it properly:
> https://github.com/git-for-windows/git/pull/1757
>
> I plan on consolidating all of the PRs at https://github.com/patthoyts/git-gui, too, and to try to get them into git.git.
>
> I guess that means that I just volunteered as interim maintainer of the git-gui repository. However, I will really act as maintainer, not as "cleaner upper".

"Curator" is a useful intermediate level concept between active maintenance and passive benign neglect, if that term is a help...

--
Philip
Re: [RFC PATCH 4/6] sequencer.c: avoid empty statements at top level
From: "Eric Sunshine"
To: "Beat Bolli"
On Sun, Jul 8, 2018 at 10:44 AM Beat Bolli wrote:
> The marco GIT_PATH_FUNC expands to a complete statement including the

s/marco/macro/

> semicolon. Remove two extra trailing semicolons.
>
> Signed-off-by: Beat Bolli
> ---
>  sequencer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

While you're at it, perhaps it would be a good idea to fix the example in path.h which teaches the "wrong" way:

    /*
     * You can define a static memoized git path like:
     *
     *    static GIT_PATH_FUNC(git_path_foo, "FOO");
     *
     * or use one of the global ones below.
     */
Re: [msysGit] Possible git status problem at case insensitive file system
Hi Frank,

Your system clock looks to be providing the wrong date for your emails.

The last XP version was https://github.com/git-for-windows/git/releases/tag/v2.10.0.windows.1 so you may want to upgrade to that. (see FAQs https://github.com/git-for-windows/git/wiki/FAQ)

It won't solve the capitalisation problem - that is a Windows FS issue. Git assumes case matters, but the FS will fetch directories and branches case insensitively.

Philip

----- Original Message -----
From: "Frank Li"
To: "Git List" ; "msysGit"
Sent: Monday, August 09, 2010 5:22 AM
Subject: [msysGit] Possible git status problem at case insensitive file system

All:
I use msysgit 1.7.0.2 at windows xp.

Problem: git status will list tracked directory as untracked dir.

Duplicate:
1. mkdir test, cd test
2. git init-db
3. mkdir d, cd d
4. touch a.c
5. git add a.c
6. git commit -a -m "test"
7. cd ..
8. mv d d1
9. mv d1 D
10. git status

# On branch master
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#       D/
nothing added to commit but untracked files present (use "git add" to track)

D/ should be same as d/ at case insensitive file system. D/ should not listed by git status.

best regards
Frank Li
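One way to avoid the mismatch reported above is to perform the case-only rename through Git, so that the index records the new spelling. The thread itself does not propose this; the following is a sketch of a commonly used workaround, with the temporary name `d-tmp` and the inline identity chosen purely for illustration.

```shell
#!/bin/sh
# Sketch: rename a tracked directory from "d" to "D" via a temporary name.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir d
echo 'int main(void) { return 0; }' > d/a.c
git add d/a.c
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "test"

# Two-step rename: going through a name that differs in more than case
# works on case-sensitive and case-insensitive file systems alike, and
# `git mv` updates the index along with the working tree.
git mv d d-tmp
git mv d-tmp D

tracked=$(git ls-files)
untracked=$(git status --porcelain --untracked-files=all | grep '^??' || true)
echo "tracked: $tracked"
echo "untracked: ${untracked:-none}"
```

After this the rename shows up as a staged change rather than as an untracked `D/` directory. It does not change how the file system itself treats case, which remains the underlying issue described in the reply.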
Re: GDPR compliance best practices?
From: "Theodore Y. Ts'o"
Sent: Friday, June 08, 2018 3:53 AM
> On Fri, Jun 08, 2018 at 01:21:29AM +0200, Peter Backes wrote:
>> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
>>>> Again: The GDPR certainly allows you to keep a proof of copyright
>>>> privately if you have it. However, it does not allow you to keep
>>>> publishing it if someone exercises his right to be forgotten.
>>>
>>> someone is granting the world the right to use the code and you are claiming
>>> that the evidence that they have granted this right is illegal to have?
>>
>> Hell no! Please read what I wrote:
>>
>> - "allows you to keep a proof ... privately"
>>
>> - "However, it does not allow you to keep publishing it"
>
> The problem is you've left undefined who is "you"? With an open source project, anyone who has contributed to open source project has a copyright interest. That hobbyist in Germany who submitted a patch? They have a copyright interest. That US Company based in Redmond, Washington? They own a copyright interest. Huawei in China? They have a copyright interest.
>
> So there is no "privately". And "you" numbers in the thousands and thousands of copyright holders of portions of the open source code.
>
> And of course, that's the other thing you seem to fundamentally not understand about how git works. Every developer in the world working on that open source project has their own copy. There is fundamentally no way that you can expunge that information from every single git repository in the world. You can remove a git note from a single repository. But that doesn't affect my copy of the repository on my laptop. And if I push that repository to my server, the git note will be out there for the whole world to see.
>
> So someone could *try* sending a public request to the entire world, saying, "I am a European and I demand that you disassociate commit DEADBEF12345 from my name". They could try serving legal papers on everyone. But at this point, it's going to trigger something called the "Streisand Effect". If you haven't heard of it, I suggest you look it up:
>
> http://mentalfloss.com/article/67299/how-barbra-streisand-inspired-streisand-effect
>
> Regards,
>
> - Ted

Hi Ted,

I just want to remind folks that Gmane disappeared as a regular list because of a legal challenge, and the SCO v IBM Unix court case keeps rumbling on, so it is worth clarifying the legal case for:
a) holding the 'personal git meta data', and
b) disclosing (publishing) 'personal git meta data'
under various copyright and other legal issue scenarios relative to GDPR.

I'm of the opinion that the GPL should be able to allow both holding and disclosing that data, though it may need a few more clarifications as to verifying that the author is 'correct' (e.g. not a child) and if a DCO is needed, etc.

We are already looking at a change to the hash, so the technical challenge could be addressed, but may create too many logical conflicts if 'right to be forgotten' is allowed (one hash change is enough ;-)

Philip
Re: GDPR compliance best practices?
Hi Peter, David,

I thought that the legal notice (aka 'disclaimer') was pretty reasonable.

Some of Peter's fine distinctions may be technically valid, but that does not stop there being legal grounds. The proof of copyright is a legal ground. Unfortunately once one gets into legal nitpicking the wording becomes tortuous and helps no-one.

If one starts from an absolute "right to be forgotten" perspective one can demand that all evidence of wrongdoing, or authority to do something, be forgotten. The GDPR has the right to retain such evidence.

I'll try and comment where I see the distinctions to be.

From: "Peter Backes"
> Hi David,
>
> thanks for your input on the issue.
>
>> LEGAL GDPR NOTICE:
>> According to the European data protection laws (GDPR), we would like to make you aware that contributing to rsyslog via git will permanently store the name and email address you provide as well as the actual commit and the time and date you made it inside git's version history.

This is simply an information statement

>> This is inevitable, because it is a main feature git.

The "inevitable" word creates a point of argument within the GDPR. Removing the word (and 'because/main') brings the sentence back to be an informative statement without a GDPR claim.

> As we can see, rsyslog tries to solve the issue by the already discussed legal "technology" of disclaimers (which is certainly not accepted as state of the art technology by the GDPR). In essence, they are giving excuses for why they are not honoring the right to be forgotten.
>
> Disclaimers do not work. They have no legal effect, they are placebos.
>
> The GDPR does not accept such excuses. If it would, companies could arbitrarily design their data storage such as to make it "the main feature" to not honor the right to be forgotten and/or other GDPR rights. It is obvious that this cannot work, as it would completely undermine those rights.
>
> The GDPR honors technology as a means to protect the individual's rights, not as a means to subvert them.
>
>> If you are concerned about your privacy, we strongly recommend to use
>>
>> --author "anonymous "
>>
>> together with your commit.

The [key] missing information here is whether rsyslog has a DCO (Developer Certificate of Origin) and what that contains. The git.git DCO is here:
https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L304-L349

This will also help discriminate between the "name" part and the identifier, as both could be separately anonymised (given the right DCO). Thus it may be that the name is recorded as "anonymous", but with a that bridges the legal evidence/right to be forgotten bridge.

> This can only be a solution if the project rejects any commits which are not anonymous.
>
>> However, we have valid reasons why we cannot remove that information later on. The reasons are:
>>
>> * this would break git history and make future merges unworkable
>
> This is not a valid excuse (see above).

Within the GDPR, that is correct. It (breaking history validation), of itself, should not be the reason.

> The technology has to be designed or applied in such a way that the individual's rights are honored, not the other way around.
>
> In absence of other means, the project has to rewrite history if it gets a valid request by someone exercising his right to be forgotten, even if that causes a lot of hassle for everyone.
>
>> * the rsyslog projects has legitimate interest to keep a permanent record of the contributor identity, once given, for
>>   - copyright verification
>>   - being able to provide proof should a malicious commit be made
>
> True, but that doesn't justify publishing that information and keeping it published even when someone exercises his right to be forgotten.

Publishing (the meta data) is *distinct* from having it. However publishing the content and its legal copyright is also associated with identifying the copyright holder (who has released it). This can be the uid if they hide behind a legal entity. This creates the catch 22 scenario. You either start off public and stay public, or you start off private and stay there.

Whether the rsyslog folk want to accept copyrighted work without appropriate legal release (who guards the guards, what's their badge number?) is part of the same information requirement.

Malicious intent makes the submission (commit) part of the legal evidence one needs to retain, so it is supported by GDPR.

> In that case, "legitimate interest" is not enough. There need to be "overriding legitimate grounds". I don't see them here.
>
>> Please also note that your commit is public and as such will potentially be processed by many third-parties. Git's distributed nature makes it impossible to track where exactly your commit, and thus your personal data, will be stored and be processed. If you would not like to accept this risk, please do either commit anonymously or refrain from contributing to the rsyslog project.

The onward publishing and release
Re: GDPR compliance best practices?
Hi Peter,
(lost the cc's)

From: "Peter Backes"
> On Sun, Jun 03, 2018 at 11:28:43PM +0100, Philip Oakley wrote:
>> It is here that Article 6 kicks in as to whether the 'organisation' can retain the data and continue to use it.
>
> Article 6 is not about continuing to use data. Article 6 is about having and even obtaining it in the first place.

Correct, and that is the part I was referring to. Recipients of the particular meta data require it for the licencing purpose. Thus they can continue to have (and 'need') that data. It is that 'other side of the fence' view I mentioned.

> Article 17 and article 21 are about continuing to use data.
>
>> For an open source project with an open source licence then an implicit DCO applies for the meta data. It is the legal basis for the release.
>
> Neither article 6 nor 17 or 21 have anything remotely like an "implicit DCO" as a legitimization for publishing employee data.

I was referring to 'implicit' in a reverse direction, that is, the DCO supports the legal basis to have and hold the data. The express licence terms in the various open source licences give the permission, and become one of these legally conflicting aspects.

> The GDPR is very explicit about implicit stuff never being a basis for consent, if you want to imply that is your basis. And consent can be withdrawn at any time anyway.
>
> An open source license has nothing whatsoever to do with the question of version control metadata. A public version control system is not necessary to publish open source software.
>
>>> - copyright is about distributing the program, not about distributing
>>> version control metadata.
>>
>> It is specifically about giving that right to copy by Jane Doe (but git gives no other information other than that supposedly globally unique 'author email'.
>
> I don't get what you are saying. As I said, a public version control system is not necessary to publish open source software. The two things may be intimately related in practice, but not in theory. Such is the law.

It's the practice that is legal/illegal, decided in court (if it gets there)

>>> - Being named is a right, not an obligation of the author. Hence, if
>>> the author doesn't want his name published, the company doesn't have
>>> legitimate grounds based in copyright for doing it anyway, against his
>>> or her will.
>>
>> Git for Open Source is about open licencing by name. I'd agree that a closed corporate licence stays closed, but not forgotten.
>
> Again I don't get what you are saying. The author has a right to be named as the author, not an obligation. This has nothing whatsoever to do with the question of Open Source vs. closed corporate licenses.

The question is which clause is being used to justify an action. Those corporate organisations want a legal basis for holding data, not a voluntary permission (because folk may try and rescind that permission... ).

Those in open source want to ensure that their licence is a legal basis for other folk to have copies, and that folk can show they have that permission.

Those with a personal data view will focus on the hope that they can remove permission, especially for companies that are doing things they find unacceptable, and maybe 'illegal' or unethical.

The GDPR attempts to balance the different sets of expectations, and the overlaps will need to be negotiated. Different nations (and individuals) have different perceptions as to what is normal and reasonable, and thus focus on different aspects, not appreciating the Competing Values that are present in the different Frameworks of their Weltanschauung.

If a closed source corporate does publish their closed data, they have real internal problems anyway regarding that contradiction!

>>> Let's be honest: We do not know what legitimization exactly in each
>>> specific case the git metadata is being distributed under.
>>
>> We should know, already. A specific licence [or limit] should be in place. We don't really want to have to let a court decide ;-)
>
> It is insufficient to have a license for distributing the program. The license is not a GDPR legitimization for git metadata. Distributing the program can be done without distributing the author's identity as part of the metadata of his commits.
>
>> The law is never decided by technical means, unfortunately.
>
> It is. The GDPR refers to the state of the art of technology without defining it. Thus, technical means are very important in the GDPR. This may be something new for lawyers. If technology changes tomorrow, even without anything else changing, you may be breaking the GDPR by this simple fact tomorrow, while not breaking it today.

They will still argue about what is the state of the art, and that if the art is hidden in some lab, then it's not available to meet the criteria.

> Again: Technology is very important in the GDPR.

We know quantum computing can crack the codes, but when does it become the state of the art. SHA1 has been 'cracked' once in one
Re: GDPR compliance best practices?
From: "Peter Backes" On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote: In most Git cases that legal/legitimate purpose is the copyright licence, and/or corporate employment. That is, Jane wrote it, hence X has a legal rights of use, and we need to have a record of that (Jane wrote it) as evidence of that (I'm X, I can use it) right. That would mean that Jane cannot just ask to have that record removed and expect it to be removed. Re corporate employment: For sure nobody would dare to quesion that a company has a right to keep an internal record that Jane wrote it. The issue is publishing that information. This is an entirely different story. It is here that Article 6 kicks in as to whether the 'organisation' can retain the data and continue to use it. https://gdpr-info.eu/art-6-gdpr/ https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/ https://www.lawscot.org.uk/news-and-events/news/gdpr-legal-basis-and-why-it-matters/ For an open source project with an open source licence then an implict DCO applies for the meta data. It is the legal basis for the the release. If a corporate project has a closed source project, then yes, open publishing of that personal data within a repo's meta data would be incorrect, even though the internal repo would be kept. I already stressed that from the very beginning. Re copyright license: No, a copyright license does not provide a legitimization. - copyright is about distributing the program, not about distributing version control metadata. It is specificaly about giving that right to copy by Jane Doe (but git gives no other information other than that supposedly globally unique 'author email'. - Being named is a right, not an obligation of the author. Hence, if the author doesn't want his name published, the company doesn't have legitimate grounds based in copyright for doing it anyway, against his or her will. 
Git for Open Source is about open licencing by name. I'd agree that a closed corporate licence stays closed, but not forgotten. From a personal view, many folk want it to be that corporates (and open source organisations) should hold no personal information with having explicit permission that can then be withdrawn, with deletion to follow. However that 'legal' clause does [generally] win. Let's be honest: We do not know exactly under what legitimization the git metadata is being distributed in each specific case. We should know, already. A specific licence [or limit] should be in place. We don't really want to have to let a court decide ;-) It may be copyright, it may be employment, but it may also be revocable consent. That is, we cannot safely assume that no git user will ever have to deal with a legitimate request based on the right to be forgotten. The law is never decided by technical means, unfortunately. Regular git users should have no issues - they just need to point their finger at the responsible authority. (beware though, of the one-way trap door whereby the user's mistakes can become the problem for the responsible authority!) In the git.git case (and linux.git) there is the DCO (to back up the GPL2) as an explicit requirement/certification that puts the information into the legal evidence category. IIUC almost all copyright ends up with a similar evidential trail for the meta data. This makes things more complicated, not less. You have yet more meta data to cope with, yet more opportunities to be bitten by the right to be forgotten. Since I proposed a list of metadata where each entry can be anonymized independently of each other, it would be able to deal with this perfectly. The DCO/GPL2 are the legitimate data record that recipients should have for their copy. There is no right to be forgotten at that point. 
The more likely problem is if the content of the repo, rather than the meta data, is subject to GDPR, and that could easily ruin any storage method. Being able to mark an object as would help here(*). My proposal supports any part of the commit, including the contents of individual files, as erasable, yet verifiable data. Also remember that most EU legislation is 'intent' based, rather than 'letter of', for the style of legal arguments (which is where some of the UK Brexit misunderstandings come from), so it is more than possible to get into the situation where an action is both mandated and illegal at the same time, so plenty of snake oil salesmen continue to sell magic fixes according to the customer's local biases. This may be true. I am not trying to sell snake oil, however. To have erasure and verifiability at the same time is a highly generic feature that may be desirable to have for a multitude of reasons, including but not limited to legal ones like GDPR and copyright violations. I do not believe Git has anything to worry about that wasn't already an is
Re: GDPR compliance best practices?
correcting a negative /with/without/ and inserting a comma. - Original Message - From: "Philip Oakley" [snip] From a personal view, many folk want it to be that corporates (and open source organisations) should hold no personal information with having s/with/without/ explicit permission that can then be withdrawn, with deletion to follow. s/permission/permission,/ However that 'legal' clause does [generally] win.
Re: git glob pattern in .gitignore and git command
Hi Yubin, From: "Yubin Ruan" To ignore all .js file under a directory `lib', I can use "lib/**/js" to match them. But when using git command such as "git add", using "git add lib/\*.js" is sufficient. Why is this difference in glob mode? I have heard that there are many different glob mode out there (e.g., bash has many different glob mode). So, which classes of glob mode does these two belong to? Do they have a name? Is this a question about `git add` being able to add a file that is marked as being ignored in the .gitignore file? [Yes it can.] Or, is this simply about the many different globbing capabilities of one's shell, and of Git? The double asterisk (star) is specific/local to Git. It is described in the various commands that use it, especially the gitignore man page `git help ignore` or https://git-scm.com/docs/gitignore. "Two consecutive asterisks ("**") in patterns matched against full pathname may have special meaning: ... " The single asterisk does have two modes depending on how you quote it. It is described in the command line interface (cli) man page `git help cli` or https://git-scm.com/docs/gitcli. "Many commands allow wildcards in paths, but you need to protect them from getting globbed by the shell. These two mean different things: ... " A common proper name for these asterisk style characters is "wildcards". Try 'bash wildcards' or 'linux wildcards' in your favourite search engine. -- Philip
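The quoting difference described in the gitcli excerpt can be seen without Git at all. What follows is only a minimal sketch, assuming a scratch directory with made-up file names (lib/a.js and lib/sub/b.js are hypothetical). Unquoted, the shell expands the pattern before the command ever sees it; escaped, the pattern arrives literally, and the receiving program can apply its own matching.

```shell
#!/bin/sh
# Hypothetical scratch layout, used only for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/lib/sub"
touch "$tmp/lib/a.js" "$tmp/lib/sub/b.js"
cd "$tmp" || exit 1

# Unquoted: the shell expands the glob itself. Only lib/a.js matches,
# because a shell '*' does not cross directory separators.
unquoted=$(printf '%s\n' lib/*.js)

# Escaped: the pattern is passed through untouched, so the receiving
# program (for example "git add") can apply its own matching rules.
escaped=$(printf '%s\n' lib/\*.js)

echo "unquoted: $unquoted"
echo "escaped:  $escaped"
```

When the escaped form is given to a command such as `git add`, it is Git's pathspec matching rather than the shell's that decides which files match, which is why the two spellings can select different sets of files.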
Re: [PATCH v2] t/perf/run: Use proper "--get-regexp", not "--get-regex"
From: "Robert P. J. Day" On Sun, 3 Jun 2018, Thomas Gummerer wrote: > Subject: [PATCH v2] t/perf/run: Use proper "--get-regexp", not micronit: we prefer starting with a lowercase letter after the "area:" prefix in commit messages. Junio can probably fix that while queuing, so no need to resend. argh, i actually know that, i just screwed up. On 06/03, Robert P. J. Day wrote: > > Even though "--get-regex" appears to work with "git config", the > clear standard is to spell out the action in full. --get-regex works as the parse-option API allows abbreviations of the full option to be specified as long as the abbreviation is unambiguous. I don't know if this is documented anywhere other than 'Documentation/technical/api-parse-options.txt' though. it's in `git help cli`: many commands allow a long option --option to be abbreviated only to their unique prefix (e.g. if there is no other option whose name begins with opt, you may be able to spell --opt to invoke the --option flag), but you should fully spell them out when writing your scripts; It's a worthwhile read, even if the man page isn't flagged up that often. > Signed-off-by: Robert P. J. Day > > --- It took me a bit to figure out why there is a v2, and what changed between the versions. This space after the '---' would be a good place to describe that to help reviewers. For others that are curious, it seems like the word "clear" was added in the commit message. The change itself looks good to me. the actual rationale for v2 was in the subject, i originally put just "get-regex" rather than "--get-regex"; i resubmitted for consistency. -- Philip
Re: GDPR compliance best practices?
From: "Peter Backes" On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote: I'm not trying to be selfish, I'm just trying to counter your literal reading of the law with a comment of "it'll depend". Just like there's a law against public urination in many places, but this is applied very differently to someone taking a piss in front of parliament vs. someone taking a piss in the forest on a hike, even though the law itself usually makes no distinction about the two. We have huge companies using git now. This is not the tool used by a few kernel hackers anymore. In this example once you'd delete the UUID ref you don't have the UUID -> author mapping anymore (and b.t.w. that could be a many to one mapping). It is not relevant whether you have that mapping or not, it is enough that with additional information you could obtain it. For example, say, you have 5000 commits with the same UUID. Now you delete the mapping. But your friend still has it on his local copy. Now your friend merely needs to tell you who is behind that UUID and instantly you can associate all 5000 commits with that person again. The GDPR is very explicit about this, see recital 26. It says that pseudonymization is not enough, you need anonymization if you want to be free from regulation. In addition, and in contrast to my proposal, your solution doesn't allow verification of the author field. I think again that this is taking too much of a literalist view. The intent of that policy is to ensure that companies like Google can't just close down their EU offices and weasel out of compliance by saying "we're just doing business from the US, it doesn't apply to us". It will not be used against anyone who's taking every reasonable precaution from doing business with EU customers. I think you are underestimating the political intention behind the GDPR. 
It has kind of an imperialist goal, to set international standards, to enforce them against foreign companies and to pressure other nations to establish the same standards. If I would read the GDPR in a literal sense, I would in fact come to the same conclusion as you: It's about companies doing substantial business in the EU. But the GDPR is carefully constructed in such a way that it is hard not to be affected by the GDPR in one way or another, and the obvious way to cope with that risk is to more or less obey the GDPR rules even if one does not have substantial business interests in the EU. What do you imagine that this is going to be like? That some EU citizen is going to walk into a small business in South America one day, which somehow is violating the GDPR, and when that business owner goes on holiday to the EU they're going to get detained? Not even the US policy against Cuba is anywhere remotely close to that. Well not if he's locally interacting with that business, a situation which I am sure is not regulated by the GDPR. However, if a large US website accepts users from the EU and uses the data gathered in conflict with the GDPR, perhaps selling it for use in political campaigns, and it gets several fines for this by EU authorities but ignores them and doesn't pay them, and the CEO one day takes a flight to Frankfurt to continue by train to Switzerland to get some cash from his bank account, then he will most likely not reach Swiss territory. -- Having been through corporate training and read up a number of the conflicting views in the press, one of the issues is that there are two viewpoints, one from each side of the fence. From a corporate/organisation viewpoint, it is best if every case of holding user information is for a legitimate purpose, which then means the company has 'protection' from requests for removal because the data *is* held legally/legitimately (which includes acting as evidence). 
In most Git cases that legal/legitimate purpose is the copyright licence, and/or corporate employment. That is, Jane wrote it, hence X has a legal right of use, and we need to have a record of that (Jane wrote it) as evidence of that (I'm X, I can use it) right. That would mean that Jane cannot just ask to have that record removed and expect it to be removed. From a personal view, many folk want it to be that corporates (and open source organisations) should hold no personal information with having explicit permission that can then be withdrawn, with deletion to follow. However that 'legal' clause does [generally] win. In the git.git case (and linux.git) there is the DCO (to back up the GPL2) as an explicit requirement/certification that puts the information into the legal evidence category. IIUC almost all copyright ends up with a similar evidential trail for the meta data. The more likely problem is if the content of the repo, rather than the meta data, is subject to GDPR, and that could easily ruin any storage method. Being able to mark an object as would help here(*). Also remember that most EU legislation is 'intent' based,
Re: git rebase -i --exec and changing directory
Hi Ondrej, Phillip, From: "Phillip Wood" <phillip.w...@talktalk.net> Hi Ondrej On 27/05/18 13:53, Ondrej Mosnáček wrote: Hi Philip, 2018-05-27 14:28 GMT+02:00 Philip Oakley <philipoak...@iee.org>: You may need to give a bit more background of things that seem obvious to you. So where is the src directory you are cd'ing to relative to the directory/repository you are creating? It is located in the top-level directory of the working tree (in the same directory that .git is in). From git-rebase(1): The "exec" command launches the command in a shell (the one specified in $SHELL, or the default shell if $SHELL is not set), so you can use shell features (like "cd", ">", ";" ...). The command is run from the root of the working tree. So I need to run 'cd src' if I want to run a command in there (regardless of the working directory of the git rebase command itself). What is [the name of] the directory you are currently in, etc. ? I don't think that is relevant here. FWIW, when verifying the problem I ran the reproducer from my original message in a directory whose path did not contain any spaces or special characters. Did you try to run the reproducing commands I posted? Did you get a different result? You should see the following in the output of 'cd dir && git status': At the time, I hadn't run the command. I was more interested in understanding the problem setup, as understanding often brings enlightenment. I was just starting to do my own setup and saw Phillip had responded, which prompted me to think it could be that there was no tty attached to the exec, so output wasn't being seen (or something like that). I tried your recipe and got the same result as you. However I think it could be a problem with 'git status' rather than 'git rebase --exec'. 
If I run your recipe in /tmp/a and do cd dir GIT_DIR=/tmp/a/.git git status I get the same result as when running 'git status' from 'git rebase --exec' So I think the problem might have something to do with GIT_DIR being set in the environment when 'git status' is run I too got the same results. I also tried duplicating the exec line and placing it before the pick line, just to check it wasn't an issue about termination. Same result. Best Wishes Phillip [...] Changes not staged for commit: (use "git add/rm ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) deleted: a deleted: b deleted: dir/x deleted: reproduce.sh Untracked files: (use "git add ..." to include in what will be committed) x [...] When I drop the 'cd dir && ' from before 'git status', the output is as expected: You are currently editing a commit while rebasing branch 'master' on '19765db'. (use "git commit --amend" to amend the current commit) (use "git rebase --continue" once you are satisfied with your changes) nothing to commit, working tree clean So I extended the command to be exec'd to `cd dir && ls && git status`, again with duplication of the exec, which then gives a bit more.. finally I extended the status to pipe its output to a file, again duplicated. -- Philip@PhilipOakley MINGW32 /usr/src/mosnacek (master) $ git rebase -i --exec 'cd dir && ls && git status >stat.txt' base Executing: cd dir && ls && git status >stat0.txt x Executing: cd dir && ls && git status >stat.txt stat0.txt x Successfully rebased and updated refs/heads/master. -- the stat0, stat files can then be investigated. Summary: status is, I think, being clever and dropping the verbiage when not directly attached to the terminal. (or it is being intelligent and adding a lot more status details just because it _is_ within the rebase..) Philip -- From: "Ondrej Mosnáček" <omosna...@gmail.com> Bump? Has anyone had time to look at this? 
2018-05-19 18:38 GMT+02:00 Ondrej Mosnáček <omosna...@gmail.com>: Hello, I am trying to run a script to edit multiple commits using 'git rebase -i --exec ...' and I ran into a strange behavior when I run 'cd' inside the --exec command and subsequently run a git command. For example, if the command is 'cd src && git status', then git status reports as if all files in the repository are deleted. What does that particular report look like? I see no special report of deletions, or additions. Example command sequence to reproduce the problem: # Setup: touch a mkdir dir touch dir/x git init . git add --all git commit -m commit1 git tag base touch b git add --all git commit -m commit2 # Here we go: git rebase -i --exec 'cd dir && git status' base # Spawning a s
Re: git rebase -i --exec and changing directory
You may need to give a bit more background of things that seem obvious to you. So where is the src directory you are cd'ing to relative to the directory/repository you are creating? What is [the name of] the directory you are currently in, etc. ? Philip -- From: "Ondrej Mosnáček" Bump? Has anyone had time to look at this? 2018-05-19 18:38 GMT+02:00 Ondrej Mosnáček : Hello, I am trying to run a script to edit multiple commits using 'git rebase -i --exec ...' and I ran into a strange behavior when I run 'cd' inside the --exec command and subsequently run a git command. For example, if the command is 'cd src && git status', then git status reports as if all files in the repository are deleted. Example command sequence to reproduce the problem: # Setup: touch a mkdir dir touch dir/x git init . git add --all git commit -m commit1 git tag base touch b git add --all git commit -m commit2 # Here we go: git rebase -i --exec 'cd dir && git status' base # Spawning a sub-shell doesn't help: git rebase -i --exec '(cd dir && git status)' base Is this expected behavior or did I find a bug? Is there any workaround, other than cd'ing to the toplevel directory every time I want to run a git command when I am inside a subdirectory? $ git --version git version 2.17.0 Thanks, Ondrej Mosnacek
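The GIT_DIR observation reported elsewhere in this thread can be reproduced outside of rebase. The following is only a minimal sketch, assuming git is on PATH and using a throwaway repository; the identity passed with -c is a placeholder. Clearing GIT_DIR as a workaround is an assumption made for illustration, not something the thread confirms.

```shell
#!/bin/sh
# Throwaway repository mirroring the layout of the reproducer.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
touch a
mkdir dir
touch dir/x
git add --all
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m commit1

# With GIT_DIR set and no GIT_WORK_TREE, git regards the current
# directory as the top of the working tree, so from inside dir/ the
# tracked files look deleted and dir/x shows up as an untracked "x".
with_git_dir=$(cd dir && GIT_DIR="$repo/.git" git status --porcelain)

# Assumed workaround: clear GIT_DIR so git rediscovers the repository
# by walking up from the current directory.
without_git_dir=$(cd dir && unset GIT_DIR && git status --porcelain)

printf 'GIT_DIR set:\n%s\n' "$with_git_dir"
printf 'GIT_DIR unset:\n%s\n' "$without_git_dir"
```

Inside an --exec command the same idea would read something like `cd dir && (unset GIT_DIR; git status)`; whether that is sufficient for a given Git version has not been checked here.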
Re: Troubles with picking an editor during Git update
Hi Bartosz, From: "Bartosz Konikiewicz" Hi there! I had an issue with Git installer for Windows while trying to update The Git for Windows package is managed, via https://gitforwindows.org/, as a separate application, based on Git. my instance of the software. My previous version was "git version 2.15.1.windows.2", while my operating system prompted me to upgrade to "2.17.0". The installer asked me to "choose the default editor for Git". One of these options was Notepad++ - my editor of choice. Vim was selected by default and I've picked Notepad++ from a drop-down list. As soon as I did it, a "next" button greyed out. When I moved back to the previous step and then forward to the editor choice, the "Notepad++" option was still highlighted, and the "next" button wasn't greyed out anymore - it was active and I was able to press it and continue installation. Steps to reproduce: 1. Have Notepad++ 6.6.9 installed on Windows 10 64-bit 10.0.17134 Build 17134. 2. Use an installer for version 2.17.0 to upgrade from version 2.15.1. 3. On an editor selection screen, choose Notepad++ instead of Vim. You should be unable to continue installation because of the "next" button being disabled. 4. Press "prev". 5. Press "next". Notepad++ should be still highlighted, and the "next" button should be active, allowing installation to continue. I find it to be a crafty trick to make me use Vim. I have considered it for a good moment. The best place to report the issue, and perhaps contribute, is via the 'GfW' Issue tracker https://github.com/git-for-windows/git/issues. Building Git for Windows via the SDK has become even easier with recent updates, so it should be relatively easy to spot the offending line in the installer and perhaps even propose a PR (Pull Request) to fix the issue. regards Philip
Re: Re: [PATCH 1/3] checkout.c: add strict usage of -- before file_path
From: "Dannier Castro L" On 13/05/2018 00:03, Duy Nguyen wrote: On Sun, May 13, 2018 at 4:23 AM, Dannier Castro L wrote: For GIT new users, this complicated versatility of could be very confusing, also considering that actually the flag '--' is completely useless (added or not, there is not any difference for this command), when the same program messages promote the use of this flag. I would like an option to revert back to current behavior. I'm not a new user. I know what I'm doing. Please don't make me type more. And "--" is not completely useless. If you have and with the same name, you have to give "--" to to tell git what the first argument means. Sure Duy, you're right, probably "completely useless" is not the correct definition, even according with the code I didn't find another useful case that is not file and branch with the same name. The program is able to know the type using only the name, turning "--" into an extra flag in most of cases. I think this solution could please you more: By default the configuration is the current, but the user has the chance to set this, for example: git config --global flag.strictdashdash true Thank you so much for the spent time reviewing the patch, this is my first one in this repository. It may be that after review you could suggest an appropriate rewording or re-arrangement of the man page to better highlight the proper use of the '--' disambiguation. Perhaps frame the man page as if it is normal for the '--' to be included within command lines (which should be the case for scripts anyway!). Then indicate that it isn't mandatory if the file/branch/dwim distinction is obvious. i.e. make sure that the man page is educational as well as being a reference that may be misunderstood. Those well versed in the Git cli will normally omit the '--', only using it where necessary, however for new users/readers of the man page, it may be better to be more explicit and avoid future misunderstandings. -- Philip
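The one case everyone in this exchange agrees on, a file and a branch sharing a name, can be sketched as follows. This is an illustrative example only, assuming git is on PATH; the name "topic" and the commit identity are made up for the demonstration.

```shell
#!/bin/sh
# Throwaway repository in which a file and a branch are both "topic".
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo one >topic
git add topic
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m initial
git branch topic            # branch with the same name as the file

echo two >topic             # local modification to the file

# "--" first: everything after it is a path, so this restores the file.
git checkout -- topic
after_restore=$(cat topic)

# "--" last: the argument before it is a revision, so this switches branch.
git checkout -q topic --
current_branch=$(git symbolic-ref --short HEAD)

echo "file content: $after_restore"
echo "current branch: $current_branch"
```

Spelling out the `--` in both positions, as scripts are advised to do, removes any need for Git to guess which meaning was intended.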
Re: [PATCH v6 11/13] command-list.txt: documentation and guide line
Hi Duy, From: "Nguyễn Thái Ngọc Duy": Monday, May 07, 2018 This is intended to help anybody who needs to update command-list.txt. It gives a brief introduction of all attributes a command can take. --- command-list.txt | 44 1 file changed, 44 insertions(+) diff --git a/command-list.txt b/command-list.txt index 99ddc231c1..9c70c69193 100644 --- a/command-list.txt +++ b/command-list.txt @@ -1,3 +1,47 @@ +# Command classification list +# --- +# All supported commands, builtin or external, must be described in +# here. This info is used to list commands in various places. Each +# command is on one line followed by one or more attributes. +# +# The first attribute group is mandatory and indicates the command +# type. This group includes: +# +# mainporcelain +# ancillarymanipulators +# ancillaryinterrogators +# foreignscminterface +# plumbingmanipulators +# plumbinginterrogators +# synchingrepositories +# synchelpers +# purehelpers +# +# The type names are self explanatory. But if you want to see what +# command belongs to what group to get a better picture, have a look +# at "git" man page, "GIT COMMANDS" section. +# +# Commands of type mainporcelain can also optionally have one of these +# attributes: +# +# init +# worktree +# info +# history +# remote +# +# These commands are considered "common" and will show up in "git +# help" output in groups. Uncommon porcelain commands must not +# specify any of these attributes. +# +# "complete" attribute is used to mark that the command should be +# completable by git-completion.bash. Note that by default, +# mainporcelain commands are completable so you don't need this +# attribute. +# +# While not true commands, guides are also specified here, which can +# only have "guide" attribute and nothing else. 
While the file is called ~ "Command List", the list is here as a support to the Help function, and ultimately to the user's reading of the man pages, including the man(5/7) guides, so I'd view the man page guides as first class citizens. Perhaps: # As part of the Git man page list, the man(5/7) guides are also specified # here, which can only have "guide" attribute and nothing else. -- Philip +# ### command list (do not change this line, also do not change alignment) # command name category [category] [category] git-add mainporcelain worktree -- 2.17.0.705.g3525833791
Re: [PATCH 11/18] branch-diff: add tests
From: "Johannes Schindelin" From: Thomas Rast These are essentially lifted from https://github.com/trast/tbdiff, with light touch-ups to account for the new command name. Apart from renaming `tbdiff` to `branch-diff`, only one test case needed to be adjusted: 11 - 'changed message'. The underlying reason it had to be adjusted is that diff generation is sometimes ambiguous. In this case, a comment line and an empty line are added, but it is ambiguous whether they were added after the existing empty line, or whether an empty line and the comment line are added *before* the existing emtpy line. And apparently xdiff picks a different s/emtpy/empty/ option here than Python's difflib. Signed-off-by: Johannes Schindelin [...] Philip
Re: [PATCH 0/4] subtree: move out of contrib
From: "Ævar Arnfjörð Bjarmason" I think at this point git-subtree is widely used enough to move out of contrib/, maybe others disagree, but patches are always better for discussion than patch-less ML posts. Assuming this lands in Git, then there will also need to be a simple follow on into Duy's series that is updating the command-list.txt (Message-Id: <20180429181844.21325-10-pclo...@gmail.com>). Duy's series also does the completions thing IIUC;-). -- Philip Ævar Arnfjörð Bjarmason (4): git-subtree: move from contrib/subtree/ subtree: remove support for git version <1.7 subtree: fix a test failure under GETTEXT_POISON i18n: translate the git-subtree command .gitignore| 1 + Documentation/git-submodule.txt | 2 +- .../subtree => Documentation}/git-subtree.txt | 3 + Makefile | 1 + contrib/subtree/.gitignore| 7 - contrib/subtree/COPYING | 339 -- contrib/subtree/INSTALL | 28 -- contrib/subtree/Makefile | 97 - contrib/subtree/README| 8 - contrib/subtree/t/Makefile| 86 - contrib/subtree/todo | 48 --- .../subtree/git-subtree.sh => git-subtree.sh | 109 +++--- {contrib/subtree/t => t}/t7900-subtree.sh | 21 +- 13 files changed, 78 insertions(+), 672 deletions(-) rename {contrib/subtree => Documentation}/git-subtree.txt (99%) delete mode 100644 contrib/subtree/.gitignore delete mode 100644 contrib/subtree/COPYING delete mode 100644 contrib/subtree/INSTALL delete mode 100644 contrib/subtree/Makefile delete mode 100644 contrib/subtree/README delete mode 100644 contrib/subtree/t/Makefile delete mode 100644 contrib/subtree/todo rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%) rename {contrib/subtree/t => t}/t7900-subtree.sh (99%) -- 2.17.0.290.gded63e768a
Re: Branch deletion question / possible bug?
From: "Jacob Keller" On Fri, Apr 27, 2018 at 5:29 PM, Tang (US), Pik S wrote: Hi, I discovered that I was able to delete the feature branch I was in, due to some fat fingering on my part and case insensitivity. I never realized this could be done before. A quick google search did not give me a whole lot to work with... Steps to reproduce: 1. Create a feature branch, "editCss" 2. git checkout master 3. git checkout editCSS 4. git checkout editCss 5. git branch -d editCSS Are you running on a case-insensitive file system? What version of git? I thought I recalled seeing commits to help avoid creating branches of the same name with separate case when we know we're on a file system which is case-insensitive.. Normally, it should have been impossible for a user to delete the branch they're on. And the deletion left me in a weird state that took a while to dig out of. I know this was a user error, but I was also wondering if this was a bug. If we have not yet done this, I think we should. Long term this would be fixed by using a separate format to store refs than the filesystem, which has a few projects being worked on but none have been put into a release. Yes, this is an on-going problem on Windows and other case insensitive systems. At the moment the branch name becomes embedded as a file name, so when Git requests details of a branch from the filesystem, it can get a case insensitive equivalent. Meanwhile, internally Git is checking for equality in a case sensitive [Linux] way with obvious consequences such as this - The most obvious being when there is no "*" current branch marker in the branch status list. It's a bit tricky to fix (internally the name and the path are passed down different call chains), and depends on how one expects the case insensitivity to work - the kicker is when someone does an edit of the name via the file system and expects Git to cope (i.e. devs knowing, or think they know, too much detail ;-). 
The refs can also get packed, so the "bad spelling" gets baked in. Ultimately it probably means that GfW and other systems will need a case sensitivity check when opening paths...

Philip

Thanks,
Jake

Thanks,
Pik Tang
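To make the mismatch concrete, here is a minimal sketch of the kind of guard a wrapper script could apply before running `git branch -d` on a case-insensitive file system. It is not Git's code, and the function names `same_branch_exact` and `same_branch_ignoring_case` are made up for illustration. It only shows how the exact comparison described above and the file system's case-folded view can disagree for the names from the report.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Git.

# Exact, case-sensitive comparison of two branch names.
same_branch_exact () {
	test "$1" = "$2"
}

# Comparison after folding both names to lower case, roughly what a
# case-insensitive file system does when a ref is looked up as a file.
same_branch_ignoring_case () {
	a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
	b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
	test "$a" = "$b"
}

current=editCss   # the branch we believe we are on
victim=editCSS    # the name typed after "git branch -d"

if ! same_branch_exact "$current" "$victim" &&
   same_branch_ignoring_case "$current" "$victim"
then
	echo "warning: '$victim' and '$current' differ only by case"
fi
```

With the two names from the bug report the warning is printed, which is exactly the situation where the exact comparison says "different branch" while the file system hands back the same file.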
Re: [PATCH v6 11/11] Remove obsolete script to convert grafts to replace refs
From: "Johannes Schindelin"

The functionality is now implemented as `git replace --convert-graft-file`.

A rather late in the day thought: Should this go through the same deprecation dance? I.e. replace the body of the script with the new `git replace --convert-graft-file` and echo (or die!) a warning message that this script is now deprecated and will be removed? At least it will catch those who arrive via random web advice!
--
Philip

Signed-off-by: Johannes Schindelin
---
 contrib/convert-grafts-to-replace-refs.sh | 28 ---
 1 file changed, 28 deletions(-)
 delete mode 100755 contrib/convert-grafts-to-replace-refs.sh

diff --git a/contrib/convert-grafts-to-replace-refs.sh b/contrib/convert-grafts-to-replace-refs.sh
deleted file mode 100755
index 0cbc917b8cf..000
--- a/contrib/convert-grafts-to-replace-refs.sh
+++ /dev/null
@@ -1,28 +0,0 @@
-#!/bin/sh
-
-# You should execute this script in the repository where you
-# want to convert grafts to replace refs.
-
-GRAFTS_FILE="${GIT_DIR:-.git}/info/grafts"
-
-. $(git --exec-path)/git-sh-setup
-
-test -f "$GRAFTS_FILE" || die "Could not find graft file: '$GRAFTS_FILE'"
-
-grep '^[^# ]' "$GRAFTS_FILE" |
-while read definition
-do
-	if test -n "$definition"
-	then
-		echo "Converting: $definition"
-		git replace --graft $definition ||
-			die "Conversion failed for: $definition"
-	fi
-done
-
-mv "$GRAFTS_FILE" "$GRAFTS_FILE.bak" ||
-	die "Could not rename '$GRAFTS_FILE' to '$GRAFTS_FILE.bak'"
-
-echo "Success!"
-echo "All the grafts in '$GRAFTS_FILE' have been converted to replace refs!"
-echo "The grafts file '$GRAFTS_FILE' has been renamed: '$GRAFTS_FILE.bak'"
-- 
2.17.0.windows.1.33.gfcbb1fa0445
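For what it is worth, the "deprecation dance" suggested above could look roughly like the sketch below. This is only an illustration of the idea, not a proposed patch: the function names `warn_deprecated` and `convert_grafts` and the wording of the warning are assumptions. The only piece taken from the series is the `git replace --convert-graft-file` invocation.

```shell
#!/bin/sh
# Sketch of a deprecation stub for contrib/convert-grafts-to-replace-refs.sh.
# Function names and message text are illustrative assumptions.

# Print the deprecation notice on stderr so that it does not mix with
# any output a caller may be capturing from stdout.
warn_deprecated () {
	echo "warning: convert-grafts-to-replace-refs.sh is deprecated and will be removed;" >&2
	echo "warning: use 'git replace --convert-graft-file' instead." >&2
}

# Warn, then hand over to the built-in replacement named in the patch.
convert_grafts () {
	warn_deprecated
	git replace --convert-graft-file
}

# A real stub would end with a call to convert_grafts, or would exit
# non-zero right after the warning if the stricter "die" variant is wanted.
```

Keeping the stub around for a release or two would mean that anyone following old web advice still gets pointed at the new command rather than hitting a missing file.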
Re: [PATCH v3 09/11] technical/shallow: describe the relationship with replace refs
Hi dscho

From: "Johannes Schindelin" <johannes.schinde...@gmx.de> : Tuesday, April 24, 2018 8:10 PM

On Sun, 22 Apr 2018, Philip Oakley wrote:

From: "Johannes Schindelin" <johannes.schinde...@gmx.de>
> Now that grafts are deprecated, we should start to assume that readers
> have no idea what grafts are. So it makes more sense to describe the
> "shallow" feature in terms of replace refs.

Here we say we should drop the term "grafts"

>
> Suggested-by: Eric Sunshine <sunsh...@sunshineco.com>
> Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
> ---
> Documentation/technical/shallow.txt | 19 +++
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/technical/shallow.txt
> b/Documentation/technical/shallow.txt
> index 5183b154229..b3ff23c25f6 100644
> --- a/Documentation/technical/shallow.txt
> +++ b/Documentation/technical/shallow.txt
> @@ -9,14 +9,17 @@ these commits have no parents.
> *
>
> The basic idea is to write the SHA-1s of shallow commits into
> -$GIT_DIR/shallow, and handle its contents like the contents
> -of $GIT_DIR/info/grafts (with the difference that shallow
> -cannot contain parent information).
> -
> -This information is stored in a new file instead of grafts, or
> -even the config, since the user should not touch that file
> -at all (even throughout development of the shallow clone, it
> -was never manually edited!).
> +$GIT_DIR/shallow, and handle its contents similar to replace
> +refs (with the difference that shallow does not actually
> +create those replace refs) and

If grafts are deprecated, why not also get rid of this mention and simply leave the 'what it does' part.

Internally, shallow commits are implemented using the graft code path, and

however the change here is just to the documentation, independent of the code path's name.

they always will be: we will always need a list of the shallow commits, and we will always need to be able to lift the "shallow" attribute quickly, when deepening a shallow clone. So it makes sense to mention that here, because we are deep in technical details in Documentation/technical/.

> very much like the
> deprecated
> +graft file (with

I was looking to snip this 'graft' reference, as per the commit message..

> the difference that shallow commits will
> +always have their parents grafted away, not replaced by

s/their parents grafted away/no parents/ (rather than being replaced..)

Then I botched this substitution

But the commits will typically have parents. So they really will have their parents grafted away as long as they are marked "shallow"...

OK, maybe I misused the figurative 'no parents', when it means the literal 'parents not present'. Perhaps something like:

+$GIT_DIR/shallow, and handle its contents similar to replace
+refs (with the difference that shallow does not actually
+create those replace refs) with the difference that shallow commits will
+always have their parents not present.
--
Philip
Re: [PATCH v8 06/16] sequencer: introduce the `merge` command
From: "Johannes Schindelin" <johannes.schinde...@gmx.de>

On Mon, 23 Apr 2018, Philip Oakley wrote:

From: "Johannes Schindelin" <johannes.schinde...@gmx.de> : Monday, April 23, 2018 1:03 PM
Subject: Re: [PATCH v8 06/16] sequencer: introduce the `merge` command
[...]
>
> > > label onto
> > >
> > > # Branch abc
> > > reset onto
> >
> > Is this reset strictly necessary. We are already there @head.
>
> No, this is not strictly necessary, but

I've realised my misunderstanding. I was thinking this (and others) was equivalent to

  $ git reset <thatHead'onto'> # maybe even --hard, i.e. affecting the worktree

Oh, but it *is* affecting the worktree. In this case, since we label HEAD and then immediately reset to the label, there is just nothing to change. Consider this example, though:

	label onto

	# Branch: from-philip
	reset onto
	pick abcdef something
	label from-philip

	# Branch: with-love
	reset onto
	pick 012345 else
	label with-love

	reset onto
	merge -C 98765 from-philip
	merge -C 43210 with-love

Only in the first instance is the `reset onto` a no-op, an incidental one. After picking `something` and labeling the result as `from-philip`, though, the next `reset onto` really resets the worktree.

rather than just being a movement of the Head rev (though I may be having brain fade here regarding untracked files etc..)

The current way of doing things does not allow the `reset` to overwrite untracked, nor ignored files (I think, I only verified the former, not the latter). But yeah, it is not just a movement of HEAD. It does reset the worktree, although quite a bit more gently (and safely) than `git reset --hard`. In that respect, this patch series is a drastic improvement over the Git garden shears (which is the shell script I use in Git for Windows which inspired this here patch series).

thanks for clarifying. Yes my reasoning was a total brain fade ... Along with the fact that it's a soft/safe/gentle reset.
--
Philip
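The "only the first `reset onto` is a no-op" point can be checked mechanically with a small sketch. The helper below is purely illustrative: `find_noop_resets` is a made-up name, and it works on a simplified model of the todo list in which `label` names the current HEAD and every other command except `reset` is assumed to move HEAD. It is not how the sequencer itself decides anything.

```shell
#!/bin/sh
# Illustrative only: a simplified model of the todo list, not the sequencer.
# Prints the line numbers of "reset <label>" commands that cannot move HEAD
# because HEAD is already known to sit at that label.
find_noop_resets () {
	awk '
	/^#/ || /^$/  { next }               # skip comments and blank lines
	$1 == "label" { at[$2] = 1; next }   # the label names the current HEAD
	$1 == "reset" {
		if ($2 in at) {
			print NR             # HEAD is already there
		} else {
			split("", at)        # HEAD moves: forget the old labels
			at[$2] = 1
		}
		next
	}
	{ split("", at) }                    # pick, merge, ...: HEAD moves
	'
}

# The todo list from the message above.
find_noop_resets <<'EOF'
label onto
# Branch: from-philip
reset onto
pick abcdef something
label from-philip
# Branch: with-love
reset onto
pick 012345 else
label with-love
reset onto
merge -C 98765 from-philip
merge -C 43210 with-love
EOF
```

For this todo list the sketch prints `3`, the line of the first `reset onto`. The later two resets follow a `pick` and a new `label`, so in this model they do move HEAD and, as explained above, reset the worktree.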
Re: [PATCH v8 06/16] sequencer: introduce the `merge` command
From: "Johannes Schindelin" : Monday, April 23, 2018 1:03 PM
Subject: Re: [PATCH v8 06/16] sequencer: introduce the `merge` command

Hi Philip,

[...]
> label onto
>
> # Branch abc
> reset onto

Is this reset strictly necessary. We are already there @head.

No, this is not strictly necessary, but

I've realised my misunderstanding. I was thinking this (and others) was equivalent to

  $ git reset <thatHead'onto'> # maybe even --hard, i.e. affecting the worktree

rather than just being a movement of the Head rev (though I may be having brain fade here regarding untracked files etc..)

- it makes it easier to auto-generate (otherwise you would have to keep track of the "current HEAD" while generating that todo list), and
- if I keep the `reset onto` there, then it is *a lot* easier to reorder topic branches.

Ciao,
Dscho

Thanks
Philip
Re: [PATCH v8 06/16] sequencer: introduce the `merge` command
From: "Johannes Schindelin"

This patch is part of the effort to reimplement `--preserve-merges` with a substantially improved design, a design that has been developed in the Git for Windows project to maintain the dozens of Windows-specific patch series on top of upstream Git.

The previous patch implemented the `label` and `reset` commands to label

The previous patch was [Patch 05/16] git-rebase--interactive: clarify arguments, so this statement doesn't appear to be true. Has a patch been missed or re-ordered? Or should it be simply "This patch implements"? Likewise the patch subject would be updated.

commits and to reset to labeled commits. This patch adds the `merge`

s/adds/also adds/ ?

command, with the following syntax:

	merge [-C <commit>] <rev> # <oneline>

The <commit> parameter in this instance is the *original* merge commit, whose author and message will be used for the merge commit that is about to be created.

The <rev> parameter refers to the (possibly rewritten) revision to merge. Let's see an example of a todo list:

The example ought to also note that `label onto` is to `# label current HEAD with a name`, seeing as this is the first occurrence. It may be obvious in retrospect, but not at first reading.

	label onto

	# Branch abc
	reset onto

Is this reset strictly necessary. We are already there @head.

	pick deadbeef Hello, world!
	label abc

	reset onto
	pick cafecafe And now for something completely different
	merge -C baaabaaa abc # Merge the branch 'abc' into master

To edit the merge commit's message (a "reword" for merges, if you will), use `-c` (lower-case) instead of `-C`; this convention was borrowed from `git commit` that also supports `-c` and `-C` with similar meanings.

To create *new* merges, i.e. without copying the commit message from an existing commit, simply omit the `-C <commit>` parameter (which will open an editor for the merge message):

	merge abc

This comes in handy when splitting a branch into two or more branches.
Note: this patch only adds support for recursive merges, to keep things simple. Support for octopus merges will be added later in a separate patch series, support for merges using strategies other than the recursive merge is left for the future. Signed-off-by: Johannes Schindelin --- git-rebase--interactive.sh | 6 + sequencer.c| 407 - 2 files changed, 406 insertions(+), 7 deletions(-) diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index e1b865f43f2..ccd5254d1c9 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -162,6 +162,12 @@ s, squash = use commit, but meld into previous commit f, fixup = like \"squash\", but discard this commit's log message x, exec = run command (the rest of the line) using shell d, drop = remove commit +l, label = label current HEAD with a name +t, reset = reset HEAD to a label +m, merge [-C | -c ] [# ] +. create a merge commit using the original merge commit's +. message (or the oneline, if no original merge commit was +. specified). Use -c to reword the commit message. These lines can be re-ordered; they are executed from top to bottom. " | git stripspace --comment-lines >>"$todo" diff --git a/sequencer.c b/sequencer.c index 01443e0f245..35fcacbdf0f 100644 --- a/sequencer.c +++ b/sequencer.c @@ -23,6 +23,8 @@ #include "hashmap.h" #include "notes-utils.h" #include "sigchain.h" +#include "unpack-trees.h" +#include "worktree.h" #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION" @@ -120,6 +122,13 @@ static GIT_PATH_FUNC(rebase_path_stopped_sha, "rebase-merge/stopped-sha") static GIT_PATH_FUNC(rebase_path_rewritten_list, "rebase-merge/rewritten-list") static GIT_PATH_FUNC(rebase_path_rewritten_pending, "rebase-merge/rewritten-pending") + +/* + * The path of the file listing refs that need to be deleted after the rebase + * finishes. This is used by the `label` command to record the need for cleanup. 
+ */ +static GIT_PATH_FUNC(rebase_path_refs_to_delete, "rebase-merge/refs-to-delete") + /* * The following files are written by git-rebase just after parsing the * command-line (and are only consumed, not modified, by the sequencer). @@ -244,18 +253,34 @@ static const char *gpg_sign_opt_quoted(struct replay_opts *opts) int sequencer_remove_state(struct replay_opts *opts) { - struct strbuf dir = STRBUF_INIT; + struct strbuf buf = STRBUF_INIT; int i; + if (is_rebase_i(opts) && + strbuf_read_file(, rebase_path_refs_to_delete(), 0) > 0) { + char *p = buf.buf; + while (*p) { + char *eol = strchr(p, '\n'); + if (eol) + *eol = '\0'; + if (delete_ref("(rebase -i) cleanup", p, NULL, 0) < 0) + warning(_("could not delete '%s'"), p); + if (!eol) + break; + p = eol + 1; + } + } + free(opts->gpg_sign); free(opts->strategy); for (i = 0; i < opts->xopts_nr; i++) free(opts->xopts[i]); free(opts->xopts); - strbuf_addstr(, get_dir(opts)); - remove_dir_recursively(, 0); -
Re: [PATCH 3/3] Avoid multiple PREFIX definitions
From: "Johannes Schindelin" <johannes.schinde...@gmx.de>

From: Philip Oakley <philipoak...@iee.org>

The short and sweet PREFIX can be confused when used in many places.

Rename both usages to better describe their purpose. EXEC_CMD_PREFIX is used in full to disambiguate it from the nearby GIT_EXEC_PATH.

@dcsho; Thanks for keeping up with this and all your work. LGTM
Philip.

The PREFIX in sideband.c, while nominally independent of the exec_cmd PREFIX, does reside within libgit[1], so the definitions would clash when taken together with a PREFIX given on the command line for use by exec_cmd.c.

Noticed when compiling Git for Windows using MSVC/Visual Studio [1] which reports the conflict between the command line definition and the definition in sideband.c within the libgit project.

[1] the libgit functions are brought into a single sub-project within the Visual Studio construction script provided in contrib, and hence uses a single command for both exec_cmd.c and sideband.c.

Signed-off-by: Philip Oakley <philipoak...@iee.org>
Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
---
 Makefile   | 2 +-
 exec-cmd.c | 4 ++--
 sideband.c | 10 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index 111e93d3bea..49cec672242 100644
--- a/Makefile
+++ b/Makefile
@@ -2271,7 +2271,7 @@ exec-cmd.sp exec-cmd.s exec-cmd.o: EXTRA_CPPFLAGS = \
 	'-DGIT_EXEC_PATH="$(gitexecdir_SQ)"' \
 	'-DGIT_LOCALE_PATH="$(localedir_relative_SQ)"' \
 	'-DBINDIR="$(bindir_relative_SQ)"' \
-	'-DPREFIX="$(prefix_SQ)"'
+	'-DFALLBACK_RUNTIME_PREFIX="$(prefix_SQ)"'
 
 builtin/init-db.sp builtin/init-db.s builtin/init-db.o: GIT-PREFIX
 builtin/init-db.sp builtin/init-db.s builtin/init-db.o: EXTRA_CPPFLAGS = \
diff --git a/exec-cmd.c b/exec-cmd.c
index 3b0a039083a..02d31ee8971 100644
--- a/exec-cmd.c
+++ b/exec-cmd.c
@@ -48,7 +48,7 @@ static const char *system_prefix(void)
 	    !(prefix = strip_path_suffix(executable_dirname, GIT_EXEC_PATH)) &&
 	    !(prefix = strip_path_suffix(executable_dirname, BINDIR)) &&
 	    !(prefix = strip_path_suffix(executable_dirname, "git"))) {
-		prefix = PREFIX;
+		prefix = FALLBACK_RUNTIME_PREFIX;
 		trace_printf("RUNTIME_PREFIX requested, "
 			     "but prefix computation failed. "
 			     "Using static fallback '%s'.\n", prefix);
@@ -243,7 +243,7 @@ void git_resolve_executable_dir(const char *argv0)
  */
 static const char *system_prefix(void)
 {
-	return PREFIX;
+	return FALLBACK_RUNTIME_PREFIX;
 }
 
 /*
diff --git a/sideband.c b/sideband.c
index 6d7f943e438..325bf0e974a 100644
--- a/sideband.c
+++ b/sideband.c
@@ -13,7 +13,7 @@
  * the remote died unexpectedly. A flush() concludes the stream.
  */
 
-#define PREFIX "remote: "
+#define DISPLAY_PREFIX "remote: "
 
 #define ANSI_SUFFIX "\033[K"
 #define DUMB_SUFFIX ""
@@ -49,7 +49,7 @@ int recv_sideband(const char *me, int in_stream, int out)
 		switch (band) {
 		case 3:
 			strbuf_addf(&outbuf, "%s%s%s", outbuf.len ? "\n" : "",
-				    PREFIX, buf + 1);
+				    DISPLAY_PREFIX, buf + 1);
 			retval = SIDEBAND_REMOTE_ERROR;
 			break;
 		case 2:
@@ -67,7 +67,7 @@ int recv_sideband(const char *me, int in_stream, int out)
 				int linelen = brk - b;
 
 				if (!outbuf.len)
-					strbuf_addstr(&outbuf, PREFIX);
+					strbuf_addstr(&outbuf, DISPLAY_PREFIX);
 				if (linelen > 0) {
 					strbuf_addf(&outbuf, "%.*s%s%c",
 						    linelen, b, suffix, *brk);
@@ -81,8 +81,8 @@ int recv_sideband(const char *me, int in_stream, int out)
 			}
 
 			if (*b)
-				strbuf_addf(&outbuf, "%s%s",
-					    outbuf.len ? "" : PREFIX, b);
+				strbuf_addf(&outbuf, "%s%s", outbuf.len ?
+					    "" : DISPLAY_PREFIX, b);
 			break;
 		case 1:
 			write_or_die(out, buf + 1, len);
-- 
2.17.0.windows.1.15.gaa56ade3205
Re: [PATCH v3 09/11] technical/shallow: describe the relationship with replace refs
From: "Johannes Schindelin"

Now that grafts are deprecated, we should start to assume that readers have no idea what grafts are. So it makes more sense to describe the "shallow" feature in terms of replace refs.

Suggested-by: Eric Sunshine
Signed-off-by: Johannes Schindelin
---
 Documentation/technical/shallow.txt | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 5183b154229..b3ff23c25f6 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -9,14 +9,17 @@ these commits have no parents.
 *
 
 The basic idea is to write the SHA-1s of shallow commits into
-$GIT_DIR/shallow, and handle its contents like the contents
-of $GIT_DIR/info/grafts (with the difference that shallow
-cannot contain parent information).
-
-This information is stored in a new file instead of grafts, or
-even the config, since the user should not touch that file
-at all (even throughout development of the shallow clone, it
-was never manually edited!).
+$GIT_DIR/shallow, and handle its contents similar to replace
+refs (with the difference that shallow does not actually
+create those replace refs) and

If grafts are deprecated, why not also get rid of this mention and simply leave the 'what it does' part.

very much like the deprecated
+graft file (with the difference that shallow commits will
+always have their parents grafted away, not replaced by

s/their parents grafted away/no parents/ (rather than being replaced..)

+different parents).
+
+This information is stored in a special-purpose file because the
+user should not touch that file at all (even throughout
+development of the shallow clone, it was never manually
+edited!).
 
 Each line contains exactly one SHA-1. When read, a commit_graft
 will be constructed, which has nr_parent < 0 to make it easier
-- 
2.17.0.windows.1.15.gaa56ade3205
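As an aside on the "each line contains exactly one SHA-1" wording, the difference from a graft-style line (which also lists parents) can be shown with a read-only check like the one below. This is only a sketch for illustration: `is_shallow_format` is a hypothetical helper, the hashes are invented, and nothing here should be pointed at a real $GIT_DIR/shallow for anything other than reading.

```shell
#!/bin/sh
# Illustrative check, not part of Git: succeeds when every line of the
# given file is exactly one 40-digit lower-case hexadecimal SHA-1.
is_shallow_format () {
	test -r "$1" || return 2
	# grep -v selects the lines that do NOT match; finding any is a failure.
	! grep -Ev '^[0-9a-f]{40}$' "$1" >/dev/null
}

tmp=$(mktemp)

# One commit name per line: the shape described in shallow.txt.
printf '%s\n' 0123456789abcdef0123456789abcdef01234567 >"$tmp"
is_shallow_format "$tmp" && echo "one SHA-1 per line: accepted"

# A graft-style line names a commit followed by its parent(s).
printf '%s\n' '0123456789abcdef0123456789abcdef01234567 89abcdef0123456789abcdef0123456789abcdef' >"$tmp"
is_shallow_format "$tmp" || echo "commit plus parent: rejected"

rm -f "$tmp"
```

The second case is rejected because a shallow entry carries no parent information at all, which is the distinction the documentation text above is trying to express.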
Re: [PATCH v8 09/16] rebase: introduce the --rebase-merges option
From: "Johannes Schindelin"Once upon a time, this here developer thought: wouldn't it be nice if, say, Git for Windows' patches on top of core Git could be represented as a thicket of branches, and be rebased on top of core Git in order to maintain a cherry-pick'able set of patch series? The original attempt to answer this was: git rebase --preserve-merges. However, that experiment was never intended as an interactive option, and it only piggy-backed on git rebase --interactive because that command's implementation looked already very, very familiar: it was designed by the same person who designed --preserve-merges: yours truly. Some time later, some other developer (I am looking at you, Andreas! ;-)) decided that it would be a good idea to allow --preserve-merges to be combined with --interactive (with caveats!) and the Git maintainer (well, the interim Git maintainer during Junio's absence, that is) agreed, and that is when the glamor of the --preserve-merges design started to fall apart rather quickly and unglamorously. The reason? In --preserve-merges mode, the parents of a merge commit (or for that matter, of *any* commit) were not stated explicitly, but were *implied* by the commit name passed to the `pick` command. This made it impossible, for example, to reorder commits. Not to mention to flatten the branch topology or, deity forbid, to split topic branches Aside: The idea of a "flattened" topology is, to my mind, not actually defined though may be understood by devs working in the area. Hopefully it's going away as a term, though the new 'cousins' will need clarification (there's no dot notation for that area of topology). into two. Alas, these shortcomings also prevented that mode (whose original purpose was to serve Git for Windows' needs, with the additional hope that it may be useful to others, too) from serving Git for Windows' needs. 
Five years later, when it became really untenable to have one unwieldy, big hodge-podge patch series of partly related, partly unrelated patches in Git for Windows that was rebased onto core Git's tags from time to time (earning the undeserved wrath of the developer of the ill-fated git-remote-hg series that first obsoleted Git for Windows' competing approach, only to be abandoned without maintainer later) was really untenable, the "Git garden shears" were born [*1*/*2*]: a script, piggy-backing on top of the interactive rebase, that would first determine the branch topology of the patches to be rebased, create a pseudo todo list for further editing, transform the result into a real todo list (making heavy use of the `exec` command to "implement" the missing todo list commands) and finally recreate the patch series on top of the new base commit. That was in 2013. And it took about three weeks to come up with the design and implement it as an out-of-tree script. Needless to say, the implementation needed quite a few years to stabilize, all the while the design itself proved itself sound. With this patch, the goodness of the Git garden shears comes to `git rebase -i` itself. Passing the `--rebase-merges` option will generate a todo list that can be understood readily, and where it is obvious how to reorder commits. New branches can be introduced by inserting `label` commands and calling `merge `. And once this mode will have become stable and universally accepted, we can deprecate the design mistake that was `--preserve-merges`. 
Link *1*: https://github.com/msysgit/msysgit/blob/master/share/msysGit/shears.sh Link *2*: https://github.com/git-for-windows/build-extra/blob/master/shears.sh Signed-off-by: Johannes Schindelin --- Documentation/git-rebase.txt | 20 ++- contrib/completion/git-completion.bash | 2 +- git-rebase--interactive.sh | 1 + git-rebase.sh | 6 + t/t3430-rebase-merges.sh | 179 + 5 files changed, 206 insertions(+), 2 deletions(-) create mode 100755 t/t3430-rebase-merges.sh diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt index 3277ca14327..34e0f6a69c1 100644 --- a/Documentation/git-rebase.txt +++ b/Documentation/git-rebase.txt @@ -378,6 +378,23 @@ The commit list format can be changed by setting the configuration option rebase.instructionFormat. A customized instruction format will automatically have the long commit hash prepended to the format. +-r:: +--rebase-merges:: + By default, a rebase will simply drop merge commits and only rebase + the non-merge commits. With this option, it will try to preserve + the branching structure within the commits that are to be rebased, + by recreating the merge commits. If a merge commit resolved any merge + or contained manual amendments, then they will have to be re-applied + manually. ++ +This mode is similar in spirit to `--preserve-merges`, but in contrast to +that option works well in interactive rebases: commits can
Re: [PATCH v8 09/16] rebase: introduce the --rebase-merges option
From: "Johannes Schindelin"Once upon a time, this here developer thought: wouldn't it be nice if, say, Git for Windows' patches on top of core Git could be represented as a thicket of branches, and be rebased on top of core Git in order to maintain a cherry-pick'able set of patch series? The original attempt to answer this was: git rebase --preserve-merges. However, that experiment was never intended as an interactive option, and it only piggy-backed on git rebase --interactive because that command's implementation looked already very, very familiar: it was designed by the same person who designed --preserve-merges: yours truly. Some time later, some other developer (I am looking at you, Andreas! ;-)) decided that it would be a good idea to allow --preserve-merges to be combined with --interactive (with caveats!) and the Git maintainer (well, the interim Git maintainer during Junio's absence, that is) agreed, and that is when the glamor of the --preserve-merges design started to fall apart rather quickly and unglamorously. The reason? In --preserve-merges mode, the parents of a merge commit (or for that matter, of *any* commit) were not stated explicitly, but were *implied* by the commit name passed to the `pick` command. Aside: I think this para should be extracted to the --preserve-merges documentation to highlight what it does / why it is 'wrong' (not what would be expected in some case). It may also need to discuss the (figurative) Cousins vs. Siblings distinction [merge of branches external, or internal, to the rebase. "In --preserve-merges, the commit being selected for merging is implied by the commit name passed to the `pick` command (i.e. of the original merge commit), not that of the rebased version of that parent." A similar issue occurs with (figuratively) '--ancestry-path --first parent' searches which lacks the alternate '--lead parent' post-walk selection. [1]. 
I don't think there is a dot notation to select the merge cousins, nor merge siblings either A.,B ? (that's dot-comma ;-) This made it impossible, for example, to reorder commits. Not to mention to flatten the branch topology or, deity forbid, to split topic branches into two. Alas, these shortcomings also prevented that mode (whose original purpose was to serve Git for Windows' needs, with the additional hope that it may be useful to others, too) from serving Git for Windows' needs. Five years later, when it became really untenable to have one unwieldy, big hodge-podge patch series of partly related, partly unrelated patches in Git for Windows that was rebased onto core Git's tags from time to time (earning the undeserved wrath of the developer of the ill-fated git-remote-hg series that first obsoleted Git for Windows' competing approach, only to be abandoned without maintainer later) was really untenable, the "Git garden shears" were born [*1*/*2*]: a script, piggy-backing on top of the interactive rebase, that would first determine the branch topology of the patches to be rebased, create a pseudo todo list for further editing, transform the result into a real todo list (making heavy use of the `exec` command to "implement" the missing todo list commands) and finally recreate the patch series on top of the new base commit. That was in 2013. And it took about three weeks to come up with the design and implement it as an out-of-tree script. Needless to say, the implementation needed quite a few years to stabilize, all the while the design itself proved itself sound. With this patch, the goodness of the Git garden shears comes to `git rebase -i` itself. Passing the `--rebase-merges` option will generate a todo list that can be understood readily, and where it is obvious how to reorder commits. New branches can be introduced by inserting `label` commands and calling `merge `. 
And once this mode will have become stable and universally accepted, we can deprecate the design mistake that was `--preserve-merges`. Link *1*: https://github.com/msysgit/msysgit/blob/master/share/msysGit/shears.sh Link *2*: https://github.com/git-for-windows/build-extra/blob/master/shears.sh Signed-off-by: Johannes Schindelin --- Documentation/git-rebase.txt | 20 ++- contrib/completion/git-completion.bash | 2 +- git-rebase--interactive.sh | 1 + git-rebase.sh | 6 + t/t3430-rebase-merges.sh | 179 + 5 files changed, 206 insertions(+), 2 deletions(-) create mode 100755 t/t3430-rebase-merges.sh diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt index 3277ca14327..34e0f6a69c1 100644 --- a/Documentation/git-rebase.txt +++ b/Documentation/git-rebase.txt @@ -378,6 +378,23 @@ The commit list format can be changed by setting the configuration option rebase.instructionFormat. A customized instruction format will automatically have the long commit hash prepended to the
Re: [PATCH v8 08/16] rebase-helper --make-script: introduce a flag to rebase merges
From: "Johannes Schindelin"Sorry for the very late in the series comments.. The sequencer just learned new commands intended to recreate branch structure (similar in spirit to --preserve-merges, but with a substantially less-broken design). Let's allow the rebase--helper to generate todo lists making use of these commands, triggered by the new --rebase-merges option. For a commit topology like this (where the HEAD points to C): - A - B - C \ / D the generated todo list would look like this: # branch D pick 0123 A label branch-point pick 1234 D label D reset branch-point pick 2345 B merge -C 3456 D # C To keep things simple, we first only implement support for merge commits with exactly two parents, leaving support for octopus merges to a later patch series. For the first time reader this (below) isn't as obvious as may be thought. maybe we should be a little more explicit here. As a special, hard-coded label, all merge-rebasing todo lists start with the command `label onto` .. which labels the start point head with the name 'onto' ... Maybe even: "All merge-rebasing todo lists start with, as a convenience, a hard-coded `label onto` line which will label the start point's head" ... so that we can later always refer to the revision onto which everything is rebased. 
Signed-off-by: Johannes Schindelin --- builtin/rebase--helper.c | 4 +- sequencer.c | 351 ++- sequencer.h | 1 + 3 files changed, 353 insertions(+), 3 deletions(-) diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c index ad074705bb5..781782e7272 100644 --- a/builtin/rebase--helper.c +++ b/builtin/rebase--helper.c @@ -12,7 +12,7 @@ static const char * const builtin_rebase_helper_usage[] = { int cmd_rebase__helper(int argc, const char **argv, const char *prefix) { struct replay_opts opts = REPLAY_OPTS_INIT; - unsigned flags = 0, keep_empty = 0; + unsigned flags = 0, keep_empty = 0, rebase_merges = 0; int abbreviate_commands = 0; enum { CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, @@ -24,6 +24,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) OPT_BOOL(0, "keep-empty", _empty, N_("keep empty commits")), OPT_BOOL(0, "allow-empty-message", _empty_message, N_("allow commits with empty messages")), + OPT_BOOL(0, "rebase-merges", _merges, N_("rebase merge commits")), OPT_CMDMODE(0, "continue", , N_("continue rebase"), CONTINUE), OPT_CMDMODE(0, "abort", , N_("abort rebase"), @@ -57,6 +58,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) flags |= keep_empty ? TODO_LIST_KEEP_EMPTY : 0; flags |= abbreviate_commands ? TODO_LIST_ABBREVIATE_CMDS : 0; + flags |= rebase_merges ? TODO_LIST_REBASE_MERGES : 0; flags |= command == SHORTEN_OIDS ? 
TODO_LIST_SHORTEN_IDS : 0; if (command == CONTINUE && argc == 1) diff --git a/sequencer.c b/sequencer.c index 5944d3a34eb..1e17a11ca32 100644 --- a/sequencer.c +++ b/sequencer.c @@ -25,6 +25,8 @@ #include "sigchain.h" #include "unpack-trees.h" #include "worktree.h" +#include "oidmap.h" +#include "oidset.h" #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION" @@ -3436,6 +3438,343 @@ void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag) strbuf_release(); } +struct labels_entry { + struct hashmap_entry entry; + char label[FLEX_ARRAY]; +}; + +static int labels_cmp(const void *fndata, const struct labels_entry *a, + const struct labels_entry *b, const void *key) +{ + return key ? strcmp(a->label, key) : strcmp(a->label, b->label); +} + +struct string_entry { + struct oidmap_entry entry; + char string[FLEX_ARRAY]; +}; + +struct label_state { + struct oidmap commit2label; + struct hashmap labels; + struct strbuf buf; +}; + +static const char *label_oid(struct object_id *oid, const char *label, + struct label_state *state) +{ + struct labels_entry *labels_entry; + struct string_entry *string_entry; + struct object_id dummy; + size_t len; + int i; + + string_entry = oidmap_get(>commit2label, oid); + if (string_entry) + return string_entry->string; + + /* + * For "uninteresting" commits, i.e. commits that are not to be + * rebased, and which can therefore not be labeled, we use a unique + * abbreviation of the commit name. This is slightly more complicated + * than calling find_unique_abbrev() because we also need to make + * sure that the abbreviation does not conflict with any other + * label. + * + * We disallow "interesting" commits to be labeled by a string that + * is a valid full-length hash, to ensure that we always can find an + * abbreviation for any uninteresting commit's names that does not + * clash with any other label. 
+ */ + if (!label) { + char *p; + + strbuf_reset(>buf); + strbuf_grow(>buf, GIT_SHA1_HEXSZ); + label = p = state->buf.buf; + + find_unique_abbrev_r(p, oid, default_abbrev); + + /* + * We may need to extend the
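The comment in `label_oid()` above describes how an uninteresting commit gets an abbreviation that clashes neither with another commit nor with an already-used label. A minimal Python sketch of that idea follows. It is not Git's implementation: the name `unique_label_for`, the explicit `all_oids` list and the 7-character starting length are assumptions made for illustration, and a naive prefix scan stands in for `find_unique_abbrev_r()`.

```python
def unique_label_for(oid_hex, all_oids, labels, min_len=7):
    """Return the shortest prefix of oid_hex (at least min_len characters)
    that is not a prefix of any other object id and is not already a label.

    Hypothetical helper: Git itself asks the object database for the
    unique abbreviation instead of scanning a list.
    """
    for n in range(min_len, len(oid_hex) + 1):
        prefix = oid_hex[:n]
        # Ambiguous if some *other* object id shares this prefix.
        ambiguous = any(o != oid_hex and o.startswith(prefix) for o in all_oids)
        if not ambiguous and prefix not in labels:
            return prefix
    # Only reachable if a label equals the full hash, which the patch
    # comment says is disallowed for exactly this reason.
    raise ValueError("no usable abbreviation for " + oid_hex)


oid = "deadbeef" + "0" * 32
other = "deadbee1" + "0" * 32
label_a = unique_label_for(oid, [oid, other], labels=set())
label_b = unique_label_for(oid, [oid], labels={"deadbee"})
```

In both calls the 7-character prefix is rejected, once because another object shares it and once because a label already uses it, so the abbreviation grows by one character.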
Re: [PATCH v8 06/16] sequencer: introduce the `merge` command
From: "Johannes Schindelin"

> This patch is part of the effort to reimplement `--preserve-merges` with
> a substantially improved design, a design that has been developed in the
> Git for Windows project to maintain the dozens of Windows-specific patch
> series on top of upstream Git.
>
> The previous patch implemented the `label` and `reset` commands to label

The previous patch was [Patch 05/16] git-rebase--interactive: clarify arguments, so this statement doesn't appear to be true. Has a patch been missed or re-ordered? Or should it simply be "This patch implements"? Likewise the patch subject would be updated.

> commits and to reset to labeled commits. This patch adds the `merge`

s/adds/also adds/ ?

> command, with the following syntax:
>
>     merge [-C <commit>] <rev> # <oneline>
>
> The <commit> parameter in this instance is the *original* merge commit,
> whose author and message will be used for the merge commit that is about
> to be created.
>
> The <rev> parameter refers to the (possibly rewritten) revision to
> merge. Let's see an example of a todo list:

The example ought to also note that `label onto` is to `# label current HEAD with a name`, seeing as this is the first occurrence. It may be obvious in retrospect, but not at first reading.

>     label onto
>
>     # Branch abc
>     reset onto

Is this reset strictly necessary? We are already there @head.

>     pick deadbeef Hello, world!
>     label abc
>
>     reset onto
>     pick cafecafe And now for something completely different
>     merge -C baaabaaa abc # Merge the branch 'abc' into master
>
> To edit the merge commit's message (a "reword" for merges, if you will),
> use `-c` (lower-case) instead of `-C`; this convention was borrowed from
> `git commit` that also supports `-c` and `-C` with similar meanings.
>
> To create *new* merges, i.e. without copying the commit message from an
> existing commit, simply omit the `-C <commit>` parameter (which will
> open an editor for the merge message):
>
>     merge abc
>
> This comes in handy when splitting a branch into two or more branches.
Note: this patch only adds support for recursive merges, to keep things simple. Support for octopus merges will be added later in a separate patch series, support for merges using strategies other than the recursive merge is left for the future. Signed-off-by: Johannes Schindelin --- git-rebase--interactive.sh | 6 + sequencer.c| 407 - 2 files changed, 406 insertions(+), 7 deletions(-) diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index e1b865f43f2..ccd5254d1c9 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -162,6 +162,12 @@ s, squash = use commit, but meld into previous commit f, fixup = like \"squash\", but discard this commit's log message x, exec = run command (the rest of the line) using shell d, drop = remove commit +l, label = label current HEAD with a name +t, reset = reset HEAD to a label +m, merge [-C | -c ] [# ] +. create a merge commit using the original merge commit's +. message (or the oneline, if no original merge commit was +. specified). Use -c to reword the commit message. These lines can be re-ordered; they are executed from top to bottom. " | git stripspace --comment-lines >>"$todo" diff --git a/sequencer.c b/sequencer.c index 01443e0f245..35fcacbdf0f 100644 --- a/sequencer.c +++ b/sequencer.c @@ -23,6 +23,8 @@ #include "hashmap.h" #include "notes-utils.h" #include "sigchain.h" +#include "unpack-trees.h" +#include "worktree.h" #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION" @@ -120,6 +122,13 @@ static GIT_PATH_FUNC(rebase_path_stopped_sha, "rebase-merge/stopped-sha") static GIT_PATH_FUNC(rebase_path_rewritten_list, "rebase-merge/rewritten-list") static GIT_PATH_FUNC(rebase_path_rewritten_pending, "rebase-merge/rewritten-pending") + +/* + * The path of the file listing refs that need to be deleted after the rebase + * finishes. This is used by the `label` command to record the need for cleanup. 
+ */ +static GIT_PATH_FUNC(rebase_path_refs_to_delete, "rebase-merge/refs-to-delete") + /* * The following files are written by git-rebase just after parsing the * command-line (and are only consumed, not modified, by the sequencer). @@ -244,18 +253,34 @@ static const char *gpg_sign_opt_quoted(struct replay_opts *opts) int sequencer_remove_state(struct replay_opts *opts) { - struct strbuf dir = STRBUF_INIT; + struct strbuf buf = STRBUF_INIT; int i; + if (is_rebase_i(opts) && + strbuf_read_file(, rebase_path_refs_to_delete(), 0) > 0) { + char *p = buf.buf; + while (*p) { + char *eol = strchr(p, '\n'); + if (eol) + *eol = '\0'; + if (delete_ref("(rebase -i) cleanup", p, NULL, 0) < 0) + warning(_("could not delete '%s'"), p); + if (!eol) + break; + p = eol + 1; + } + } + free(opts->gpg_sign); free(opts->strategy); for (i = 0; i < opts->xopts_nr; i++) free(opts->xopts[i]); free(opts->xopts); - strbuf_addstr(, get_dir(opts)); - remove_dir_recursively(, 0); - strbuf_release(); +
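The `merge` todo-line syntax described in the message above can be illustrated with a small parser. This is only a sketch in Python of how such a line decomposes, not the sequencer's C code: the names `MergeCommand` and `parse_merge_line` are hypothetical, and the regular expression assumes a single revision to merge, matching the "recursive merges only" note in the patch.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeCommand:
    flag: Optional[str]     # "-C" (reuse message), "-c" (reword) or None
    commit: Optional[str]   # original merge commit, when -C/-c is given
    rev: str                # label or revision to merge
    oneline: Optional[str]  # informational text after '#', if any


# merge [-C <commit> | -c <commit>] <rev> [# <oneline>]
MERGE_RE = re.compile(
    r"^(?:merge|m)\s+(?:(-[Cc])\s+(\S+)\s+)?(\S+)(?:\s+#\s*(.*))?$"
)


def parse_merge_line(line):
    """Return a MergeCommand, or None if the line is not a merge command."""
    m = MERGE_RE.match(line.strip())
    if not m:
        return None
    return MergeCommand(*m.groups())


reuse = parse_merge_line(
    "merge -C baaabaaa abc # Merge the branch 'abc' into master")
fresh = parse_merge_line("merge abc")
```

The two sample lines are the ones from the commit message: the first reuses the message of the original merge commit, the second creates a new merge and therefore carries neither flag nor commit.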
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
From: "Duy Nguyen" <pclo...@gmail.com> On Wed, Apr 18, 2018 at 12:47 AM, Philip Oakley <philipoak...@iee.org> wrote: > Is that something I should add to my todo to add a 'guide' category > > etc.? I added it too [1]. Not sure if you want anything more on top though. What I've seen is looking good - I've not had as much time as I'd like.. I'm not sure of the status of the git/generate-cmdlist.sh though. Should that also be updated, or did I miss that? Yes it's updated by other patches in the same thread. -- Thanks. Hopefully I'll have some time this weekend/coming week as my wife is away Philip
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
From: "Philip Oakley" <philipoak...@iee.org> : Tuesday, April 17, 2018 11:47 PM From: "Duy Nguyen" <pclo...@gmail.com> : Tuesday, April 17, 2018 5:48 PM On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote: On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley <philipoak...@iee.org> wrote: > From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 > PM > >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley >> <philipoak...@iee.org> >> wrote: >>> >>> I'm only just catching up, but does/can this series also capture the >>> non-command guides that are available in git so that the 'git >>> help -g' >>> can >>> begin to list them all? >> >> >> It currently does not. But I don't see why it should not. This should >> allow git.txt to list all the guides too, for people who skip "git >> help" and go hard core mode with "man git". Thanks for bringing this >> up. >> -- >> Duy >> > Is that something I should add to my todo to add a 'guide' category > etc.? I added it too [1]. Not sure if you want anything more on top though. What I've seen is looking good - I've not had as much time as I'd like.. I'm not sure of the status of the git/generate-cmdlist.sh though. Should that also be updated, or did I miss that? -- Philip I may be miss-remembering the order that the `git help` determines the list of commands and guides. There was at least one place where the list of commands was generated programatically that I may be confused with (I've not had time to delve into the code :-( -- The "anything more" that at least I had in mind was something like this. Though I'm not sure if it's a good thing to replace a hand crafted section with an automatedly generated one. 
This patch on top combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of documents/guides are extracted from command-list.txt -- 8< -- diff --git a/Documentation/Makefile b/Documentation/Makefile index 6232143cb9..3e0ecd2e11 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) build-docdep.perl cmds_txt = cmds-ancillaryinterrogators.txt \ cmds-ancillarymanipulators.txt \ + cmds-guide.txt \ cmds-mainporcelain.txt \ cmds-plumbinginterrogators.txt \ cmds-plumbingmanipulators.txt \ diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index 5aa73cfe45..e158bd9b96 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -54,6 +54,7 @@ for (sort <>) { for my $cat (qw(ancillaryinterrogators ancillarymanipulators + guide mainporcelain plumbinginterrogators plumbingmanipulators diff --git a/Documentation/git.txt b/Documentation/git.txt index 4767860e72..d60d2ae0c7 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -808,29 +808,6 @@ The index is also capable of storing multiple entries (called "stages") for a given pathname. These stages are used to hold the various unmerged version of a file when a merge is in progress. -FURTHER DOCUMENTATION -- - -See the references in the "description" section to get started -using Git. The following is probably more detail than necessary -for a first-time user. - -The link:user-manual.html#git-concepts[Git concepts chapter of the -user-manual] and linkgit:gitcore-tutorial[7] both provide -introductions to the underlying Git architecture. - -See linkgit:gitworkflows[7] for an overview of recommended workflows. - -See also the link:howto-index.html[howto] documents for some useful -examples. - -The internals are documented in the -link:technical/api-index.html[Git API documentation]. - -Users migrating from CVS may also want to -read linkgit:gitcvs-migration[7]. 
- - Authors --- Git was started by Linus Torvalds, and is currently maintained by Junio @@ -854,11 +831,16 @@ the Git Security mailing list <git-secur...@googlegroups.com>. SEE ALSO -linkgit:gittutorial[7], linkgit:gittutorial-2[7], -linkgit:giteveryday[7], linkgit:gitcvs-migration[7], -linkgit:gitglossary[7], linkgit:gitcore-tutorial[7], -linkgit:gitcli[7], link:user-manual.html[The Git User's Manual], -linkgit:gitworkflows[7] + +See the references in the "description" section to get started +using Git. The following is probably more detail than necessary +for a first-time user. + +include::cmds-guide.txt[] + +See also the link:howto-index.html[howto] documents for some useful +examples. The internals are documented in the +link:technical/api-index.html[Git API documentation]. GIT --- diff --git a/command-list.txt b/command-list.txt index 1835f1a928..f26b8acd52 100644 --- a/command-list.txt +++ b/command-list.txt @@ -150,10 +150,14 @@ git-whatchanged
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
From: "Duy Nguyen" <pclo...@gmail.com> : Tuesday, April 17, 2018 5:48 PM On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote: On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley <philipoak...@iee.org> wrote: > From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 > PM > >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley <philipoak...@iee.org> >> wrote: >>> >>> I'm only just catching up, but does/can this series also capture the >>> non-command guides that are available in git so that the 'git >>> help -g' >>> can >>> begin to list them all? >> >> >> It currently does not. But I don't see why it should not. This should >> allow git.txt to list all the guides too, for people who skip "git >> help" and go hard core mode with "man git". Thanks for bringing this >> up. >> -- >> Duy >> > Is that something I should add to my todo to add a 'guide' category > etc.? I added it too [1]. Not sure if you want anything more on top though. What I've seen is looking good - I've not had as much time as I'd like.. I'm not sure of the status of the git/generate-cmdlist.sh though. Should that also be updated, or did I miss that? -- Philip The "anything more" that at least I had in mind was something like this. Though I'm not sure if it's a good thing to replace a hand crafted section with an automatedly generated one. 
This patch on top combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of documents/guides are extracted from command-list.txt -- 8< -- diff --git a/Documentation/Makefile b/Documentation/Makefile index 6232143cb9..3e0ecd2e11 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) build-docdep.perl cmds_txt = cmds-ancillaryinterrogators.txt \ cmds-ancillarymanipulators.txt \ + cmds-guide.txt \ cmds-mainporcelain.txt \ cmds-plumbinginterrogators.txt \ cmds-plumbingmanipulators.txt \ diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index 5aa73cfe45..e158bd9b96 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -54,6 +54,7 @@ for (sort <>) { for my $cat (qw(ancillaryinterrogators ancillarymanipulators + guide mainporcelain plumbinginterrogators plumbingmanipulators diff --git a/Documentation/git.txt b/Documentation/git.txt index 4767860e72..d60d2ae0c7 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -808,29 +808,6 @@ The index is also capable of storing multiple entries (called "stages") for a given pathname. These stages are used to hold the various unmerged version of a file when a merge is in progress. -FURTHER DOCUMENTATION -- - -See the references in the "description" section to get started -using Git. The following is probably more detail than necessary -for a first-time user. - -The link:user-manual.html#git-concepts[Git concepts chapter of the -user-manual] and linkgit:gitcore-tutorial[7] both provide -introductions to the underlying Git architecture. - -See linkgit:gitworkflows[7] for an overview of recommended workflows. - -See also the link:howto-index.html[howto] documents for some useful -examples. - -The internals are documented in the -link:technical/api-index.html[Git API documentation]. - -Users migrating from CVS may also want to -read linkgit:gitcvs-migration[7]. 
- - Authors --- Git was started by Linus Torvalds, and is currently maintained by Junio @@ -854,11 +831,16 @@ the Git Security mailing list <git-secur...@googlegroups.com>. SEE ALSO -linkgit:gittutorial[7], linkgit:gittutorial-2[7], -linkgit:giteveryday[7], linkgit:gitcvs-migration[7], -linkgit:gitglossary[7], linkgit:gitcore-tutorial[7], -linkgit:gitcli[7], link:user-manual.html[The Git User's Manual], -linkgit:gitworkflows[7] + +See the references in the "description" section to get started +using Git. The following is probably more detail than necessary +for a first-time user. + +include::cmds-guide.txt[] + +See also the link:howto-index.html[howto] documents for some useful +examples. The internals are documented in the +link:technical/api-index.html[Git API documentation]. GIT --- diff --git a/command-list.txt b/command-list.txt index 1835f1a928..f26b8acd52 100644 --- a/command-list.txt +++ b/command-list.txt @@ -150,10 +150,14 @@ git-whatchanged ancillaryinterrogators git-worktreemainporcelain git-write-tree plumbingmanipulators gitattributes guide +gitcvs-migrationguide +gitcli guide +gitcore-tutorialguide giteveryday guide gi
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 PM On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley <philipoak...@iee.org> wrote: I'm only just catching up, but does/can this series also capture the non-command guides that are available in git so that the 'git help -g' can begin to list them all? It currently does not. But I don't see why it should not. This should allow git.txt to list all the guides too, for people who skip "git help" and go hard core mode with "man git". Thanks for bringing this up. -- Duy Is that something I should add to my todo to add a 'guide' category etc.? A quick search of public-inbox suggests https://public-inbox.org/git/1361660761-1932-1-git-send-email-philipoak...@iee.org/ as being where I first made the suggestions, but it got trimmed back to not update (be embedded in) the command-list.txt Philip
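To make the idea of a `guide` category concrete, here is a rough Python sketch that groups entries by category, in the spirit of what `cmd-list.perl` does for each category. It assumes a simplified command-list.txt layout in which every non-comment line is a name followed by whitespace and a category; the function name `group_by_category` is hypothetical, and the sample entries are the ones visible in the diff above, with their column spacing restored.

```python
from collections import defaultdict


def group_by_category(text):
    """Map each category to the list of names filed under it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # no category column; ignore in this sketch
        name, category = fields[0], fields[1]
        groups[category].append(name)
    return dict(groups)


SAMPLE = """\
# name            category
git-worktree      mainporcelain
git-write-tree    plumbingmanipulators
gitattributes     guide
gitcli            guide
giteveryday       guide
"""

guides = group_by_category(SAMPLE)["guide"]
```

With the guides recorded in the same file as the commands, a listing such as the one `git help -g` prints could be generated from one source instead of being maintained by hand.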
Re: [PATCH v6 04/15] sequencer: introduce new commands to reset the revision
From: "Phillip Wood": Friday, April 13, 2018 11:03 AM

> If a label or reset command fails it is likely to be due to a typo.
> Rescheduling the command would make it easier for the user to fix the
> problem as they can just run 'git rebase --edit-todo'.

Is this worth noting in the command documentation? "If the label or reset command fails then fix the problem by running 'git rebase --edit-todo'." ? Just a thought.

> It also ensures that the problem has actually been fixed when the
> rebase continues. I think you could do it like this

--
Philip (also @dunelm, 73-79..)
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
From: "Eric Sunshine" Monday, April 09, 2018 6:17 AM
> On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy wrote:
>> This is pretty rough but I'd like to see how people feel about this
>> first.
>>
>> I notice we have two places for command classification. One in
>> command-list.txt, one in __git_list_porcelain_commands() in
>> git-completion.bash. People who are following nd/parseopt-completion
>> probably know that I'm trying to reduce duplication in this script as
>> much as possible, this is another step towards that.
>>
>> By keeping all information of command-list.txt in git binary, we could
>> provide the porcelain list to git-completion.bash via "git
>> --list-cmds=porcelain", so we don't need a separate command
>> classification in git-completion.bash anymore.
>
> I like the direction this series is taking.
>
>> Because we have all command synopsis as a side effect, we could
>> now support "git help -a --verbose" which prints something like "git
>> help", a command name and a description, but we could do it for _all_
>> recognized commands. This could help people look for a command even if
>> we don't provide "git apropos".
>
> Nice idea, and you practically get this for free (aside from the
> obvious new code) since generate-cmdlist.sh already plucks the summary
> for each command directly from Documentation/git-*.txt.

I'm only just catching up, but does/can this series also capture the non-command guides that are available in git so that 'git help -g' can begin to list them all?

It was something I looked at some years ago (when I added the -g option) but at the time the idea of updating command-list.txt was too invasive.

Just a thought.

Philip
Re: Bug: duplicate sections in .git/config after remote removal
From: "Ævar Arnfjörð Bjarmason"
> On Tue, Mar 27 2018, Jason Frey wrote:
>
>> While the impact of this bug is minimal, and git itself is not affected,
>> it can affect external tools that want to read the .git/config file,
>> expecting unique section names.
>>
>> To reproduce:
>>
>> Given the following example .git/config file (I am leaving out the
>> [core] section for brevity):
>>
>>     [remote "origin"]
>>         url = g...@github.com:Fryguy/example.git
>>         fetch = +refs/heads/*:refs/remotes/origin/*
>>     [branch "master"]
>>         remote = origin
>>         merge = refs/heads/master
>>
>> Running `git remote rm origin` will result in the following contents:
>>
>>     [branch "master"]
>>
>> Running `git remote add origin g...@github.com:Fryguy/example.git` will
>> result in the following contents:
>>
>>     [branch "master"]
>>     [remote "origin"]
>>         url = g...@github.com:Fryguy/example.git
>>         fetch = +refs/heads/*:refs/remotes/origin/*
>>
>> And finally, running `git fetch origin; git branch -u origin/master`
>> will result in the following contents:
>>
>>     [branch "master"]
>>     [remote "origin"]
>>         url = g...@github.com:Fryguy/example.git
>>         fetch = +refs/heads/*:refs/remotes/origin/*
>>     [branch "master"]
>>         remote = origin
>>         merge = refs/heads/master
>>
>> at which point you can see the duplicate sections (even though one is
>> empty). Also note that if you do the steps again, you will be left
>> with 3 sections, 2 of which are empty. This process can be repeated
>> over and over.
>
> This can be annoying and result in some very verbose config files when
> we automatically edit them, e.g.:
>
>     (
>         rm -v /tmp/test.ini;
>         for i in {1..3}
>         do
>             git config -f /tmp/test.ini foo.bar 0 &&
>             git config -f /tmp/test.ini --unset foo.bar
>         done;
>         cat /tmp/test.ini
>     )
>     removed '/tmp/test.ini'
>     [foo]
>     [foo]
>     [foo]
>
> But it's not so clear that it should be called a bug. Yes, we could be a
> bit smarter and not add obvious crap like the example above (duplicate
> sections at the end), but it gets less obvious in more complex cases,
> see my c8b2cec09e ("branch: add test for -m renaming multiple config
> sections", 2017-06-18) for one such example.
>
> Git has a config format that's hybrid human/machine editable. Consider a
> case like:
>
>     [gc]
>     ;; Here's all the gc config we set up to avoid the great outage of 2015
>         autoDetach = false
>
>     ;; Our aliases
>     [alias]
>         st = status
>
> Now, if I run `git config gc.auto 0` is it better if we end up with:
>
>     [gc]
>     ;; Here's all the gc config we set up to avoid the great outage of 2015
>         autoDetach = false
>         auto = 0
>
>     ;; Our aliases
>     [alias]
>         st = status
>
> Or something that makes it more clear that a machine added something at
> the end:
>
>     [gc]
>     ;; Here's all the gc config we set up to avoid the great outage of 2015
>         autoDetach = false
>
>     ;; Our aliases
>     [alias]
>         st = status
>     [gc]
>         auto = 0
>
> Most importantly though, regardless of what we decide to do when we
> machine-edit the file, it's also human-editable, and being able to
> repeat sections is part of our config format that you're simply going to
> have to deal with.

One option may be to create a simple 'lint' style checker that simply highlights and suggests options so the user can decide for themselves what they need to do. This would help span the gap between the hard format and the soft format capabilities of machine-readable ini files, the Git config reader, and being human readable.

Thus duplicate sections would be noted, likewise the presence of comments immediately preceding a section header, or terminating a section (with or without spacing?), etc. Such a config_lint could reside in contrib as a support tool, and may in the long term be a guide to a common format. However, as noted, it would be more of a long term aspiration.

> The external tool (presumably some generic *.ini parser) you're trying
> to point at git's config is broken for that purpose if it doesn't handle
> duplicate sections. You're probably better off trying to parse `git
> config --list --null` than trying to make it work.
>
> I don't think we'd ever want to get rid of this feature, it's *very*
> useful. Both for config via the include macro, and for people to
> manually paste some config they want to try out to the end of their
> config, without having to manually edit it to incorporate it into their
> already existing sections.

--
Philip
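Both suggestions in this message, the lint-style check for repeated sections and parsing `git config --list --null` instead of the file, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an existing tool: `find_duplicate_sections` and `parse_null_config` are hypothetical names, the section regex covers only the `[section]` and `[section "subsection"]` header forms, and the parser relies on the documented `--null` layout in which each entry is a key, a newline, the value, and a terminating NUL (a key with no value has no newline).

```python
import re
from collections import defaultdict

# Matches [section] and [section "subsection"] headers (simplified).
SECTION_RE = re.compile(
    r'^\s*\[\s*([A-Za-z0-9.-]+)(?:\s+"((?:[^"\\]|\\.)*)")?\s*\]')


def find_duplicate_sections(text):
    """Return {(section, subsection): [line numbers]} for repeated headers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = SECTION_RE.match(line)
        if m:
            # Section names are case-insensitive, subsections are not.
            seen[(m.group(1).lower(), m.group(2))].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


def parse_null_config(data):
    """Split the bytes printed by `git config --list --null` into pairs."""
    entries = []
    for record in data.split(b"\0"):
        if not record:
            continue  # trailing empty chunk after the last NUL
        key, sep, value = record.partition(b"\n")
        entries.append((key.decode(), value.decode() if sep else None))
    return entries


CONFIG = ('[branch "master"]\n'
          '[remote "origin"]\n'
          '\turl = example\n'
          '[branch "master"]\n'
          '\tremote = origin\n')
duplicates = find_duplicate_sections(CONFIG)
pairs = parse_null_config(b"remote.origin.url\nexample\0core.bare\0")
```

A tool that reads the `--null` output never sees section headers at all, so repeated or empty sections stop being its problem; the lint function only reports where they occur and leaves any clean-up to the user.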
Re: Bug: duplicate sections in .git/config after remote removal
From: "Ævar Arnfjörð Bjarmason"On Tue, Mar 27 2018, Jason Frey wrote: While the impact of this bug is minimal, and git itself is not affected, it can affect external tools that want to read the .git/config file, expecting unique section names. To reproduce: Given the following example .git/config file (I am leaving out the [core] section for brevity): [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master Running `git remote rm origin` will result in the following contents: [branch "master"] Running `git remote add origin g...@github.com:Fryguy/example.git` will result in the following contents: [branch "master"] [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* And finally, running `git fetch origin; git branch -u origin/master` will result in the following contents: [branch "master"] [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master at which point you can see the duplicate sections (even though one is empty). Also note that if you do the steps again, you will be left with 3 sections, 2 of which are empty. This process can be repeated over and over. This can be annoying and result in some very verbose config files when we automatically edit them, e.g.: (rm -v /tmp/test.ini; for i in {1..3}; do git config -f /tmp/test.ini foo.bar 0 && git config -f /tmp/test.ini --unset foo.bar; done; cat /tmp/test.ini) removed '/tmp/test.ini' [foo] [foo] [foo] But it's not so clear that it should be called a bug, yes we could be a bit smarter and not add obvious crap like the example above (duplicate sections at the end), but it gets less obvious in more complex cases, see my c8b2cec09e ("branch: add test for -m renaming multiple config sections", 2017-06-18) for one such example. 
Git has a config format that's hybrid human/machine editable. Consider a case like: [gc] ;; Here's all the gc config we set up to avoid the great outage of 2015 autoDetach = false ;; Our aliases [alias] st = status Now, if I run `git config gc.auto 0` is it better if we end up with: [gc] ;; Here's all the gc config we set up to avoid the great outage of 2015 autoDetach = false auto = 0 ;; Our aliases [alias] st = status Or something that makes it more clear that a machine added something at the end: [gc] ;; Here's all the gc config we set up to avoid the great outage of 2015 autoDetach = false ;; Our aliases [alias] st = status [gc] auto = 0 Most importantly though, regardless of what we decide to do when we machine-edit the file, it's also human-editable, and being able to repeat sections is part of our config format that you're simply going to have to deal with. One option may be to create a simple 'lint' style checker that simply hiughlights and suggests options so the user can decide for themselves what they need to do. This would help span the gap between hard format and the soft format capabiulities of machine readable ini files, the Git config reader and being human readable. Thus duplicate sections would be noted, likewise the presence of comments immediately preceding a section header, or terminating a section (with or without spacing?), etc.Such a config_lint could reside in the contrib as a supprt tool, and may in the long term be a guide to a common format. However, as noted, it would be more of a long term aspiration.. The external tool (presumably some generic *.ini parser) you're trying to point at git's config is broken for that purpose if it doesn't handle duplicate sections. You're probably better off trying to parse `git config --list --null` than trying to make it work. I don't think we'd ever want to get rid of this feature, it's *very* useful. 
Both for config via the include macro, and for people to manually paste some config they want to try out to the end of their config, without having to manually edit it to incorporate it into their already existing sections. -- Philip
Re: Bug: duplicate sections in .git/config after remote removal
From: "Ævar Arnfjörð Bjarmason"On Tue, Mar 27 2018, Jason Frey wrote: While the impact of this bug is minimal, and git itself is not affected, it can affect external tools that want to read the .git/config file, expecting unique section names. To reproduce: Given the following example .git/config file (I am leaving out the [core] section for brevity): [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master Running `git remote rm origin` will result in the following contents: [branch "master"] Running `git remote add origin g...@github.com:Fryguy/example.git` will result in the following contents: [branch "master"] [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* And finally, running `git fetch origin; git branch -u origin/master` will result in the following contents: [branch "master"] [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master at which point you can see the duplicate sections (even though one is empty). Also note that if you do the steps again, you will be left with 3 sections, 2 of which are empty. This process can be repeated over and over. This can be annoying and result in some very verbose config files when we automatically edit them, e.g.: (rm -v /tmp/test.ini; for i in {1..3}; do git config -f /tmp/test.ini foo.bar 0 && git config -f /tmp/test.ini --unset foo.bar; done; cat /tmp/test.ini) removed '/tmp/test.ini' [foo] [foo] [foo] But it's not so clear that it should be called a bug, yes we could be a bit smarter and not add obvious crap like the example above (duplicate sections at the end), but it gets less obvious in more complex cases, see my c8b2cec09e ("branch: add test for -m renaming multiple config sections", 2017-06-18) for one such example. 
Git has a config format that's hybrid human/machine editable. Consider a case like: [gc] ;; Here's all the gc config we set up to avoid the great outage of 2015 autoDetach = false ;; Our aliases [alias] st = status Now, if I run `git config gc.auto 0` is it better if we end up with: [gc] ;; Here's all the gc config we set up to avoid the great outage of 2015 autoDetach = false auto = 0 ;; Our aliases [alias] st = status Or something that makes it more clear that a machine added something at the end: [gc] ;; Here's all the gc config we set up to avoid the great outage of 2015 autoDetach = false ;; Our aliases [alias] st = status [gc] auto = 0 Most importantly though, regardless of what we decide to do when we machine-edit the file, it's also human-editable, and being able to repeat sections is part of our config format that you're simply going to have to deal with. One option may be to create a simple 'lint' style checker that simply hiughlights and suggests options so the user can decide for themselves what they need to do. This would help span the gap between hard format and the soft format capabiulities of machine readable ini files, the Git config reader and being human readable. Thus duplicate sections would be noted, likewise the presence of comments immediately preceding a section header, or terminating a section (with or without spacing?), etc.Such a config_lint could reside in the contrib as a supprt tool, and may in the long term be a guide to a common format. However, as noted, it would be more of a long term aspiration.. The external tool (presumably some generic *.ini parser) you're trying to point at git's config is broken for that purpose if it doesn't handle duplicate sections. You're probably better off trying to parse `git config --list --null` than trying to make it work. I don't think we'd ever want to get rid of this feature, it's *very* useful. 
Both for config via the include macro, and for people to manually paste some config they want to try out to the end of their config, without having to manually edit it to incorporate it into their already existing sections. -- Philip
Re: Bug: duplicate sections in .git/config after remote removal
From: "Ævar Arnfjörð Bjarmason"On Tue, Mar 27 2018, Jason Frey wrote: While the impact of this bug is minimal, and git itself is not affected, it can affect external tools that want to read the .git/config file, expecting unique section names. To reproduce: Given the following example .git/config file (I am leaving out the [core] section for brevity): [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master Running `git remote rm origin` will result in the following contents: [branch "master"] Running `git remote add origin g...@github.com:Fryguy/example.git` will result in the following contents: [branch "master"] [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* And finally, running `git fetch origin; git branch -u origin/master` will result in the following contents: [branch "master"] [remote "origin"] url = g...@github.com:Fryguy/example.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master at which point you can see the duplicate sections (even though one is empty). Also note that if you do the steps again, you will be left with 3 sections, 2 of which are empty. This process can be repeated over and over. This can be annoying and result in some very verbose config files when we automatically edit them, e.g.: (rm -v /tmp/test.ini; for i in {1..3}; do git config -f /tmp/test.ini foo.bar 0 && git config -f /tmp/test.ini --unset foo.bar; done; cat /tmp/test.ini) removed '/tmp/test.ini' [foo] [foo] [foo] But it's not so clear that it should be called a bug, yes we could be a bit smarter and not add obvious crap like the example above (duplicate sections at the end), but it gets less obvious in more complex cases, see my c8b2cec09e ("branch: add test for -m renaming multiple config sections", 2017-06-18) for one such example. 
Git has a config format that's hybrid human/machine editable. Consider a case like:

    [gc]
    ;; Here's all the gc config we set up to avoid the great outage of 2015
    autoDetach = false
    ;; Our aliases
    [alias]
    st = status

Now, if I run `git config gc.auto 0` is it better if we end up with:

    [gc]
    ;; Here's all the gc config we set up to avoid the great outage of 2015
    autoDetach = false
    auto = 0
    ;; Our aliases
    [alias]
    st = status

Or something that makes it more clear that a machine added something at the end:

    [gc]
    ;; Here's all the gc config we set up to avoid the great outage of 2015
    autoDetach = false
    ;; Our aliases
    [alias]
    st = status
    [gc]
    auto = 0

Most importantly though, regardless of what we decide to do when we machine-edit the file, it's also human-editable, and being able to repeat sections is part of our config format that you're simply going to have to deal with.

One option may be to create a simple 'lint' style checker that simply highlights and suggests options so the user can decide for themselves what they need to do. This would help span the gap between the hard format and the soft format capabilities of machine-readable ini files, the Git config reader, and being human readable. Thus duplicate sections would be noted, likewise the presence of comments immediately preceding a section header, or terminating a section (with or without spacing?), etc. Such a config_lint could reside in contrib as a support tool, and may in the long term be a guide to a common format. However, as noted, it would be more of a long-term aspiration.

The external tool (presumably some generic *.ini parser) you're trying to point at git's config is broken for that purpose if it doesn't handle duplicate sections. You're probably better off trying to parse `git config --list --null` than trying to make it work.

I don't think we'd ever want to get rid of this feature, it's *very* useful.
Both for config via the include macro, and for people to manually paste some config they want to try out to the end of their config, without having to manually edit it to incorporate it into their already existing sections. -- Philip
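
The suggestion above to parse `git config --list --null` rather than the raw file can be sketched as follows. This is a minimal sketch, assuming the documented `--null` layout (each entry terminated by a NUL byte, with a newline separating key from value); the function name `parse_config_null` and the sample bytes are made up for illustration.

```python
def parse_config_null(raw):
    """Parse bytes produced by `git config --list --null`.

    Returns a list of (key, value) pairs; value is None for a key that
    was written without any value. Repeated sections in the file simply
    show up as further entries, so no section merging is needed.
    """
    entries = []
    for record in raw.split(b"\0"):
        if not record:
            continue  # the final terminator leaves one empty record
        # Keys cannot contain a newline, so the first one is the separator;
        # any later newlines belong to the value.
        key, sep, value = record.partition(b"\n")
        entries.append((
            key.decode("utf-8", "replace"),
            value.decode("utf-8", "replace") if sep else None,
        ))
    return entries


# Hypothetical sample mimicking the duplicated [branch "master"] case.
sample = (
    b"remote.origin.url\nexample.git\0"
    b"branch.master.remote\norigin\0"
    b"branch.master.merge\nrefs/heads/master\0"
)
for key, value in parse_config_null(sample):
    print(key, "=", value)
```

In real use the bytes would presumably come from something like `subprocess.run(["git", "config", "--list", "--null"], capture_output=True, check=True).stdout`; the tool then never sees section headers at all, duplicated or not.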
Re: [ANNOUNCE] Git Rev News edition 37
From: "Christian Couder"

Hi everyone,

The 37th edition of Git Rev News is now published: https://git.github.io/rev_news/2018/03/21/edition-37/

Thanks a lot to all the contributors!

Enjoy,
Christian, Jakub, Markus and Gabriel.

Thank you for the Git Rev News. I've been off-line for 5 weeks, so seeing the newsletter is great. Next is to peruse Junio's "What's Cooking" lists. Thanks to all.

Philip
Re: Crash when clone includes magic filenames on Windows
From: "Philip Oakley" <philipoak...@iee.org>

From: "Jeffrey Walton" <noloa...@gmail.com>

Hi Everyone,

I'm seeing this issue on Windows: https://pastebin.com/YfB25E4T . It seems the filename AUX is the culprit. Also see https://blogs.msdn.microsoft.com/oldnewthing/20031022-00/?p=42073 . (Thanks to Milleneumbug on Stack Overflow).

I did not name the file, someone else did. I doubt the filename will be changed.

Searching is not turning up much information: https://www.google.com/search?q=git+"magic+filenames"+windows

Does anyone know how to sidestep the issue on Windows?

Jeff

This comes up on the Git-for-Windows (GfW) issues fairly often https://github.com/git-for-windows/git/issues.

The fetch part of the clone is successful, but the final checkout step fails: when AUX (or any other prohibited filename - that's proper backward compatibility for you) is to be checked out, the file system (FS) refuses and the checkout fails. You do however have the full repo locally.

The trick is probably then to set up a sparse checkout so the AUX is never included on the FS. However it is an open 'up-for-grabs' project to add such a check in GfW.

Philip

One option may be to extend the $GIT_DIR/info/sparse-checkout capability and add a specific $GIT_DIR/info/never-sparse-checkout file that could carry the complement (files & dirs) options that are platform applicable (no AUX, no COM1, no colons, etc. ;-), so that it does not conflict with the users' regular sparse checkout selection in $GIT_DIR/info/sparse-checkout. It's probably easier to understand that way.
--
Philip
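
Before setting up a sparse checkout it helps to know which paths would trip the Windows file system. The following is a rough sketch, not anything that exists in Git for Windows: the helper names are invented, and the rule set (reserved device names with or without an extension, plus a few forbidden characters) is deliberately not exhaustive.

```python
import re

# Reserved DOS device names; they stay reserved even with an extension
# ("aux.c") and regardless of case.
_RESERVED = re.compile(r"^(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?$",
                       re.IGNORECASE)
# Characters that cannot appear in a Windows file name component.
_FORBIDDEN_CHARS = set('<>:"|?*')


def problem_components(path):
    """Return the components of a '/'-separated repo path that Windows rejects."""
    return [part for part in path.split("/")
            if _RESERVED.match(part) or _FORBIDDEN_CHARS & set(part)]


def unsafe_paths(paths):
    """Filter a list of repo paths down to those that cannot be checked out."""
    return [p for p in paths if problem_components(p)]


# Hypothetical listing, e.g. what `git ls-tree -r --name-only HEAD` might print.
listing = ["src/aux.c", "README", "docs/AUX", "a:b.txt", "auxiliary.c"]
print(unsafe_paths(listing))
```

The resulting list is the kind of input a "never check out" file such as the one proposed above would need; how it would be wired into the sparse-checkout machinery is left open here, as it is in the thread.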
Re: Crash when clone includes magic filenames on Windows
From: "Jeffrey Walton"

Hi Everyone,

I'm seeing this issue on Windows: https://pastebin.com/YfB25E4T . It seems the filename AUX is the culprit. Also see https://blogs.msdn.microsoft.com/oldnewthing/20031022-00/?p=42073 . (Thanks to Milleneumbug on Stack Overflow).

I did not name the file, someone else did. I doubt the filename will be changed.

Searching is not turning up much information: https://www.google.com/search?q=git+"magic+filenames"+windows

Does anyone know how to sidestep the issue on Windows?

Jeff

This comes up on the Git-for-Windows (GfW) issues fairly often https://github.com/git-for-windows/git/issues.

The fetch part of the clone is successful, but the final checkout step fails: when AUX (or any other prohibited filename - that's proper backward compatibility for you) is to be checked out, the file system (FS) refuses and the checkout fails. You do however have the full repo locally.

The trick is probably then to set up a sparse checkout so the AUX is never included on the FS. However it is an open 'up-for-grabs' project to add such a check in GfW.

Philip
Re: "git bisect run make" adequate to locate first unbuildable commit?
From: "Robert P. J. Day" <rpj...@crashcourse.ca>

On Fri, 9 Feb 2018, Philip Oakley, CEng MIET wrote:

(apologies for using the fancy letters after the name ID...)

From: "Robert P. J. Day" <rpj...@crashcourse.ca>
>
> writing a short tutorial on "git bisect" and, all the details of
> special exit code 125 aside, if one wanted to locate the first
> unbuildable commit, would it be sufficient to just run?
>
>   $ git bisect run make
>
> as i read it, make returns either 0, 1 or 2 so there doesn't appear
> to be any possibility of weirdness with clashing with a 125 exit code.
> am i overlooking some subtle detail here i should be aware of? thanks.
>
> rday

In the spirit of pedanticism, one should also clarify the word "first", in that it's not a linear search for _an_ unbuildable commit, but that one is looking for the transition between an unbroken sequence of unbuildable commits, which transitions to buildable commits, and it's the transition that is sought. (there could be many random unbuildable commits within a sequence in some folks' processes!)

quite so, i should have been more precise.

rday

The other two things that may be happening (in the wider bisect discussion) that I've heard of are:

1. there may be feature branches that bypass the known good starting commit, which can cause understanding issues as those side branches that predate the start point are also considered potential bad commits.

2. if you just want the first parent check for the bad commit point, then mark the second parents of merges as being good.

Also, I'd expect that the skipped commits aren't 'counted' (too hard?) for the bisect algorithm's reporting.

https://stackoverflow.com/questions/5638211/how-do-you-get-git-bisect-to-ignore-merged-branches contains a number of the ideas.

Philip
Re: "git bisect run make" adequate to locate first unbuildable commit?
From: "Robert P. J. Day"

writing a short tutorial on "git bisect" and, all the details of special exit code 125 aside, if one wanted to locate the first unbuildable commit, would it be sufficient to just run?

  $ git bisect run make

as i read it, make returns either 0, 1 or 2 so there doesn't appear to be any possibility of weirdness with clashing with a 125 exit code. am i overlooking some subtle detail here i should be aware of? thanks.

rday

In the spirit of pedanticism, one should also clarify the word "first", in that it's not a linear search for _an_ unbuildable commit, but that one is looking for the transition between an unbroken sequence of unbuildable commits, which transitions to buildable commits, and it's the transition that is sought. (there could be many random unbuildable commits within a sequence in some folks' processes!)
--
Philip
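
The exit-code question can be made concrete with a small table of how `git bisect run` is documented to treat the status of the command it runs: 0 means good, 125 means skip, any other value from 1 to 127 means bad, and anything else aborts the bisection. The sketch below only restates that mapping; the function name is invented.

```python
def classify_bisect_exit(code):
    """Map an exit status of the `git bisect run` command to bisect's verdict."""
    if code == 0:
        return "good"
    if code == 125:
        return "skip"    # "cannot test this commit"
    if 1 <= code <= 127:
        return "bad"
    return "abort"       # e.g. 128 and above stop the whole bisection


# make's exit statuses of 0, 1 or 2 never collide with the special 125.
for status in (0, 1, 2, 125, 128):
    print(status, classify_bisect_exit(status))
```

So `git bisect run make` is safe as far as exit codes go; the remaining caveat is the one raised above, namely that a build failing for an unrelated reason is also reported as "bad".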
RE: git send-email sets date
Behalf Of brian m. carlson
> On Fri, Jan 26, 2018 at 06:32:30PM +0100, Michal Suchánek wrote:
> > git send-email sets the message date to author date.
> >
> > This is wrong because the message will most likely not get delivered
> > when the author date differs from current time. It might give slightly
> > better results with commit date instead of author date but can't it
> > just skip that header and leave it to the mailer?
> >
> > It does not even seem to have an option to suppress adding the date
> > header.
>
> I'm pretty sure it's intended to work this way.
>
> Without the Date header, we have no way of providing the author date
> when sending a patch. git am will read this date and use it as the
> author date when applying patches, so if it's omitted, the author date
> will be wrong.
>
> If you want to send patches with a different date, you can always insert
> the patch inline in your mailer using the scissors notation, which will
> allow your mailer to insert its own date while keeping the patch date
> separate.
> --

Michal, you may want to hack up an option that can automatically create that format if it is of use. I sometimes find the sort order an issue in some of my mail clients.
--
Philip
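
The scissors notation mentioned above might be generated along these lines. This is only a sketch under assumptions: that the receiving side applies the mail with `git am --scissors`, and that in-body `From:`, `Date:` and `Subject:` lines after the scissors line are picked up in place of the mail headers. The function name and all sample values are made up.

```python
def scissors_body(cover_note, author, author_date, subject, patch_text):
    """Build a mail body whose patch metadata lives below a scissors line.

    Everything above "-- >8 --" is ordinary conversation; the mail's own
    Date header can then be whatever the mailer chooses.
    """
    return "\n".join([
        cover_note.rstrip(),
        "",
        "-- >8 --",
        "From: " + author,
        "Date: " + author_date,
        "Subject: " + subject,
        "",
        patch_text,
    ])


# Hypothetical values, for illustration only.
body = scissors_body(
    "Here is the patch we discussed.",
    "A U Thor <author@example.com>",
    "Fri, 26 Jan 2018 18:32:30 +0100",
    "[PATCH] example: fix typo",
    "---\n example.txt | 2 +-\n",
)
print(body)
```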
Re: [PATCH 3/3] perf/aggregate: sort JSON fields in output
From: "Christian Couder"

It is much easier to diff the output against a preivous

s/preivous/previous/

one when the fields are sorted.

Signed-off-by: Christian Couder
---
 t/perf/aggregate.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index d616d31ca8..fcc0313e65 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -253,7 +253,7 @@ sub print_codespeed_results {
 		}
 	}
 
-	print to_json(\@data, {utf8 => 1, pretty => 1}), "\n";
+	print to_json(\@data, {utf8 => 1, pretty => 1, canonical => 1}), "\n";
 }
 
 binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
--
2.16.0.rc2.45.g09a1bbd803
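
Why sorted fields make the output diffable can be shown with a Python analogue of the `canonical => 1` change (this is not the Perl code from the patch; the record contents are invented). With sorted keys, two records holding the same data serialise to identical text regardless of insertion order.

```python
import json


def dump_canonical(data):
    """Serialise with sorted keys so equal data always yields equal text."""
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False)


# Same made-up record, built in two different key orders.
first = {"benchmark": "p0001.1", "result_value": 0.12, "branch": "master"}
second = {"result_value": 0.12, "branch": "master", "benchmark": "p0001.1"}

print(dump_canonical(first) == dump_canonical(second))
print(json.dumps(first) == json.dumps(second))
```

The first comparison holds because of the sorting; the second does not, since plain `json.dumps` follows insertion order, which is the situation the patch avoids on the Perl side.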
Re: cygwin git and golang: how @{u} is handled
From: "John Cheng"

I am experiencing a strange behavior and I'm not certain if it is a problem with golang or the cygwin version of git.

Steps to reproduce:

Use golang's os/exec library to execute

    exec.Command(os.Args[1],"log","@{u}") // where os.Args[1] is either cygwin git or Windows git

Expected result:

    commit 09357db3a29909c3498143b0d06989e00f5e2442
    Author: John Cheng
    Date: Sun Jan 14 10:57:01 2018 -0800
    ...

Actual result: Suppose that cygwin git is specified, the result becomes:

    exit status 128
    fatal: ambiguous argument '@u': unknown revision or path not in the working tree.

Version:

    git version 2.15.1.windows.2
    git version 2.15.1

I'm not certain if this is a git problem, as I could not reproduce this problem using python to script cygwin git. A list of scenarios I've tested are

1. golang + cygwin git = "exit code 128"
2. golang + windows git = "exit code 0"
3. python + cygwin git = "exit code 0"
4. python + windows git = "exit code 0"

I've tried to write a simple program to echo the command line parameters passed by go into the process it executes - and it appears that go itself does not change "@{u}" into "@u". I'm a bit stuck at this point to figure out which may be the cause: golang or git. I figured I'd start here.

There is a similar problem a user is experiencing on Git-for-Windows, that we/the user haven't got to the bottom of, but it appears to have a similar form where the braces appear to be in some form parsed twice (though that's still a guess / hypothesis).

"Aliases in git are stripping curly-brackets (#1220)" https://github.com/git-for-windows/git/issues/1220#issuecomment-340341336

Philip
Re: [PATCH 10/8] [DO NOT APPLY, but improve?] rebase--interactive: introduce "stop" command
From: "Jacob Keller"

On Thu, Jan 18, 2018 at 10:36 AM, Stefan Beller wrote:

Jake suggested using "x false" instead of "edit" for some corner cases. I do prefer using "x false" for all kinds of things such as stopping before a commit (edit only lets you stop after a commit), and the knowledge that "x false" does the least amount of actions behind my back. We should have that command as well, maybe?

I agree. I use "x false" very often, and I think stop is probably a better solution since it avoids spawning an extra shell that will just fail. Not sure if stop implies too much about "stop the whole thing" as opposed to "stop here and let me do something manual", but I think it's clear enough.

'hold' or 'pause' may be options (leads to http://www.thesaurus.com/browse/put+on+hold offering procrastinate etc.) 'adjourn'.

Signed-off-by: Stefan Beller
---
 git-rebase--interactive.sh |  1 +
 sequencer.c                | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 3cd7446d0b..9eac53f0c5 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -166,6 +166,7 @@ l, label = label current HEAD with a name
 t, reset = reset HEAD to a label
 b, bud = reset HEAD to the revision labeled 'onto', no arguments
 m, merge []* = create a merge commit using a given commit's message
+y, stay = stop for shortcut for
 
 These lines can be re-ordered; they are executed from top to bottom.
" | git stripspace --comment-lines >>"$todo" diff --git a/sequencer.c b/sequencer.c index 2b4e6b1232..4b3b9fe59d 100644 --- a/sequencer.c +++ b/sequencer.c @@ -782,6 +782,7 @@ enum todo_command { TODO_RESET, TODO_BUD, TODO_MERGE, + TODO_STOP, /* commands that do nothing but are counted for reporting progress */ TODO_NOOP, TODO_DROP, @@ -803,6 +804,7 @@ static struct { { 'l', "label" }, { 't', "reset" }, { 'b', "bud" }, + { 'y', "stay" }, { 'm', "merge" }, { 0, "noop" }, { 'd', "drop" }, @@ -1307,6 +1309,12 @@ static int parse_insn_line(struct todo_item *item, const char *bol, char *eol) return 0; } + if (item->command == TODO_STOP) { + item->commit = NULL; + item->arg = ""; + item->arg_len = 0; + } + end_of_object_name = (char *) bol + strcspn(bol, " \t\n"); item->arg = end_of_object_name + strspn(end_of_object_name, " \t"); item->arg_len = (int)(eol - item->arg); @@ -2407,6 +2415,8 @@ static int pick_commits(struct todo_list *todo_list, struct replay_opts *opts) /* `current` will be incremented below */ todo_list->current = -1; } + } else if (item->command == TODO_STOP) { + todo_list->current = -1; } else if (item->command == TODO_LABEL) res = do_label(item->arg, item->arg_len); else if (item->command == TODO_RESET) -- 2.16.0.rc1.238.g530d649a79-goog
Re: [PATCH 8/8] rebase -i: introduce --recreate-merges=no-rebase-cousins
From: "Johannes Schindelin"

This one is a bit tricky to explain, so let's try with a diagram:

            C
          /   \
    A - B - E - F
      \   /
        D

To illustrate what this new mode is all about, let's consider what happens upon `git rebase -i --recreate-merges B`, in particular to the commit `D`. In the default mode, the new branch structure is:

         --- C' --
        /         \
    A - B -- E' - F'
         \  /
          D'

This is not really preserving the branch topology from before! The reason is that the commit `D` does not have `B` as ancestor, and therefore it gets rebased onto `B`.

However, when recreating branch structure, there are legitimate use cases where one might want to preserve the branch points of commits that do not descend from the commit that was passed to the rebase command, e.g. when a branch from core Git's `next` was merged into Git for Windows' master we will not want to rebase those commits on top of a Windows-specific commit. In the example above, the desired outcome would look like this:

         --- C' --
        /         \
    A - B -- E' - F'
     \        /
      -- D' --

I'm not understanding this. I see that D properly starts from A, but don't see why it is now D'. Surely it's unchanged. Maybe it's the arc/node confusion. Maybe even spell out that the rebased commits from the command are B..HEAD, but that includes D, which may not be what folk had expected. (not even sure if the reflog comes into determining merge-bases here..)

I do think an exact definition is needed (e.g. via --ancestry-path or its equivalent?).

Let's introduce the term "cousins" for such commits ("D" in the example), and the "no-rebase-cousins" mode of the merge-recreating rebase, to help those use cases.
Signed-off-by: Johannes Schindelin --- Documentation/git-rebase.txt | 7 ++- builtin/rebase--helper.c | 9 - git-rebase--interactive.sh| 1 + git-rebase.sh | 12 +++- sequencer.c | 4 sequencer.h | 8 t/t3430-rebase-recreate-merges.sh | 23 +++ 7 files changed, 61 insertions(+), 3 deletions(-) diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt index 1d061373288..ac07a5c3fc9 100644 --- a/Documentation/git-rebase.txt +++ b/Documentation/git-rebase.txt @@ -368,10 +368,15 @@ The commit list format can be changed by setting the configuration option rebase.instructionFormat. A customized instruction format will automatically have the long commit hash prepended to the format. ---recreate-merges:: +--recreate-merges[=(rebase-cousins|no-rebase-cousins)]:: Recreate merge commits instead of flattening the history by replaying merges. Merge conflict resolutions or manual amendments to merge commits are not preserved. ++ +By default, or when `rebase-cousins` was specified, commits which do not have +`` as direct ancestor are rebased onto `` (or ``, +if specified). If the `rebase-cousins` mode is turned off, such commits will +retain their original branch point. 
-p:: --preserve-merges:: diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c index a34ab5c0655..ef08fef4d14 100644 --- a/builtin/rebase--helper.c +++ b/builtin/rebase--helper.c @@ -13,7 +13,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) { struct replay_opts opts = REPLAY_OPTS_INIT; unsigned flags = 0, keep_empty = 0, recreate_merges = 0; - int abbreviate_commands = 0; + int abbreviate_commands = 0, no_rebase_cousins = -1; enum { CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH, @@ -23,6 +23,8 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) OPT_BOOL(0, "ff", _ff, N_("allow fast-forward")), OPT_BOOL(0, "keep-empty", _empty, N_("keep empty commits")), OPT_BOOL(0, "recreate-merges", _merges, N_("recreate merge commits")), + OPT_BOOL(0, "no-rebase-cousins", _rebase_cousins, + N_("keep original branch points of cousins")), OPT_CMDMODE(0, "continue", , N_("continue rebase"), CONTINUE), OPT_CMDMODE(0, "abort", , N_("abort rebase"), @@ -57,8 +59,13 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) flags |= keep_empty ? TODO_LIST_KEEP_EMPTY : 0; flags |= abbreviate_commands ? TODO_LIST_ABBREVIATE_CMDS : 0; flags |= recreate_merges ? TODO_LIST_RECREATE_MERGES : 0; + flags |= no_rebase_cousins > 0 ? TODO_LIST_NO_REBASE_COUSINS : 0; flags |= command == SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0; + if (no_rebase_cousins >= 0&& !recreate_merges) + warning(_("--[no-]rebase-cousins has no effect without " + "--recreate-merges")); + if (command == CONTINUE && argc == 1) return !!sequencer_continue(); if (command == ABORT && argc == 1) diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index 3459ec5a018..23184c77e88 100644 ---
Re: [PATCH 4/8] rebase-helper --make-script: introduce a flag to recreate merges
From: "Johannes Schindelin"

The sequencer just learned new commands intended to recreate branch structure (similar in spirit to --preserve-merges, but with a substantially less-broken design).

Let's allow the rebase--helper to generate todo lists making use of these commands, triggered by the new --recreate-merges option. For a commit topology like this:

    A - B - C
      \   /
        D

Could the topology include the predecessor for context? Also, it is easy for readers to become confused between the arcs of the graphs and the nodes of the graphs, such that we confuse 'commits as patches' with 'commits as snapshots'. It might need an 'Aa' distinction between the two types, especially around merges and potential evilness.

the generated todo list would look like this:

    # branch D
    pick 0123 A
    label branch-point
    pick 1234 D
    label D

    reset branch-point
    pick 2345 B
    merge 3456 D C

To keep things simple, we first only implement support for merge commits with exactly two parents, leaving support for octopus merges to a later patch in this patch series.
Signed-off-by: Johannes Schindelin --- builtin/rebase--helper.c | 4 +- sequencer.c | 343 ++- sequencer.h | 1 + 3 files changed, 345 insertions(+), 3 deletions(-) diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c index 7daee544b7b..a34ab5c0655 100644 --- a/builtin/rebase--helper.c +++ b/builtin/rebase--helper.c @@ -12,7 +12,7 @@ static const char * const builtin_rebase_helper_usage[] = { int cmd_rebase__helper(int argc, const char **argv, const char *prefix) { struct replay_opts opts = REPLAY_OPTS_INIT; - unsigned flags = 0, keep_empty = 0; + unsigned flags = 0, keep_empty = 0, recreate_merges = 0; int abbreviate_commands = 0; enum { CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, @@ -22,6 +22,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) struct option options[] = { OPT_BOOL(0, "ff", _ff, N_("allow fast-forward")), OPT_BOOL(0, "keep-empty", _empty, N_("keep empty commits")), + OPT_BOOL(0, "recreate-merges", _merges, N_("recreate merge commits")), OPT_CMDMODE(0, "continue", , N_("continue rebase"), CONTINUE), OPT_CMDMODE(0, "abort", , N_("abort rebase"), @@ -55,6 +56,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) flags |= keep_empty ? TODO_LIST_KEEP_EMPTY : 0; flags |= abbreviate_commands ? TODO_LIST_ABBREVIATE_CMDS : 0; + flags |= recreate_merges ? TODO_LIST_RECREATE_MERGES : 0; flags |= command == SHORTEN_OIDS ? 
TODO_LIST_SHORTEN_IDS : 0; if (command == CONTINUE && argc == 1) diff --git a/sequencer.c b/sequencer.c index a96255426e7..1bef16647b4 100644 --- a/sequencer.c +++ b/sequencer.c @@ -23,6 +23,8 @@ #include "hashmap.h" #include "unpack-trees.h" #include "worktree.h" +#include "oidmap.h" +#include "oidset.h" #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION" @@ -2785,6 +2787,335 @@ void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag) strbuf_release(); } +struct labels_entry { + struct hashmap_entry entry; + char label[FLEX_ARRAY]; +}; + +static int labels_cmp(const void *fndata, const struct labels_entry *a, + const struct labels_entry *b, const void *key) +{ + return key ? strcmp(a->label, key) : strcmp(a->label, b->label); +} + +struct string_entry { + struct oidmap_entry entry; + char string[FLEX_ARRAY]; +}; + +struct label_state { + struct oidmap commit2label; + struct hashmap labels; + struct strbuf buf; +}; + +static const char *label_oid(struct object_id *oid, const char *label, + struct label_state *state) +{ + struct labels_entry *labels_entry; + struct string_entry *string_entry; + struct object_id dummy; + size_t len; + int i; + + string_entry = oidmap_get(>commit2label, oid); + if (string_entry) + return string_entry->string; + + /* + * For "uninteresting" commits, i.e. commits that are not to be + * rebased, and which can therefore not be labeled, we use a unique + * abbreviation of the commit name. This is slightly more complicated + * than calling find_unique_abbrev() because we also need to make + * sure that the abbreviation does not conflict with any other + * label. + * + * We disallow "interesting" commits to be labeled by a string that + * is a valid full-length hash, to ensure that we always can find an + * abbreviation for any uninteresting commit's names that does not + * clash with any other label. 
+ */ + if (!label) { + char *p; + + strbuf_reset(>buf); + strbuf_grow(>buf, GIT_SHA1_HEXSZ); + label = p = state->buf.buf; + + find_unique_abbrev_r(p, oid->hash, default_abbrev); + + /* + * We may need to extend the abbreviated hash so that there is + * no conflicting label. + */ + if (hashmap_get_from_hash(>labels, strihash(p), p)) { + size_t i = strlen(p) + 1; + + oid_to_hex_r(p, oid); + for (; i < GIT_SHA1_HEXSZ; i++) { + char save = p[i]; + p[i] = '\0';
Re: [PATCH 1/8] sequencer: introduce new commands to reset the revision
From: "Jacob Keller"

On Thu, Jan 18, 2018 at 7:35 AM, Johannes Schindelin wrote:

This commit implements the commands to label, and to reset to, given revisions. The syntax is: label reset

As a convenience shortcut, also to improve readability of the generated todo list, a third command is introduced: bud. It simply resets to the "onto" revision, i.e. the commit onto which we currently rebase.

The code looks good, but I'm a little wary of adding bud which hard-codes a specific label. I suppose it does grant a bit of readability to the resulting script... ? It doesn't seem that important compared to just using "reset onto"? At least when documenting this it should be made clear that the "onto" label is special.

Thanks, Jake.

I'd agree. The special 'onto' label should be fully documented, and the commit message should indicate which patch actually defines it (and all its corner cases and fall backs if --onto isn't explicitly given..)

Likewise the choice of 'bud' should be explained with some nice phraseology indicating that we are growing the new flowering from the bud, otherwise the word is a bit too short and sudden for easy explanation.

Philip
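
The label/reset/bud semantics discussed here can be modelled with a toy interpreter. To be clear about the assumptions: this is not the sequencer's code, commits are bare names, "picking" X just creates X' on top of the current HEAD, and the `replay` function and its todo syntax (no hashes) are simplifications made up for illustration. It does show why `bud` is equivalent to `reset onto` once "onto" is treated as a predefined label.

```python
def replay(todo, onto="onto"):
    """Replay a simplified todo list; return (final HEAD, parent map)."""
    head = onto
    labels = {"onto": onto}   # the special label that `bud` relies on
    parents = {}              # rewritten commit -> the commit it sits on
    for line in todo:
        cmd, *args = line.split()
        if cmd == "pick":
            new = args[0] + "'"
            parents[new] = head
            head = new
        elif cmd == "label":
            labels[args[0]] = head
        elif cmd == "reset":
            head = labels[args[0]]
        elif cmd == "bud":
            head = labels["onto"]   # same effect as "reset onto"
        else:
            raise ValueError("unknown command: " + cmd)
    return head, parents


todo = [
    "pick A", "label branch-point",
    "pick D", "label D",
    "reset branch-point", "pick B",
    "bud", "pick X",
]
head, parents = replay(todo)
print(head, parents)
```

In this model a user-defined label named "onto" would silently change what `bud` does, which is one way to see why the special status of that label needs documenting.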
Re: [PATCH] Remoted unnecessary void* from hashmap.h that caused compile warnings
From:
Subject: [PATCH] Remoted unnecessary void* from hashmap.h that caused compile warnings

s/Remoted/Removed/ ?

Maybe shorten to " hashmap.h: remove unnecessary void* " (ex the superfluous spaces)
--
Philip

From: "Randall S. Becker"

* The while loop in the inline method hashmap_enable_item_counting used an unneeded variable. The loop has been revised accordingly.

Signed-off-by: Randall S. Becker
---
 hashmap.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hashmap.h b/hashmap.h
index 7ce79f3..d375d9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -400,7 +400,6 @@ static inline void hashmap_disable_item_counting(struct hashmap *map)
  */
 static inline void hashmap_enable_item_counting(struct hashmap *map)
 {
-	void *item;
 	unsigned int n = 0;
 	struct hashmap_iter iter;
 
@@ -408,7 +407,7 @@ static inline void hashmap_enable_item_counting(struct hashmap *map)
 		return;
 
 	hashmap_iter_init(map, );
-	while ((item = hashmap_iter_next()))
+	while (hashmap_iter_next())
 		n++;
 
 	map->do_count_items = 1;
--
2.8.5.23.g6fa7ec3
Re: [PATCH/RFC] diff: add --compact-summary option to complement --stat
(one spelling spotted)..

From: "Nguyễn Thái Ngọc Duy"

This is partly inspired by gerrit web interface which shows diffstat like this, e.g. with commit 0433d533f1 (notice the "A" column on the third line):

       Documentation/merge-config.txt     |  4 +
       builtin/merge.c                    |  2 +
     A t/t5573-pull-verify-signatures.sh  | 81 ++
       t/t7612-merge-verify-signatures.sh | 45 ++
       4 files changed, 132 insertions(+)

In other words, certain information currently shown with --summary is embedded in the diffstat. This helps reading (all information of the same file in the same line instead of two) and can reduce the number of lines if you add/delete a lot of files.

The new option --compact-summary implements this with a tweak to support mode change, which is shown in --summary too.

For mode changes, executable bit is denoted as "(+x)" or "(-x)" when it's added or removed respectively. The same for when a regular file is replaced with a symlink "(+l)" or the other way "(-l)". This also applies to new files. New regular files are "A", while new executable files or symlinks are "A+x" or "A+l".

Note, there is still one piece of information missing from --summary, the rename/copy percentage. That could probably be added later. It's not as useful as the others anyway.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
I have had something similar for years but the data is shown after the path name instead (it's incidentally shown in the diffstat right below). I was going to clean it up and submit it again, but my recent experience with Gerrit changed my mind a bit about the output.
Documentation/diff-options.txt | 11 diff.c | 64 +- diff.h | 1 + t/t4013-diff-various.sh| 5 ++ ...y_--root_--stat_--compact-summary_initial (new) | 12 ...R_--root_--stat_--compact-summary_initial (new) | 12 ...ree_--stat_--compact-summary_initial_mode (new) | 4 ++ ..._-R_--stat_--compact-summary_initial_mode (new) | 4 ++ 8 files changed, 110 insertions(+), 3 deletions(-) create mode 100644 t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--stat_--compact-summary_initial_mode create mode 100644 t/t4013/diff.diff-tree_-R_--stat_--compact-summary_initial_mode diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 9d1586b956..ff93ff74d0 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -188,6 +188,17 @@ and accumulating child directory counts in the parent directories: Output a condensed summary of extended header information such as creations, renames and mode changes. +--compact-summary:: + Output a condensed summary of extended header information in + front of the file name part of diffstat. This option is + ignored if --stat is not specified. ++ +Fle creations or deletions are denoted with "A" or "D" respectively, s/Fle/File/ ? +optionally "+l" if it's a symlink, or "+x" if it's executable. +Mode changes are put in brackets, e.g. "+x" or "-x" for adding or +removing executable bit respectively, "+l" or "-l" for becoming a +symlink or a regular file. + ifndef::git-format-patch[] --patch-with-stat:: Synonym for `-p --stat`. 
diff --git a/diff.c b/diff.c index fb22b19f09..3f676d 100644 --- a/diff.c +++ b/diff.c @@ -2131,6 +2131,7 @@ struct diffstat_t { char *from_name; char *name; char *print_name; + const char *status_code; unsigned is_unmerged:1; unsigned is_binary:1; unsigned is_renamed:1; @@ -2271,6 +2272,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) { int i, len, add, del, adds = 0, dels = 0; uintmax_t max_change = 0, max_len = 0; + int max_status_len = 0; int total_files = data->nr, count; int width, name_width, graph_width, number_width = 0, bin_width = 0; const char *reset, *add_c, *del_c; @@ -2287,6 +2289,18 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) add_c = diff_get_color_opt(options, DIFF_FILE_NEW); del_c = diff_get_color_opt(options, DIFF_FILE_OLD); + for (i = 0; (i < count) && (i < data->nr); i++) { + const struct diffstat_file *file = data->files[i]; + int len; + + if (!file->status_code) + continue; + len = strlen(file->status_code) + 1; + + if (len > max_status_len) + max_status_len = len; + } + /* * Find the longest filename and max number of changes */ @@ -2383,6 +2397,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) options->stat_name_width < max_len) ? options->stat_name_width : max_len; + name_width += max_status_len; + /* * Adjust adjustable widths not to exceed maximum width */ @@ -2402,6 +2418,8
Re: Errors and other unpleasant things found by Cppcheck
From: "Friedrich Spee von Langenfeld"

Hi, I analyzed the GitHub repository with Cppcheck. The resulting XML file is attached. Please open it in Cppcheck to view it comfortably. Especially the bunch of errors could be of interest to you.

Hi, Thanks for the submission. The list prefers that useful information is in plain text so as to avoid opening file types that may hide undesirable effects.

Was your analysis part of an organised scan, or a personal insight? It would help to know the background.

The project does have a number of known and accepted cases of 'uninitialised variables' and known memory leaks which are acceptable in those cases.

If you picked out the few key issues that you feel should be addressed then a patch can be considered, e.g. the suggestion of the wildmatch macro (L263) that depends on the order of evaluation of side effects.
--
Philip
Re: Re: Unify annotated and non-annotated tags
From: "anatoly techtonik" <techto...@gmail.com>
From: Philip Oakley
> So if I understand correctly, the hope is that `git show-ref --tags` could
> get an alternate option `--all-tags` [proper option name required...] such
> that the user would not have to develop the rather over-complicated
> expression that used a newish capability of a different command.
> Would that be right?

That's correct.

> Or at least update the man page docs to clarify the annotated vs
> non-annotated tags issue (many SO questions!).

Are there stats on how many users read man pages and what their reading session length is? I mean docs may not help much,

The "reading the manual" question is fairly well answered in the Human Error literature in terms of clarity and effectiveness, and the normal human error rates (for interest, search for "Panko" "Spreadsheet errors" [1]). Typical human error rate is 1%. Most pilot error ends up being, in part, caused by confusing / incomplete manuals (i.e. we fail to support them).

If the manuals are the peak of perfection then they are well visited and the supporting material is usually good. If manuals are a sprawling upland with bogs, fissures, and islands of inaccessibility, then they are rarely used.

Git does suffer from having a lot of separate commands, which makes seeing the wood for the trees difficult sometimes, especially as its core concepts are not always well understood.

Improving the manuals (as reference material) will always help, even if the trickle down effect is slow (made worse by alternate sources of error - Stackoverflow and blogs... ;-)

> And indicate if the --dereference and/or --hash options would do the trick!
> - maybe the "^{}" appended would be part of the problem (and need that new
> option "--objectreference" ).

--dereference would work if it didn't require extra processing. It is hard to think of another option name that would give the desired result.
---
anatoly t.
--
Philip

[1] https://arxiv.org/abs/1602.02601
https://arxiv.org/pdf/1602.02601
"This paper reviews human cognition processes and shows first that humans cannot be error free no matter how hard they try, and second that our intuition about errors and how we can reduce them is based on appallingly bad knowledge."
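To make the annotated versus non-annotated distinction discussed above concrete, here is a minimal sketch in a throwaway repository. The tag names `light` and `heavy`, the identity settings and the temporary path are made up for illustration; `git show-ref --tags --dereference` and the `^{}` peel syntax are the existing features mentioned in the thread.

```shell
#!/bin/sh
# Throwaway repository; nothing here touches an existing checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git commit -q --allow-empty -m "initial"

git tag light                      # lightweight: the ref points straight at the commit
git tag -a -m "annotated" heavy    # annotated: the ref points at a tag object

commit=$(git rev-parse HEAD)
light_target=$(git rev-parse light)
heavy_target=$(git rev-parse heavy)        # id of the tag object, not of the commit
heavy_peeled=$(git rev-parse 'heavy^{}')   # peel the tag object down to the commit

# --dereference adds an extra "refs/tags/heavy^{}" line, for the annotated tag only.
listing=$(git show-ref --tags --dereference)
printf '%s\n' "$listing"
```

The lightweight tag resolves directly to the commit, while the annotated tag resolves to a tag object and only its `^{}` line names the commit. A script that wants "the commit for every tag" therefore has to post-process the listing, which is presumably the extra processing referred to above.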
Re: [WIP 12/15] ls-refs: introduce ls-refs server command
From: "Brandon Williams"
Sent: Monday, December 04, 2017 11:58 PM
> Introduce the ls-refs server command. In protocol v2, the ls-refs
> command is used to request the ref advertisement from the server.
> Since it is a command which can be requested (as opposed to manditory in
> v1), a clinet can sent a number of parameters in its request to limit the ref

s/clinet/client/

> advertisement based on provided ref-patterns.
>
> Signed-off-by: Brandon Williams
> ---

Philip
Re: [PATCH 0/8] Codespeed perf results
From: "Christian Couder"
> This patch series is built on top of cc/perf-run-config which recently
> graduated to master.
>
> It makes it possible to send perf results to a Codespeed server. See
> https://github.com/tobami/codespeed/ and web sites like
> http://speed.pypy.org/ which are using Codespeed.
>
> The end goal would be to have such a server always available to track
> how the different git commands perform over time on different kind of
> repos (small, medium, large, ...) with different optimizations on and
> off (split-index, libpcre2, BLK_SHA1, ...)

Dumb question: is this expected to also be able to do a retrospective on the performance of appropriate past releases? That would allow immediate performance comparisons, rather than needing to wait for a few releases to see the trends.

Philip

> With this series and a config file like:
>
> $ cat perf.conf
> [perf]
>     dirsOrRevs = v2.12.0 v2.13.0
>     repeatCount = 10
>     sendToCodespeed = http://localhost:8000
>     repoName = Git repo
> [perf "with libpcre"]
>     makeOpts = "DEVELOPER=1 USE_LIBPCRE=YesPlease"
> [perf "without libpcre"]
>     makeOpts = "DEVELOPER=1"
>
> One should be able to just launch:
>
> $ ./run --config perf.conf p7810-grep.sh
>
> and then get nice graphs in a Codespeed instance running on
> http://localhost:8000.
>
> Caveat
> ~~~~~~
>
> For now one has to create the "Git repo" environment in the Codespeed
> admin interface. (We send the perf.repoName config variable in the
> "environment" Codespeed field.) This is because Codespeed requires the
> environment fields to be created and does not provide a simple way to
> create these fields programmatically.
>
> I might try to work around this problem in the future.
> Links
> ~~~~~
>
> This patch series is available here:
>
> https://github.com/chriscool/git/commits/codespeed
>
> The cc/perf-run-config patch series was discussed here:
>
> v1: https://public-inbox.org/git/20170713065050.19215-1-chrisc...@tuxfamily.org/
> v2: https://public-inbox.org/git/cap8ufd2j-ufh+9awz91gtz-jusq7euoexmguro59vpf29jx...@mail.gmail.com/
>
> Christian Couder (8):
>   perf/aggregate: fix checking ENV{GIT_PERF_SUBSECTION}
>   perf/aggregate: refactor printing results
>   perf/aggregate: implement codespeed JSON output
>   perf/run: use $default_value instead of $4
>   perf/run: add conf_opts argument to get_var_from_env_or_config()
>   perf/run: learn about perf.codespeedOutput
>   perf/run: learn to send output to codespeed server
>   perf/run: read GIT_TEST_REPO_NAME from perf.repoName
>
>  t/perf/aggregate.perl | 164 +++---
>  t/perf/run            |  29 +++--
>  2 files changed, 140 insertions(+), 53 deletions(-)
>
> --
> 2.15.1.361.g8b07d831d0
Re: [PATCH] partial-clone: design doc
From: "Junio C Hamano" <gits...@pobox.com>
"Philip Oakley" <philipoak...@iee.org> writes:

+ These filtered packfiles are incomplete in the traditional sense because
+ they may contain trees that reference blobs that the client does not have.

Is a comment needed here noting that currently, IIUC, the complete trees are fetched in the packfiles, it's just the un-necessary blobs that are omitted?

I probably am misreading what you meant to say, but the above statement with "currently" taken literally to mean the system without JeffH's changes, is false.

I was meaning the current JeffH's V6 series, rather than the last Git release.

In one of the previous discussions Jeff had noted that (at that time) his partial design would provide a full set of trees for the selected commits (excluding the trees already available locally), but only a few of the file blobs (based on the filter spec).

So yes, I should have been clearer to avoid talking at cross purposes.

When the receiver says it has commit A and the sender wants to send a commit B (because the receiver said it does not have it, and it wants it), trees in A are not sent in the pack the sender sends to give objects sufficient to complete B, which the receiver wanted to have, even if B also has those trees. If you fetch from me twice and between that time Documentation/ directory did not change, the second fetch will not have the tree object that corresponds to that hierarchy (and of course no blobs and sub trees inside it).

Though, after the fetch has completed (v2.15 Git), the receiver will have the 'full set of trees and blobs'. In Jeff's design (V6) the receiver would still have a full set of trees, but only a partial set of the blobs. So my viewpoint was not of the pack file but of the receiver's object store after the fetch.

So "the complete trees are fetched" is not true.
What is true (and what matters more in JeffH's document) is that fetching is done in such a way that objects resulting in the receiving repository are complete in the current system that does not allow promised objects. If some objects resulting in the receiving repository are incomplete, the current system considers that we corrupted the repository.

The promise mechanism says that it is fine for the receiving end to lack blobs, trees or commits, as long as the promisor repository tells it that these "missing" objects can be obtained from it later.

True. (though I'm not sure exactly how Jeff decides about commits - I thought they were not part of this optimisation)

The way the receiving end which notices that it does not have an otherwise required blob, tree or commit is one promised by the promisor repository is to see if it is referenced by a pack that came from such a promisor repository.

.. and marked as such with the ".promisor" extension.

Thanks.
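The receiver-side view described above (full set of trees, partial set of blobs, pack marked as coming from a promisor) can be observed with a small local experiment. This is a sketch only, and it assumes a Git recent enough to ship partial clone, which postdates this thread; the temporary paths and file name are invented, and `blob:none` is just one possible filter-spec.

```shell
#!/bin/sh
src=$(mktemp -d)
dst=$(mktemp -d)

# A tiny "server" repository that agrees to serve filtered packs.
git init -q "$src"
git -C "$src" config user.name "Example"
git -C "$src" config user.email "example@example.invalid"
git -C "$src" config uploadpack.allowFilter true
echo "hello" > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" commit -q -m "add file"

# Partial clone: commits and trees are transferred, blobs are omitted.
git clone -q --no-checkout --filter=blob:none "file://$src" "$dst/clone"

# --missing=print lists absent objects with a leading "?" instead of failing
# (and without fetching them on demand).
missing=$(git -C "$dst/clone" rev-list --objects --all --missing=print | grep -c '^?')

# The received pack is marked by a sibling ".promisor" file.
promisor_packs=$(ls "$dst/clone/.git/objects/pack/"*.promisor 2>/dev/null | wc -l)

echo "missing objects: $missing, promisor packs: $promisor_packs"
```

With a single committed file, the only object expected to be missing is that file's blob; the commit and its tree are present locally.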
Re: [PATCH] partial-clone: design doc
From: "Jeff Hostetler"
From: Jeff Hostetler

First draft of design document for partial clone feature.

Signed-off-by: Jeff Hostetler
Signed-off-by: Jonathan Tan
---
 Documentation/technical/partial-clone.txt | 240 ++
 1 file changed, 240 insertions(+)
 create mode 100644 Documentation/technical/partial-clone.txt

diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
new file mode 100644
index 000..7ab39d8
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,240 @@
+Partial Clone Design Notes
+==========================
+
+The "Partial Clone" feature is a performance optimization for git that
+allows git to function without having a complete copy of the repository.
+

I think it would be worthwhile at least listing the issues that make the 'optimisation' necessary, and then the available factors that make the optimisation possible. This helps for future adjustments when those issues and factors change.

I think the issues are:
* the size of the repository that is being cloned, both in the width of a commit (you mentioned 100M trees) and the time (hours to days) / size to clone over the connection.

While the supporting factor is:
* the remote is always on-line and available for on-demand object fetching (seconds)

The solution choice then should fall out fairly obviously, and we can separate out the other optimisations that are based on other views about the issues. E.g. my desire for a solution in the off-line case.

In fact the current design, apart from some terminology, does look well matched, with only a couple of places that would be affected.

The airplane-mode expectations of a partial clone should also be stated.

+During clone and fetch operations, git normally downloads the complete
+contents and history of the repository. That is, during clone the client
+receives all of the commits, trees, and blobs in the repository into a
+local ODB. Subsequent fetches extend the local ODB with any new objects.
+For large repositories, this can take significant time to download and
+large amounts of diskspace to store.
+
+The goal of this work is to allow git better handle extremely large
+repositories.

Shouldn't this goal be nearer the top?

Often in these repositories there are many files that the
+user does not need such as ancient versions of source files, files in
+portions of the worktree outside of the user's work area, or large binary
+assets. If we can avoid downloading such unneeded objects *in advance*
+during clone and fetch operations, we can decrease download times and
+reduce ODB disk usage.
+

Does this need to distinguish the shallow clone mechanism for reducing the cloning of old history from the desire for a width-wise partial clone of only the user's narrow work area, and/or without large files/blobs?

+
+Non-Goals
+---------
+
+Partial clone is independent of and not intended to conflict with
+shallow-clone, refspec, or limited-ref mechanisms since these all operate
+at the DAG level whereas partial clone and fetch works *within* the set
+of commits already chosen for download.
+
+
+Design Overview
+---------------
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+  the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+  sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+  were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--------------
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+  upload-pack negotiation.
+
+  This uses the existing capability discovery mechanism.
+  See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+  server to request filtering during packfile construction.
+
+  There are various filters available to accomodate different situations.
+  See "--filter=" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+  creates "filtered" packfiles for the client.
+
+  These filtered packfiles are incomplete in the traditional sense because
+  they may contain trees that reference blobs that the client does not have.

Is a comment needed here noting that currently, IIUC, the complete trees are fetched in the packfiles, it's just the un-necessary blobs that are omitted?

+
+
+ How the local repository gracefully handles missing objects
+
+With partial clone, the fact that objects can be missing makes such
+repositories incompatible with older versions of Git, necessitating a
+repository extension (see the
Re: Re: Re: bug deleting "unmerged" branch (2.12.3)
From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>

Hi!

Sorry for the late response:

On a somewhat not-up-to date manual:

-d, --delete
    Delete a branch. The branch must be fully merged in its upstream branch, or in HEAD if no upstream was set with --track or --set-upstream.

Maybe the topic of multiple branches pointing to the same commit could be mentioned (regarding the status of each such branch being considered to be merged or not). Also "fully merged" could be made a bit more precise, maybe. Maybe gitglossary could have definitions for "merged" and "fully merged" with manual pages referring to it.

Thanks, I'll add your note to my list of clarifications.
Philip

Regards,
Ulrich

"Philip Oakley" <philipoak...@iee.org> wrote on 08.12.2017 at 21:26 in message <582105F8768F4DA6AF4EC82888F0BFBE@PhilipOakley>:

From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>

Hi Philip!

I'm unsure what you are asking for...

Ulrich

Hi Ulrich,

I was doing a retrospective follow up (of the second kind [1]).

In your initial email
https://public-inbox.org/git/5a1d70fd02a100029...@gwsmtp1.uni-regensburg.de/
you said

"I wanted to delete the temporary branch (which is of no use now), I got a message that the branch is unmerged. I think if more than one branches are pointing to the same commit, one should be allowed to delete all but the last one without warning."

My retrospective's question was to find what part of the documentation could be improved to assist fellow coders and Git users in gaining a better understanding here. I think it's an easy mistake [2] to make and that we should try to make the man pages more assistive.

I suspect that the description for the `git branch -d` needs a few more words to clarify the 'merged/unmerged' issue for those who receive the warning message. Or maybe the git-glossary, etc. I tend to believe that most users will read some of the man pages, and would continue to do so if they are useful.
I'd welcome any feedback or suggestions you could provide.
--
Philip

>>> "Philip Oakley" <philipoak...@iee.org> 04.12.17 at 0.30 >>>
From: "Junio C Hamano" <gits...@pobox.com>
> "Philip Oakley" <philipoak...@iee.org> writes:
>
>> I think it was that currently you are on M, and neither A nor B are
>> ancestors (i.e. merged) of M.
>>
>> As Junio said:- "branch -d" protects branches that are yet to be
>> merged to the **current branch**.
>
> Actually, I think people loosened this over time and removal of
> branch X is not rejected even if the range HEAD..X is not empty, as
> long as X is marked to integrate with/build on something else with
> branch.X.{remote,merge} and the range X@{upstream}..X is empty.
>
> So the stress of "current branch" above you added is a bit of a
> white lie.

Ah, thanks. [I haven't had chance to check the code]

The man page does say:
.-d
.Delete a branch. The branch must be fully merged in its upstream
.branch, or in HEAD if no upstream was set with --track
.or --set-upstream.

It's whether or not Ulrich had joined the two aspects together, and if the doc was sufficient to help recognise the 'unmerged' issue.

Ulrich?
--
Philip

[1] Retrospective Second Directive, section 3.4.2 of (15th Ed) Agile Processes in software engineering and extreme programming. ISBN 1628251042 (for the perspective of the retrospective..)

[2] 'mistake' colloquial part of the error categories of slips lapses and mistakes : Human Error, by Reason (James, prof) ISBN 0521314194 (worthwhile)
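The two rules in play (merged into HEAD when there is no upstream, merged into the upstream when one is set) can be reproduced in a scratch repository. This is an illustrative sketch: the branch names M, A and B follow the thread, everything else (paths, identity, commit messages) is made up, and making B the upstream of A is only a convenient way to show the loosened check Junio describes.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/M          # name the initial branch M
git config user.name "Example"
git config user.email "example@example.invalid"
git commit -q --allow-empty -m "base"

# A and B point at the same commit, which M does not contain.
git checkout -q -b A
git commit -q --allow-empty -m "work"
git branch B
git checkout -q M

# No upstream configured: A is compared against HEAD (M) and -d refuses,
# even though B still holds the very same commit.
if git branch -d A >/dev/null 2>&1; then first_try=deleted; else first_try=refused; fi

# With an upstream configured, A is compared against it instead.
# The range A@{upstream}..A is empty, so -d succeeds (with a warning on stderr).
git branch --set-upstream-to=B A >/dev/null
if git branch -d A >/dev/null 2>&1; then second_try=deleted; else second_try=refused; fi

echo "without upstream: $first_try, with upstream: $second_try"
```

So "another branch points at the same commit" is not, by itself, something `git branch -d` takes into account; only HEAD or the configured upstream is consulted.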
Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)
From: "Christian Couder"
On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano wrote:

* jh/object-filtering (2017-12-05) 9 commits
  (merged to 'next' on 2017-12-05 at 3a56b51085)
 + rev-list: support --no-filter argument
 + list-objects-filter-options: support --no-filter
 + list-objects-filter-options: fix 'keword' typo in comment
  (merged to 'next' on 2017-11-27 at e5008c3b28)
 + pack-objects: add list-objects filtering
 + rev-list: add list-objects filtering support
 + list-objects: filter objects in traverse_commit_list
 + oidset: add iterator methods to oidset
 + oidmap: add oidmap iterator methods
 + dir: allow exclusions from blob in addition to file
 (this branch is used by jh/fsck-promisors and jh/partial-clone.)

 In preparation for implementing narrow/partial clone, the object
 walking machinery has been taught a way to tell it to "filter" some
 objects from enumeration.

* jh/fsck-promisors (2017-12-05) 12 commits
 - gc: do not repack promisor packfiles
 - rev-list: support termination at promisor objects
 - fixup: sha1_file: add TODO
 - fixup: sha1_file: convert gotos to break/continue
 - sha1_file: support lazily fetching missing objects
 - introduce fetch-object: fetch one promisor object
 - index-pack: refactor writing of .keep files
 - fsck: support promisor objects as CLI argument
 - fsck: support referenced promisor objects
 - fsck: support refs pointing to promisor objects
 - fsck: introduce partialclone extension
 - extension.partialclone: introduce partial clone extension
 (this branch is used by jh/partial-clone; uses jh/object-filtering.)

 In preparation for implementing narrow/partial clone, the machinery
 for checking object connectivity used by gc and fsck has been
 taught that a missing object is OK when it is referenced by a
 packfile specially marked as coming from trusted repository that
 promises to make them available on-demand and lazily.
I am currently working on integrating this series with my external odb series (https://public-inbox.org/git/20170916080731.13925-1-chrisc...@tuxfamily.org/).

I too had seen that, as currently configured, the 'partialClone' could be seen as a method for using the remote as if it were an object database (odb) that was part of an 'always on-line' capability.

However I'm cautious about locking out the original DVCS capability of being off-line relative to some, or all, remotes and still needing to work in 'airplane mode'. It should be OK for the local narrowClone (my term) to be totally off-line for a while and still be able to work when back on line with other suitable remotes, even after the original remote has gone.

Instead of using an "extension.partialclone" config variable, an odb will be configured like using an "odb..promisorRemote" (the name might still change) config variable. Other odbs could still be configured using "odb..scriptCommand" and "odb..subprocessCommand".

The future work Jeff had indicated, IIRC, should be able to cope with multiple promisor remotes, which it is to be hoped this could handle. I'm not sure how the odb code would handle a partial failure where a partition of the odb stops being available.

The current work is still very much WIP and some tests fail, but you can take a look there: https://github.com/chriscool/git/tree/gl-promisor-external-odb440
--
Philip
Re: Re: bug deleting "unmerged" branch (2.12.3)
From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>

Hi Philip!

I'm unsure what you are asking for...

Ulrich

Hi Ulrich,

I was doing a retrospective follow up (of the second kind [1]).

In your initial email
https://public-inbox.org/git/5a1d70fd02a100029...@gwsmtp1.uni-regensburg.de/
you said

"I wanted to delete the temporary branch (which is of no use now), I got a message that the branch is unmerged. I think if more than one branches are pointing to the same commit, one should be allowed to delete all but the last one without warning."

My retrospective's question was to find what part of the documentation could be improved to assist fellow coders and Git users in gaining a better understanding here. I think it's an easy mistake [2] to make and that we should try to make the man pages more assistive.

I suspect that the description for the `git branch -d` needs a few more words to clarify the 'merged/unmerged' issue for those who receive the warning message. Or maybe the git-glossary, etc. I tend to believe that most users will read some of the man pages, and would continue to do so if they are useful.

I'd welcome any feedback or suggestions you could provide.
--
Philip

>>> "Philip Oakley" <philipoak...@iee.org> 04.12.17 at 0.30 >>>
From: "Junio C Hamano" <gits...@pobox.com>
> "Philip Oakley" <philipoak...@iee.org> writes:
>
>> I think it was that currently you are on M, and neither A nor B are
>> ancestors (i.e. merged) of M.
>>
>> As Junio said:- "branch -d" protects branches that are yet to be
>> merged to the **current branch**.
>
> Actually, I think people loosened this over time and removal of
> branch X is not rejected even if the range HEAD..X is not empty, as
> long as X is marked to integrate with/build on something else with
> branch.X.{remote,merge} and the range X@{upstream}..X is empty.
>
> So the stress of "current branch" above you added is a bit of a
> white lie.

Ah, thanks.
[I haven't had chance to check the code]

The man page does say:
.-d
.Delete a branch. The branch must be fully merged in its upstream
.branch, or in HEAD if no upstream was set with --track
.or --set-upstream.

It's whether or not Ulrich had joined the two aspects together, and if the doc was sufficient to help recognise the 'unmerged' issue.

Ulrich?
--
Philip

[1] Retrospective Second Directive, section 3.4.2 of (15th Ed) Agile Processes in software engineering and extreme programming. ISBN 1628251042 (for the perspective of the retrospective..)

[2] 'mistake' colloquial part of the error categories of slips lapses and mistakes : Human Error, by Reason (James, prof) ISBN 0521314194 (worthwhile)
Re: How hard would it be to implement sparse fetching/pulling?
From: "Jeff Hostetler" <g...@jeffhostetler.com>
Sent: Monday, December 04, 2017 3:36 PM
On 12/2/2017 11:30 AM, Philip Oakley wrote:
From: "Jeff Hostetler" <g...@jeffhostetler.com>
Sent: Friday, December 01, 2017 2:30 PM
On 11/30/2017 8:51 PM, Vitaly Arbuzov wrote:

I think it would be great if we high level agree on desired user experience, so let me put a few possible use cases here.

1. Init and fetch into a new repo with a sparse list.
Preconditions: origin blah exists and has a lot of folders inside of src including "bar".
Actions:
git init foo && cd foo
git config core.sparseAll true # New flag to activate all sparse operations by default so you don't need to pass options to each command.
echo "src/bar" > .git/info/sparse-checkout
git remote add origin blah
git pull origin master
Expected results: foo contains src/bar folder and nothing else, objects that are unrelated to this tree are not fetched.
Notes: This should work same when fetch/merge/checkout operations are used in the right order.

With the current patches (parts 1,2,3) we can pass a blob-ish to the server during a clone that refers to a sparse-checkout specification.

I hadn't appreciated this capability. I see it as important, and should be available both ways, so that a .gitNarrow spec can be imposed from the server side, as well as by the requester.

It could also be used to assist in the 'precious/secret' blob problem, so that AWS keys are never pushed, nor available for fetching!

To be honest, I've always considered partial clone/fetch as a client-side request as a performance feature to minimize download times and disk space requirements on the client.

Mine was a two way view where one side or other specified an extent for the narrow clone to achieve either the speed/space improvement or partitioning capability.

I've not thought of it from the "server has secrets" point of view.

My potential for "secrets" was a little softer than some of the 'hard' security that is often discussed.
I'm for the layered risk approach (swiss cheese model)

We can talk about it, but I'd like to keep it outside the scope of the current effort.

Agreed.

My concerns are that that is not the appropriate mechanism to enforce MAC/DAC like security mechanisms. For example:

[a] The client will still receive the containing trees that refer to the sensitive blobs, so the user can tell when the secret blobs change -- they wouldn't have either blob, but can tell when they are changed. This event by itself may or may not leak sensitive information depending on the terms of the security policy in place.

[b] The existence of such missing blobs would tell the client which blobs are significant and secret and allow them to focus their attack. It would be better if those assets were completely hidden and not in the tree at all.

[c] The client could push a fake secret blob to replace the valid one on the server. You would have to audit the server to ensure that it never accepts a push containing a change to any secret blob. And the server would need an infrastructure to know about all secrets in the tree.

[d] When a secret blob does change, any local merges by the user lack information to complete the merge -- they can't merge the secrets and they can't be trusted to correctly pick-ours or pick-theirs -- so their workflows are broken.

I'm not trying to blindly spread FUD here, but it is arguments like these that make me suggest that the partial clone mechanism is not the right vehicle for such "secret" blobs.

I'm on the 'a little security is better than no security' side, but all the points are valid.

There's a bit of a chicken-n-egg problem getting things set up. So if we assume your team would create a series of "known enlistments" under version control, then you could

s/enlistments/entitlements/ I presume?

Within my org we speak of "enlistments" as a subset of the tree that you plan to work on.
For example, you might enlist in the "file system" portion of the tree or in the "device drivers" portion. If the Makefiles have good partitioning, you should only need one of the above portions to do productive work within a feature area.

Ah, so it's the things that have been requested by the client (I'd like to enlist..)

I'm not sure what you mean by "entitlements".

It is like having the title deeds to a house - a list of things you have, or can have. (e.g. a father saying: you can have the car on Saturday 6pm -11pm)

At the end of the day the particular lists would be the same, they guide what is sent.

just reference one by : during your clone. The server can lookup that blob and just use it.

git clone --filter=sparse:oid=master:templates/bar URL

And then the server will filter-out the unwanted blobs dur
Re: [RFE] Inverted sparseness (amended)
From: "Randall S. Becker" <rsbec...@nexbridge.com>
On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom:
From: "Randall S. Becker" <rsbec...@nexbridge.com>

[...] If using the empty tree part doesn't pass muster (i.e. showing nothing isn't sufficient), then the narrow clone could come into play to limit what parts of the trees are widely visible, but mainly it's using the grafts to cover the regulatory gap, and (for the moment) using fast-export to transfer the singleton commit / tags

Oh, just remembered, there is the newish capability to fetch random blobs, so that may help.

I think you hit the nail on the head pretty well. We're currently at 2.3.7, with a push to 2.15.1 this week, so I'm looking forward to trying this. My two worries are whether the empty tree is acceptable (it should be to the client, and might be to the vendor), and doing this reliably (semi-automated) so the user base does not have to worry about the gory details of doing this. The unit tests for it are undoubtedly going to give me headaches.

Thanks for the advice. Islands of shallowness are a really descriptive image for what this is. So identifying that there are shoals (to extend the metaphor somewhat), will be crucial to this adventure.

These islands of shallowness, however, are also concerns as described in the [Re: How hard would it be to implement sparse fetching/pulling?] thread. The matter of the security audit is important here also:

I'm just thinking that even if we get a *perfectly working* partial clone/fetch/push/etc. that it would not pass a security audit.

Philip says: I'd totally disagree in the sense that if we had a submodule _anywhere_ in the repo that would be an independent island of code, and we are quite happy with that - we use the web of trust with the auditors for them to go check, separately, the oid of the independent portion, which may be at another site or another vendor/client. That's OK, so what's the problem here...
We do the same for pinning the tips and tails of the lines of development that make for the shallowness and narrowness that create these shoals, and oxbows of development. Managing them is normal human activity, with the technical support that the Git chain provides - so much better than previous 'versioning systems' that we see regularly in engineering, with backdoor tweaks etc.

The key is to ensure that there is a proper hand holding across the air gaps, such that the oids exist on both sides of the gaps, and are properly built on, such that the hash chain is unbroken. It's a similar negotiation to those used for establishing web security between IP clients, so it is doable. But you are right to have concerns and suspicions to ensure that it is all tested and verified
--
Philip
(sorry about the poor quoting of the reply)

Not having the capability would similarly cause a failure of a security audit.

Cheers,
Randall

-- Brief whoami: NonStop developer since approximately UNIX(421664400)/NonStop(2112884442)
-- In my real life, I talk too much.
Re: [RFE] Inverted sparseness
From: "Randall S. Becker" :December 03, 2017 11:44 PM
On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom:
From: "Randall S. Becker" <rsbec...@nexbridge.com>
Sent: Friday, December 01, 2017 6:31 PM
On December 1, 2017 1:19 PM, Jeff Hostetler wrote:
On 12/1/2017 12:21 PM, Randall S. Becker wrote:

I recently encountered a really strange use-case relating to sparse clone/fetch that is really backwards from the discussion that has been going on, and well, I'm a bit embarrassed to bring it up, but I have no good solution including building a separate data store that will end up inconsistent with repositories (a bad solution). The use-case is as follows:

Given a backbone of multiple git repositories spread across an organization with a server farm and upstream vendors. The vendor delivers code by having the client perform git pull into a specific branch. The customer may take the code as is or merge in customizations. The vendor wants to know exactly what commit of theirs is installed on each server, in near real time. The customer is willing to push the commit-ish to the vendor's upstream repo but does not want, by default, to share the actual commit contents for security reasons.

Realistically, the vendor needs to know that their own commit id was put somewhere (process exists to track this, so not part of the use-case) and whether there is a subsequent commit contributed by the customer, but the content is not relevant initially.

After some time, the vendor may request the commit contents from the customer in order to satisfy support requirements - a.k.a. a defect was found but has to be resolved. The customer would then perform a deeper push that looks a lot like a "slightly" symmetrical operation of a deep fetch following a prior sparse fetch to supply the vendor with the specific commit(s).

Perhaps I'm not understanding the subtleties of what you're describing, but could you do this with stock git functionality.
Let the vendor publish a "well known branch" for the client. Let the client pull that and build. Let the client create a branch set to the same commit that they fetched. Let the client push that branch as a client-specific branch to the vendor to indicate that that is the official release they are based on. Then the vendor would know the official commit that the client was using. This is the easy part, and it doesn't require anything sparse to exist. If the client makes local changes, does the vendor really need the SHA of those -- without the actual content? I mean any SHA would do, right? Perhaps let the client create a second client-specific branch (set to the same commit as the first) to indicate they had mods. Later, when the vendor needs the actual client changes, the client does a normal push to this 2nd client-specific branch at the vendor. This would send everything that the client has done to the code since the official release. What I should have added to the use-case was that there is a strong audit requirement (regulatory, actually) involved that the SHA is exact, immutable, and cannot be substituted or forged (one of the reasons git is in such high regard). So, no, I can't arrange a fake SHA to represent a SHA to be named later. The SHA of the installed commit is part of the official record of what happened on the specific server, so I'm stuck with it. I'm not sure what you mean about "it is inside a tree". m---a---b---c---H1 `---d---H2 d would be at a head. b would be inside. Determining content of c is problematic if b is sparse, so I'm really unsure that any of this is possible. I think I get the gist of your use case. Would I be right that you don't have a true working solution yet? i.e. that it's a problem that is almost sorted but falls down at the last step.
If one pretended that this was a single development shop, and treated the various vendors, clients and customers as independent developers, each of whom is overprotective of their code, it may give a better view that maps onto classic feature development diagrams. (i.e. draw the answer for local devs, then mark where the splits happen) In particular, I think you could use a notional regulator's view that the whole code base is part of a large Git hierarchy of branches and merges, and that some of the feature loops are only available via the particular developer that worked on that feature. This would mean that from a regulatory overview there is a merge commit in the 'main' (master) hierarchy that has the main and feature commits listed, and the feature commit is probably an --allow-empty commit (that has an empty tree if they are that paranoid) that says 'function X released' (and probably tagged), and that release commit then has, as its parent, the true release commit, with the true code tree. The latter commit isn't actually being shown to you! At this point the potential for using the graft capa
Re: bug deleting "unmerged" branch (2.12.3)
From: "Junio C Hamano" <gits...@pobox.com>

"Philip Oakley" <philipoak...@iee.org> writes:

I think it was that currently you are on M, and neither A nor B are ancestors (i.e. merged) of M. As Junio said:- "branch -d" protects branches that are yet to be merged to the **current branch**.

Actually, I think people loosened this over time and removal of branch X is not rejected even if the range HEAD..X is not empty, as long as X is marked to integrate with/build on something else with branch.X.{remote,merge} and the range X@{upstream}..X is empty. So the stress of "current branch" above you added is a bit of a white lie.

Ah, thanks. [I haven't had a chance to check the code] The man page does say:

    -d
        Delete a branch. The branch must be fully merged in its upstream
        branch, or in HEAD if no upstream was set with --track
        or --set-upstream.

It's whether or not Ulrich had joined the two aspects together, and if the doc was sufficient to help recognise the 'unmerged' issue. Ulrich?

-- Philip
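The rule quoted from the man page can be modelled in a few lines of Python. This is only a toy sketch, not Git's implementation: it assumes the history is held as a dict mapping each commit to its parents, and the names `is_ancestor` and `safe_to_delete` are made up for illustration. A branch counts as "merged" when its tip is reachable from the upstream tip if one is configured, and from HEAD otherwise.

```python
# Toy model of the "fully merged" check described in the man page text.
# Not Git's code: the graph layout and the function names are illustrative.

def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant`."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def safe_to_delete(parents, tip, head, upstream=None):
    """Compare the branch tip against its upstream if set, else against HEAD."""
    reference = upstream if upstream is not None else head
    return is_ancestor(parents, tip, reference)


# History: base <- m1 <- M (current branch) and base <- a1 <- A (topic).
parents = {"M": ["m1"], "m1": ["base"], "A": ["a1"], "a1": ["base"], "base": []}

# No upstream configured: A is not an ancestor of M, so deletion is refused.
print(safe_to_delete(parents, tip="A", head="M"))
# Upstream tip also at commit A: nothing is ahead of upstream, so it is allowed.
print(safe_to_delete(parents, tip="A", head="M", upstream="A"))
```

In this model the second call succeeds even though HEAD..A is not empty, which is the loosening Junio describes.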
Re: [RFE] Inverted sparseness
From: "Randall S. Becker"Sent: Friday, December 01, 2017 6:31 PM On December 1, 2017 1:19 PM, Jeff Hostetler wrote: On 12/1/2017 12:21 PM, Randall S. Becker wrote: I recently encountered a really strange use-case relating to sparse clone/fetch that is really backwards from the discussion that has been going on, and well, I'm a bit embarrassed to bring it up, but I have no good solution including building a separate data store that will end up inconsistent with repositories (a bad solution). The use-case is as follows: Given a backbone of multiple git repositories spread across an organization with a server farm and upstream vendors. The vendor delivers code by having the client perform git pull into a specific branch. The customer may take the code as is or merge in customizations. The vendor wants to know exactly what commit of theirs is installed on each server, in near real time. The customer is willing to push the commit-ish to the vendor's upstream repo but does not want, by default, to share the actual commit contents for security reasons. Realistically, the vendor needs to know that their own commit id was put somewhere (process exists to track this, so not part of the use-case) and whether there is a subsequent commit contributed >by the customer, but the content is not relevant initially. After some time, the vendor may request the commit contents from the customer in order to satisfy support requirements - a.k.a. a defect was found but has to be resolved. The customer would then perform a deeper push that looks a lot like a "slightly" symmetrical operation of a deep fetch following a prior sparse fetch to supply the vendor with the specific commit(s). Perhaps I'm not understanding the subtleties of what you're describing, but could you do this with stock git functionality. Let the vendor publish a "well known branch" for the client. Let the client pull that and build. Let the client create a branch set to the same commit that they fetched. 
Let the client push that branch as a client-specific branch to the vendor to indicate that that is the official release they are based on. Then the vendor would know the official commit that the client was using. This is the easy part, and it doesn't require anything sparse to exist. If the client makes local changes, does the vendor really need the SHA of those -- without the actual content? I mean any SHA would do, right? Perhaps let the client create a second client-specific branch (set to the same commit as the first) to indicate they had mods. Later, when the vendor needs the actual client changes, the client does a normal push to this 2nd client-specific branch at the vendor. This would send everything that the client has done to the code since the official release. What I should have added to the use-case was that there is a strong audit requirement (regulatory, actually) involved that the SHA is exact, immutable, and cannot be substituted or forged (one of the reasons git is in such high regard). So, no, I can't arrange a fake SHA to represent a SHA to be named later. The SHA of the installed commit is part of the official record of what happened on the specific server, so I'm stuck with it. I'm not sure what you mean about "it is inside a tree". m---a---b---c---H1 `---d---H2 d would be at a head. b would be inside. Determining content of c is problematic if b is sparse, so I'm really unsure that any of this is possible. Cheers, Randall -- Brief whoami: NonStop developer since approximately UNIX(421664400)/NonStop(2112884442) -- In my real life, I talk too much. I think I get the gist of your use case. Would I be right that you don't have a true working solution yet? i.e. that it's a problem that is almost sorted but falls down at the last step.
If one pretended that this was a single development shop, and treated the various vendors, clients and customers as independent developers, each of whom is overprotective of their code, it may give a better view that maps onto classic feature development diagrams. (i.e. draw the answer for local devs, then mark where the splits happen) In particular, I think you could use a notional regulator's view that the whole code base is part of a large Git hierarchy of branches and merges, and that some of the feature loops are only available via the particular developer that worked on that feature. This would mean that from a regulatory overview there is a merge commit in the 'main' (master) hierarchy that has the main and feature commits listed, and the feature commit is probably an --allow-empty commit (that has an empty tree if they are that paranoid) that says 'function X released' (and probably tagged), and that release commit then has, as its parent, the true release commit, with the true code tree. The latter commit isn't actually being shown to you! At this point the potential for using
Re: Re: Unify annotated and non-annotated tags
From: "anatoly techtonik"

comment at end - Philip

On Fri, Nov 24, 2017 at 1:24 PM, Ævar Arnfjörð Bjarmason wrote: On Fri, Nov 24, 2017 at 10:52 AM, anatoly techtonik wrote: On Thu, Nov 23, 2017 at 6:08 PM, Randall S. Becker wrote: On 2017-11-23 02:31 (GMT-05:00) anatoly techtonik wrote Subject: Re: Unify annotated and non-annotated tags On Sat, Nov 11, 2017 at 5:06 AM, Junio C Hamano wrote: Igor Djordjevic writes:

If you would like to mimic output of "git show-ref", repeating commits for each tag pointing to it and showing full tag name as well, you could do something like this, for example:

    for tag in $(git for-each-ref --format="%(refname)" refs/tags)
    do
        printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag"
    done

Hope that helps a bit.

If you use for-each-ref's --format option, you could do something like (pardon a long line):

    git for-each-ref --format='%(if)%(*objectname)%(then)%(*objectname)%(else)%(objectname)%(end) %(refname)' refs/tags

without any loop, I would think.

Thanks. That helps. So my proposal is to get rid of non-annotated tags, so to get all tags with commits that they point to, one would use:

    git for-each-ref --format='%(*objectname) %(refname)' refs/tags

For so-called non-annotated tags just leave the message empty. I don't see why anyone would need non-annotated tags though.

I have seen non-annotated tags used in automations (not necessarily well written ones) that create tags as a record of automation activity. I am not sure we should be writing off the concept of unannotated tags entirely. This may cause breakage based on existing expectations of how tags work at present. My take is that tags should include whodunnit, even if it's just the version of the automation being used, but I don't always get to have my wishes fulfilled. In essence, whatever behaviour a non-annotated tag has now may need to be emulated in future even if reconciliation happens. An option to preserve empty tag compatibility with pre-2.16 behaviour, perhaps?
Sadly, I cannot supply examples of this usage based on a human memory page-fault and NDAs.

Are there any windows for backward compatibility breaks, or is git doomed to preserve it forever? Automation without support won't survive for long, and people who rely on that, like the Chromium team, usually hard set the version used.

Git is not doomed to preserve anything forever. We've gradually broken backwards compatibility for a few core things like these. However, just as a bystander reading this thread I haven't seen any compelling reason for why these should be removed. You initially had questions about how to extract info about them, which you got answers to. So what reasons remain for why they need to be removed?

To reduce complexity and prior knowledge when dealing with Git tags. For example, http://readthedocs.io/ site contains a lot of broken "Edit on GitHub" links, for example - http://git-memo.readthedocs.io/en/stable/ And it appeared that the reason for that is the discrepancy between git annotated and non-annotated tags. The pull request that fixes the issue after it was researched and understood is simple https://github.com/rtfd/readthedocs.org/pull/3302 However, while looking through linked issues and PRs, one can try to imagine how many days it took for people to come up with the solution, which came from this thread. -- anatoly t.

So if I understand correctly, the hope is that `git show-ref --tags` could get an alternate option `--all-tags` [proper option name required...] such that the user would not have to develop the rather over-complicated expression that used a newish capability of a different command. Would that be right? Or at least update the man page docs to clarify the annotated vs non-annotated tags issue (many SO questions!). And indicate if the --dereference and/or --hash options would do the trick! - maybe the "^{}" appended would be part of the problem (and need that new option "--objectreference").

Philip
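The annotated versus non-annotated distinction behind this thread can be illustrated with a small Python model. This is a sketch under stated assumptions, not Git's object format or any Git API: the object ids, the `objects` and `refs` dictionaries, and the helper names `peel` and `tags_with_commits` are all invented for the example. It mirrors what the `%(if)%(*objectname)...` format expression achieves: a lightweight tag ref already names the commit, while an annotated tag ref names a tag object that has to be followed to reach the commit.

```python
# Toy object store: not Git's data model, only enough to show "peeling".
objects = {
    "c1": {"type": "commit"},
    "c2": {"type": "commit"},
    "t1": {"type": "tag", "object": "c2"},  # annotated tag object naming commit c2
}

# Lightweight tag: the ref points straight at a commit.
# Annotated tag: the ref points at a tag object that in turn names the commit.
refs = {
    "refs/tags/v1.0-light": "c1",
    "refs/tags/v2.0-annotated": "t1",
}


def peel(objects, oid):
    """Follow tag objects until a non-tag object is reached."""
    while objects[oid]["type"] == "tag":
        oid = objects[oid]["object"]
    return oid


def tags_with_commits(objects, refs):
    """Return (commit id, ref name) pairs for every tag, whatever its kind."""
    return [(peel(objects, oid), name) for name, oid in sorted(refs.items())]


for commit, name in tags_with_commits(objects, refs):
    print(commit, name)
```

A caller that skips the peeling step would report `t1` for the annotated tag, which is the kind of mismatch the thread attributes to the broken links.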
Re: Re: bug deleting "unmerged" branch (2.12.3)
From: "Ulrich Windl" Sent: Wednesday, November 29, 2017 8:32 AM Subject: Antw: Re: bug deleting "unmerged" branch (2.12.3)

"Ulrich Windl" writes:

I think if more than one branch is pointing to the same commit, one should be allowed to delete all but the last one without warning. Do you agree?

That comes from a viewpoint that the only purpose "branch -d" exists in addition to "branch -D" is to protect objects from "gc". Those who added the safety feature may have shared that view originally, but it turns out that it protects another important thing you are forgetting.

Imagine that two topics, 'topicA' and 'topicB', were independently forked from 'master', and then later we wanted to add a feature that depends on these two topics. Since the 'feature' forked, there may have been other developments, and we ended up in this topology:

    ---o---o---o---o---o---M
        \   \
         \   o---A---o---F
          \         /
           o---o---o---o---B

where A, B and F are the tips of 'topicA', 'topicB' and 'feature' branches right now [*1*].

Now imagine we are on 'master' and just made 'topicB' graduate. We would have this topology.

    ---o---o---o---o---o---o---M
        \   \                 /
         \   o---A---o---F   /
          \         /       /
           o---o---o---o---B

While we do have 'topicA' and 'feature' branches still in flight, we are done with 'topicB'. Even though the tip of 'topicA' is reachable from the tip of 'feature', the fact that the branch points at 'A' is still relevant. If we lose that information right now, we'd have to go find it when we (1) want to further enhance the topic by checking out and building on 'topicA', and (2) want to finally get 'topicA' to graduate to 'master'.

Because removal of a topic (in this case 'topicB') is often done after a merge of that topic is made into an integration branch, "branch -d" that protects branches that are yet to be merged to the current branch catches you if you said "branch -d topic{A,B}" (or other equivalent forms, most likely you'd have a script that spits out a list of branches and feeds it to "xargs branch -d").
So, no, I do not agree. Hi! I can follow your argumentation, but I fail to see that your branches A and B point to the same commit (which is what I was talking about). So my situation would be: o---oA,B I still think I could safely remove either A or B, even when the branch (identified by the commit, not by the name) is unmerged. What did I miss? I think it was that currently you are on M, and neither A nor B are ancestors (i.e. merged) of M. As Junio said:- "branch -d" protects branches that are yet to be merged to the **current branch**. [I said the same in another part of the thread. The question now would be what needs changing? the error/warning message, the docs, something else?] Regards, Ulrich [Footnotes] *1* Since the 'feature' started developing, there were a few commits added to 'topicB' but because the feature does not depend on these enhancements to that topic, B is ahead of the commit that was originally merged with the tip of 'topicA' to form the 'feature' branch.
Re: Antw: Re: bug deleting "unmerged" branch (2.12.3)
Hi Ulrich From: "Johannes Schindelin"To: "Ulrich Windl" Cc: Sent: Wednesday, November 29, 2017 12:27 PM Subject: Re: Antw: Re: bug deleting "unmerged" branch (2.12.3) Hi Ulrich, On Wed, 29 Nov 2017, Ulrich Windl wrote: > On Tue, 28 Nov 2017, Ulrich Windl wrote: > >> During a rebase that turned out to be heavier than expected 8-( I >> decided to keep the old branch by creating a temporary branch name to >> the commit of the branch to rebase (which was still the old commit ID >> at that time). >> >> When done rebasing, I attached a new name to the new (rebased) >> branch, deleted the old name (pointing at the same rebase commit), >> then recreated the old branch from the temporary branch name (created >> to remember the commit id). >> >> When I wanted to delete the temporary branch (which is of no use >> now), I got a message that the branch is unmerged. > > This is actually as designed, at least for performance reasons (it is > not exactly cheap to figure out whether a given commit is contained in > any other branch). > >> I think if more than one branches are pointing to the same commit, >> one should be allowed to delete all but the last one without warning. >> Do you agree? > > No, respectfully disagree, because I have found myself with branches > pointing to the same commit, even if the branches served different > purposes. I really like the current behavior where you can delete a > branch with `git branch -d` as long as it is contained in its upstream > branch. I'm not talking about the intention of a branch, but of the state of a branch: If multiple branches point (not "contain") the same commit, they are equivalent (besides the name) at that moment. I did a poor job of explaining myself, please let me try again. I'll give you one concrete example: Recently, while working on some topic, I stumbled over a bug and committed a bug fix, then committed that and branched off a new branch to remind myself to rebase the bug fix and contribute it. 
At that point, those branches were at the same revision, but distinctly not equivalent (except in just one, very narrow sense of the word, which I would argue is the wrong interpretation in this context). Sadly, I was called away at that moment to take care of something completely different. Even if I had not been, the worktree with the first branch would still have been at that revision for a longer time, as I had to try out a couple of changes before I could commit. This is just one example where the idea backfires that you can safely delete one of two branches that happen to point at the same commit at the same time. I am sure that you possess vivid enough of an imagination to come up with plenty more examples where that is the case. As no program can predict the future or the intentions of the user, it should be safe to delete the branch, because it can easily be recreated (from the remaining branches pointing to the same commit). Yes, no program can predict the future (at least *accurately*). No, it is not safe to delete that branch. Especially if you take the current paradigm of "it is safe to delete a branch if it is up-to-date with, or at least fast-forwardable to, its upstream branch" into account. And no, a branch cannot easily be recreated from the remaining branches in the future, as branches can have different reflogs (and they are lost when deleting the branch). It shouldn't need a lot of computational power to find out when multiple branches point to the same commit. Sure, that test can even be scripted easily by using the `git for-each-ref --points-at=` command. By the way, if you are still convinced that my argument is flawed and that it should be considered safe to delete a branch if any other branch points to the same revision, I encourage you to work on a patch to make it so. 
For maximum chance of getting included, you would want to guard this behind a new config setting, say, branch.deleteRedundantIsSafe, parse it here: https://github.com/git/git/blob/v2.15.1/config.c#L1260-L1288 or here: https://github.com/git/git/blob/v2.15.1/builtin/branch.c#L78-L97 I'd agree that it is easy to misinterpret the message. After close reading of the thread, Junio put his finger on the scenario with: - "branch -d" protects branches that are yet to be merged to the **current** branch. (my emphasis) Maybe the error message could say that (what exactly was the error message?), or the documentation be improved to clarify. document it here: https://github.com/git/git/blob/v2.15.1/Documentation/git-branch.txt and here: https://github.com/git/git/blob/v2.15.1/Documentation/config.txt#L969 and handle it here: https://github.com/git/git/blob/v2.15.1/builtin/branch.c#L185-L288 (look for the places where `force` is used, likely just before the call to `check_branch_commit()`). The way you'd want it to handle is most likely by
Re: [add-default-config] add --default option to git config.
From: "Soukaina NAIT HMID"From: Soukaina NAIT HMID From a coursory read, there does need a bit more explanation. I see you also add a --color description and code, and don't say what the problem being solved is. If it is trickty to explain, then a two patch series may tease apart the issues. perhaps add the --color option first (noting you'll use it in the next patch), then a second patch that explains about the --default problem. The patch title should be something like "[PATCH 1/n] config: add --default option" You may also want to explain the test rationale, and maybe split them if appropriate. -- Philip Signed-off-by: Soukaina NAIT HMID --- Documentation/git-config.txt | 4 ++ builtin/config.c | 34 - config.c | 10 +++ config.h | 1 + t/t1300-repo-config.sh | 161 +++ 5 files changed, 209 insertions(+), 1 deletion(-) diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt index 4edd09fc6b074..5d5cd58fdae37 100644 --- a/Documentation/git-config.txt +++ b/Documentation/git-config.txt @@ -179,6 +179,10 @@ See also <>. specified user. This option has no effect when setting the value (but you can use `git config section.variable ~/` from the command line to let your shell do the expansion). +--color:: + Find the color configured for `name` (e.g. `color.diff.new`) and + output it as the ANSI color escape sequence to the standard + output. 
-z:: --null:: diff --git a/builtin/config.c b/builtin/config.c index d13daeeb55927..5e5b998b7c892 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -30,6 +30,7 @@ static int end_null; static int respect_includes_opt = -1; static struct config_options config_options; static int show_origin; +static const char *default_value; #define ACTION_GET (1<<0) #define ACTION_GET_ALL (1<<1) @@ -52,6 +53,8 @@ static int show_origin; #define TYPE_INT (1<<1) #define TYPE_BOOL_OR_INT (1<<2) #define TYPE_PATH (1<<3) +#define TYPE_COLOR (1<<4) + static struct option builtin_config_options[] = { OPT_GROUP(N_("Config file location")), @@ -80,11 +83,13 @@ static struct option builtin_config_options[] = { OPT_BIT(0, "int", , N_("value is decimal number"), TYPE_INT), OPT_BIT(0, "bool-or-int", , N_("value is --bool or --int"), TYPE_BOOL_OR_INT), OPT_BIT(0, "path", , N_("value is a path (file or directory name)"), TYPE_PATH), + OPT_BIT(0, "color", , N_("find the color configured"), TYPE_COLOR), OPT_GROUP(N_("Other")), OPT_BOOL('z', "null", _null, N_("terminate values with NUL byte")), OPT_BOOL(0, "name-only", _values, N_("show variable names only")), OPT_BOOL(0, "includes", _includes_opt, N_("respect include directives on lookup")), OPT_BOOL(0, "show-origin", _origin, N_("show origin of config (file, standard input, blob, command line)")), + OPT_STRING(0, "default", _value, N_("default-value"), N_("sets default value when no value is returned from config")), OPT_END(), }; @@ -159,6 +164,13 @@ static int format_config(struct strbuf *buf, const char *key_, const char *value return -1; strbuf_addstr(buf, v); free((char *)v); + } + else if (types == TYPE_COLOR) { + char *v = xmalloc(COLOR_MAXLEN); + if (git_config_color(, key_, value_) < 0) + return -1; + strbuf_addstr(buf, v); + free((char *)v); } else if (value_) { strbuf_addstr(buf, value_); } else { @@ -244,8 +256,16 @@ static int get_value(const char *key_, const char *regex_) config_with_options(collect_config, , _config_source, 
_options); - ret = !values.nr; + if (!values.nr && default_value && types) { + struct strbuf *item; + ALLOC_GROW(values.items, values.nr + 1, values.alloc); + item = [values.nr++]; + if(format_config(item, key_, default_value) < 0){ + values.nr = 0; + } + } + ret = !values.nr; for (i = 0; i < values.nr; i++) { struct strbuf *buf = values.items + i; if (do_all || i == values.nr - 1) @@ -268,6 +288,7 @@ static int get_value(const char *key_, const char *regex_) return ret; } + static char *normalize_value(const char *key, const char *value) { if (!value) @@ -281,6 +302,17 @@ static char *normalize_value(const char *key, const char *value) * when retrieving the value. */ return xstrdup(value); + if (types == TYPE_COLOR) + { + char *v = xmalloc(COLOR_MAXLEN); + if (git_config_color(, key, value) == 0) + { + free((char *)v); + return xstrdup(value); + } + free((char *)v); + die("cannot parse color '%s'", value); + } if (types == TYPE_INT) return xstrfmt("%"PRId64, git_config_int64(key, value)); if (types == TYPE_BOOL) diff --git a/config.c b/config.c index 903abf9533b18..5c5daffeb6723 100644 --- a/config.c +++ b/config.c @@ -16,6 +16,7 @@ #include "string-list.h" #include "utf8.h" #include "dir.h" +#include "color.h" struct config_source { struct config_source *prev; @@ -990,6 +991,15 @@ int git_config_pathname(const char **dest, const
Re: How hard would it be to implement sparse fetching/pulling?
From: "Jeff Hostetler" <g...@jeffhostetler.com> Sent: Friday, December 01, 2017 5:23 PM On 11/30/2017 6:43 PM, Philip Oakley wrote: From: "Vitaly Arbuzov" <v...@uber.com> [...] comments below.. On Thu, Nov 30, 2017 at 9:01 AM, Vitaly Arbuzov <v...@uber.com> wrote: Hey Jeff, It's great, I didn't expect that anyone is actively working on this. I'll check out your branch, meanwhile do you have any design docs that describe these changes or can you define high level goals that you want to achieve? On Thu, Nov 30, 2017 at 6:24 AM, Jeff Hostetler <g...@jeffhostetler.com> wrote: On 11/29/2017 10:16 PM, Vitaly Arbuzov wrote: [...] I have, for separate reasons, been _thinking_ about the issue ($dayjob is in defence, so a similar partition would be useful). The changes would almost certainly need to be server side (as well as client side), as it is the server that decides what is sent over the wire in the pack files, which would need to be a 'narrow' pack file. Yes, there will need to be both client and server changes. In the current 3 part patch series, the client sends a "filter_spec" to the server as part of the fetch-pack/upload-pack protocol. If the server chooses to honor it, upload-pack passes the filter_spec to pack-objects to build an "incomplete" packfile omitting various objects (currently blobs). Proprietary servers will need similar changes to support this feature. Discussing this feature in the context of the defense industry makes me a little nervous. (I used to be in that area.) I'm viewing the desire for codebase partitioning from a soft layering of risk view (perhaps a more UK than USA approach ;-) What we have in the code so far may be a nice start, but probably doesn't have the assurances that you would need for actual deployment. But it's a start. True. I need to get some of my colleagues more engaged...
If we had such a feature then all we would need on top is a separate tool that builds the right "sparse" scope for the workspace based on paths that developer wants to work on. In the world where more and more companies are moving towards large monorepos this improvement would provide a good way of scaling git to meet this demand. The 'companies' problem is that it tends to force a client-server, always-on on-line mentality. I'm also wanting the original DVCS off-line capability to still be available, with _user_ control, in a generic sense, of what they have locally available (including files/directories they have not yet looked at, but expect to have). IIUC Jeff's work is that on-line view, without the off-line capability. I'd commented early in the series at [1,2,3]. Yes, this does tend to lead towards an always-online mentality. However, there are 2 parts: [a] dynamic object fetching for missing objects, such as during a random command like diff or blame or merge. We need this regardless of usage -- because we can't always predict (or dry-run) every command the user might run in advance. Making something "useful" happen here when off-line is an obvious goal. [b] batch fetch mode, such as using partial-fetch to match your sparse-checkout so that you always have the blobs of interest to you. And assuming you don't wander outside of this subset of the tree, you should be able to work offline as usual. If you can work within the confines of [b], you wouldn't need to always be online. I feel this is the area that does need to ensure a capability to avoid any perception of the much maligned 'Embrace, extend, and extinguish' by accidental lockout. I don't think this should be viewed as a type of sparse checkout - it's just a checkout of what you have (under the hood it could use the same code though).
We might also add a part [c] with explicit commands to back-fill or alter your incomplete view of the ODB (as I explained in response to the "git diff " comment later in this thread). At its core, my idea was to use the object store to hold markers for the 'not yet fetched' objects (mainly trees and blobs). These would be in a known fixed format, and have the same effect (conceptually) as the sub-module markers - they _confirm_ the oid, yet say 'not here, try elsewhere'. We do have something like this. Jonathan can explain better than I, but basically, we denote possibly incomplete packfiles from partial clones and fetches as "promisor" and have special rules in the code to assert that a missing blob referenced from a "promisor" packfile is OK and can be fetched later if necessary from the "promising" remote. The remote interaction is one area that may need thought, especially in a triangle workflow, of which there are a few. The main problem with markers or other lists of missing objects is that it has scale problems for large repos. Suppose I have 100M blobs in my repo
Re: How hard would it be to implement sparse fetching/pulling?
Hi Jonathan, Thanks for the outline. It has help clarify some points and see the very similar alignments. The one thing I wasn't clear about is the "promised" objects/remote. Is that "promisor" remote a fixed entity, or could it be one of many remotes that could be a "provider"? (sort of like fetching sub-modules...) Philip From: "Jonathan Nieder"Sent: Friday, December 01, 2017 2:51 AM Hi Vitaly, Vitaly Arbuzov wrote: I think it would be great if we high level agree on desired user experience, so let me put a few possible use cases here. I think one thing this thread is pointing to is a lack of overview documentation about how the 'partial clone' series currently works. The basic components are: 1. extending git protocol to (1) allow fetching only a subset of the objects reachable from the commits being fetched and (2) later, going back and fetching the objects that were left out. We've also discussed some other protocol changes, e.g. to allow obtaining the sizes of un-fetched objects without fetching the objects themselves 2. extending git's on-disk format to allow having some objects not be present but only be "promised" to be obtainable from a remote repository. When running a command that requires those objects, the user can choose to have it either (a) error out ("airplane mode") or (b) fetch the required objects. It is still possible to work fully locally in such a repo, make changes, get useful results out of "git fsck", etc. It is kind of similar to the existing "shallow clone" feature, except that there is a more straightforward way to obtain objects that are outside the "shallow" clone when needed on demand. 3. improving everyday commands to require fewer objects. For example, if I run "git log -p", then I way to see the history of most files but I don't necessarily want to download large binary files just to print 'Binary files differ' for them. 
And by the same token, we might want to have a mode for commands like "git log -p" to default to restricting to a particular directory, instead of downloading files outside that directory. There are some fundamental changes to make in this category --- e.g. modifying the index format to not require entries for files outside the sparse checkout, to avoid having to download the trees for them. The overall goal is to make git scale better. The existing patches do (1) and (2), though it is possible to do more in those categories. :) We have plans to work on (3) as well. These are overall changes that happen at a fairly low level in git. They mostly don't require changes command-by-command. Thanks, Jonathan
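[Editorial sketch] Components (1) and (2) above can be tried out locally. The following is a minimal sketch, assuming a Git new enough to ship partial clone as it was later released (the `--filter=blob:none` clone option, the `uploadpack.allowFilter` setting and `rev-list --missing=print`); it is not taken from the patches discussed in this thread, and the repository names `server` and `client` and the file contents are made up for illustration.

```shell
# Sketch only: repository names and file contents are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1

# A tiny "server" repository that agrees to serve filtered packs.
git init -q server
echo "hello" >server/file.txt
git -C server add file.txt
git -C server -c user.name=Demo -c user.email=demo@example.com \
	commit -q -m "initial"
git -C server config uploadpack.allowFilter true

# Partial clone: commits and trees arrive, blobs are only "promised".
git clone -q --no-checkout --filter=blob:none "file://$work/server" client

# Walk everything reachable from HEAD without fetching anything;
# objects that are promised but not present are printed with a "?" prefix.
missing=$(git -C client rev-list --objects --missing=print HEAD | grep -c '^?')
echo "promised but absent objects: $missing"
```

Running a command in `client` that actually needs a blob (a checkout, for instance) would then be expected to fetch it on demand from the promisor remote, which is case (b) in the outline above.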
Re: How hard would it be to implement sparse fetching/pulling?
From: "Jeff Hostetler"Sent: Friday, December 01, 2017 2:30 PM On 11/30/2017 8:51 PM, Vitaly Arbuzov wrote: I think it would be great if we high level agree on desired user experience, so let me put a few possible use cases here. 1. Init and fetch into a new repo with a sparse list. Preconditions: origin blah exists and has a lot of folders inside of src including "bar". Actions: git init foo && cd foo git config core.sparseAll true # New flag to activate all sparse operations by default so you don't need to pass options to each command. echo "src/bar" > .git/info/sparse-checkout git remote add origin blah git pull origin master Expected results: foo contains src/bar folder and nothing else, objects that are unrelated to this tree are not fetched. Notes: This should work same when fetch/merge/checkout operations are used in the right order. With the current patches (parts 1,2,3) we can pass a blob-ish to the server during a clone that refers to a sparse-checkout specification. I hadn't appreciated this capability. I see it as important, and should be available both ways, so that a .gitNarrow spec can be imposed from the server side, as well as by the requester. It could also be used to assist in the 'precious/secret' blob problem, so that AWS keys are never pushed, nor available for fetching! There's a bit of a chicken-n-egg problem getting things set up. So if we assume your team would create a series of "known enlistments" under version control, then you could s/enlistments/entitlements/ I presume? just reference one by : during your clone. The server can lookup that blob and just use it. git clone --filter=sparse:oid=master:templates/bar URL And then the server will filter-out the unwanted blobs during the clone. (The current version only filters blobs; you still get full commits and trees. That will be revisited later.) I'm for the idea that only the in-heirachy trees should be sent. 
It should also be possible that the server replies that it is only sending a narrow clone, with the given (accessible?) spec. On the client side, the partial clone installs local config settings into the repo so that subsequent fetches default to the same filter criteria as used in the clone. I don't currently have provision to send a full sparse-checkout specification to the server during a clone or fetch. That seemed like too much to try to squeeze into the protocols. We can revisit this later if there is interest, but it wasn't critical for the initial phase. Agreed. I think it should be somewhere 'visible' to the user, but could be setup by the server admin / repo maintainer if they don't have write access. But there could still be the catch-22 - maybe one starts with a toptree> : pair to define an origin point (it's not as refined as a .gitNarrow spec file, but is definitive). The toptree option could even allow sub-tree clones.. maybe.. 2. Add a file and push changes. Preconditions: all steps above followed. touch src/bar/baz.txt && git add -A && git commit -m "added a file" git push origin master Expected results: changes are pushed to remote. I don't believe partial clone and/or partial fetch will cause any changes for push. I suspect that pushes could be rejected if the user 'pretends' to modify files or trees outside their area. It does need the user to be able to spoof part of a tree they don't have, so an upstream / remote would immediately know it was a spoof but locally the narrow clone doesn't have enough detail about the 'bad' oid. It would be right to reject such attempts! 3. Clone a repo with a sparse list as a filter. Preconditions: same as for #1 Actions: echo "src/bar" > /tmp/blah-sparse-checkout git clone --sparse /tmp/blah-sparse-checkout blah # Clone should be the only command that would require a specific option key being passed.
Expected results: same as for #1 plus /tmp/blah-sparse-checkout is copied into .git/info/sparse-checkout I presume clone and fetch are treated equivalently here. There are 2 independent concepts here: clone and checkout. Currently, there isn't any automatic linkage of the partial clone to the sparse-checkout settings, so you could do something like this: I see an implicit link that clearly one cannot checkout (inflate/populate) a file/directory that one does not have in the object store. But that does not imply the reverse linkage. The regular sparse checkout should be available independently of the local clone being a narrow one. git clone --no-checkout --filter=sparse:oid=master:templates/bar URL git cat-file ... templates/bar >.git/info/sparse-checkout git config core.sparsecheckout true git checkout ... I've been focused on the clone/fetch issues and have not looked into the automation to couple them. I foresee that large files and certain files need to be filterable for fetch-clone, and that might not be (backward) compatible with the
Re: How hard would it be to implement sparse fetching/pulling?
From: "Vitaly Arbuzov" <v...@uber.com> Sent: Friday, December 01, 2017 1:27 AM Jonathan, thanks for references, that is super helpful, I will follow your suggestions. Philip, I agree that keeping original DVCS off-line capability is an important point. Ideally this feature should work even with remotes that are located on the local disk. And with any other remote. (even to the extent that the other remote may indicate it has no capability, sorry, go away..) E.g. One ought to be able to have/create a Github narrow fork of only the git.git/Documentation repo, and interact with that. (how much nicer if it was git.git/Documentation/ManPages/ to ease the exclusion of RelNotes/, howto/ and technical/ ) Which part of Jeff's work do you think wouldn't work offline after repo initialization is done and sparse fetch is performed? All the stuff that I've seen seems to be quite usable without GVFS. I think it's that initial download that may be different, and what is expected of it. In my case, one may never connect to that server again, yet still be able to work both off-line and with other remotes (push and pull as per capabilities). Below I note that I'd only fetch the needed trees, not all of them. Also one needs to fetch a complete (pre-defined) subset, rather than an on-demand subset. I'm not sure if we need to store markers/tombstones on the client, what problem does it solve? The part that the markers hope to solve is the part that I hadn't said, that they should also show in the work tree so that users can see what is missing and where. Importantly I would also trim the directory (tree) structure so only the direct hierarchy of those files the user sees are visible, though at each level they would see side directory names (which are embedded in the hierarchical tree objects).
(IIUC Jeff H's scheme downloads *all* trees, not just a few) It would mean that users can create a complete fresh tree and commit that can be merged and picked onto the upstream tree from the _directory worktree alone_, because the oid's of all the parts are listed in the worktree. The actual objects for the missing oids being available in the appropriate upstream. It also means the index can be deleted, and with only the local narrow pack files and the current worktree the index can be recreated at the current sparseness level. (I'm hoping I've understood the dispersal of data between index and narrow packs correctly here ;-) -- Philip On Thu, Nov 30, 2017 at 3:43 PM, Philip Oakley <philipoak...@iee.org> wrote: From: "Vitaly Arbuzov" <v...@uber.com> Found some details here: https://github.com/jeffhostetler/git/pull/3 Looking at commits I see that you've done a lot of work already, including packing, filtering, fetching, cloning etc. What are some areas that aren't complete yet? Do you need any help with implementation? comments below.. On Thu, Nov 30, 2017 at 9:01 AM, Vitaly Arbuzov <v...@uber.com> wrote: Hey Jeff, It's great, I didn't expect that anyone is actively working on this. I'll check out your branch, meanwhile do you have any design docs that describe these changes or can you define high level goals that you want to achieve? On Thu, Nov 30, 2017 at 6:24 AM, Jeff Hostetler <g...@jeffhostetler.com> wrote: On 11/29/2017 10:16 PM, Vitaly Arbuzov wrote: Hi guys, I'm looking for ways to improve fetch/pull/clone time for large git (mono)repositories with unrelated source trees (that span across multiple services). I've found sparse checkout approach appealing and helpful for most of client-side operations (e.g. status, reset, commit, etc.) The problem is that there is no feature like sparse fetch/pull in git, this means that ALL objects in unrelated trees are always fetched.
It may take a lot of time for large repositories and results in some practical scalability limits for git. This forced some large companies like Facebook and Google to move to Mercurial as they were unable to improve client-side experience with git while Microsoft has developed GVFS, which seems to be a step back to CVCS world. I want to get a feedback (from more experienced git users than I am) on what it would take to implement sparse fetching/pulling. (Downloading only objects related to the sparse-checkout list) Are there any issues with missing hashes? Are there any fundamental problems why it can't be done? Can we get away with only client-side changes or would it require special features on the server side? I have, for separate reasons been _thinking_ about the issue ($dayjob is in defence, so a similar partition would be useful). The changes would almost certainly need to be server side (as well as client side), as it is the server that decides what is sent over the wire in the pack files, which would need to be a 'narrow' pack file. If we had such a feature then all we would need on top is a separate tool that build
Re: How hard would it be to implement sparse fetching/pulling?
From: "Vitaly Arbuzov"Found some details here: https://github.com/jeffhostetler/git/pull/3 Looking at commits I see that you've done a lot of work already, including packing, filtering, fetching, cloning etc. What are some areas that aren't complete yet? Do you need any help with implementation? comments below.. On Thu, Nov 30, 2017 at 9:01 AM, Vitaly Arbuzov wrote: Hey Jeff, It's great, I didn't expect that anyone is actively working on this. I'll check out your branch, meanwhile do you have any design docs that describe these changes or can you define high level goals that you want to achieve? On Thu, Nov 30, 2017 at 6:24 AM, Jeff Hostetler wrote: On 11/29/2017 10:16 PM, Vitaly Arbuzov wrote: Hi guys, I'm looking for ways to improve fetch/pull/clone time for large git (mono)repositories with unrelated source trees (that span across multiple services). I've found sparse checkout approach appealing and helpful for most of client-side operations (e.g. status, reset, commit, etc.) The problem is that there is no feature like sparse fetch/pull in git, this means that ALL objects in unrelated trees are always fetched. It may take a lot of time for large repositories and results in some practical scalability limits for git. This forced some large companies like Facebook and Google to move to Mercurial as they were unable to improve client-side experience with git while Microsoft has developed GVFS, which seems to be a step back to CVCS world. I want to get a feedback (from more experienced git users than I am) on what it would take to implement sparse fetching/pulling. (Downloading only objects related to the sparse-checkout list) Are there any issues with missing hashes? Are there any fundamental problems why it can't be done? Can we get away with only client-side changes or would it require special features on the server side? I have, for separate reasons been _thinking_ about the issue ($dayjob is in defence, so a similar partition would be useful). 
The changes would almost certainly need to be server side (as well as client side), as it is the server that decides what is sent over the wire in the pack files, which would need to be a 'narrow' pack file. If we had such a feature then all we would need on top is a separate tool that builds the right "sparse" scope for the workspace based on paths that developer wants to work on. In the world where more and more companies are moving towards large monorepos this improvement would provide a good way of scaling git to meet this demand. The 'companies' problem is that it tends to force a client-server, always-on on-line mentality. I'm also wanting the original DVCS off-line capability to still be available, with _user_ control, in a generic sense, of what they have locally available (including files/directories they have not yet looked at, but expect to have). IIUC Jeff's work is that on-line view, without the off-line capability. I'd commented early in the series at [1,2,3]. At its core, my idea was to use the object store to hold markers for the 'not yet fetched' objects (mainly trees and blobs). These would be in a known fixed format, and have the same effect (conceptually) as the sub-module markers - they _confirm_ the oid, yet say 'not here, try elsewhere'. The comparison with submodules means there is the same chance of de-synchronisation with triangular and upstream servers, unless managed. The server side, as noted, will need to be included as it is the one that decides the pack file. Options for a server management are: - "I accept narrow packs?" No; yes - "I serve narrow packs?" No; yes. - "Repo completeness checks on receipt": (must be complete) || (allow narrow to nothing). For server farms (e.g. Github..) the settings could be global, or by repo.
(note that the completeness requirement and narrow receipt option are not incompatible - the recipient server can reject the pack from a narrow subordinate as incomplete - see below) * Marking of 'missing' objects in the local object store, and on the wire. The missing objects are replaced by a place holder object, which uses the same oid/sha1, but has a short fixed length, with content “GitNarrowObject ”. The chance that that string would actually have such an oid clash is the same as all other object hashes, so is a *safe* self-referential device. * The stored object already includes length (and inferred type), so we do know what it stands in for. Thus the local index (index file) should be able to be recreated from the object store alone (including the ‘promised / narrow / missing’ files/directory markers) * the ‘same’ as sub-modules. The potential for loss of synchronisation with a golden complete repo is just the same as for sub-modules. (We expected object/commit X here, but it’s not in the store). This could happen with a small user group who have locally narrow clones, who interact with their
Re: [PATCH v3 5/5] Testing: provide tests requiring them with ellipses after SHA-1 values
From: "Junio C Hamano" <gits...@pobox.com> "Philip Oakley" <philipoak...@iee.org> writes: From: "Junio C Hamano" <gits...@pobox.com> Ann T Ropea <bedhan...@gmx.de> writes: *1* We are being overly generous in t4013-diff-various.sh because we do not want to destroy/take apart the here-document. Given that all this is a temporary measure, we should get away with it. So, the need to reformat the test for the future post-deprecation period is being deferred to the time that the PRINT_SHA1_ELLIPSIS env variable, and all ellipses, is removed - is that the case? Maybe it just needs saying plainly. And if we say it that way, it is clear that with this series, we are shipping a new feature with a test that does not protect the output format we claim to be the improved and preferred one. That sounds quite bad. Having said that, I have already queued this to 'pu' and I do not terribly mind to merge it down to 'next', leaving the test updates to cover the new output format as well as the backward compatible one at the same time for a later follow-up patch. I'd agree. I just wanted to ensure that I had the right understanding. I'd however hate it if I have to carry the topic in the current shape in 'next' forever, waiting for such an update to come, that may never materialize, and be forced to do it myself without being explicitly asked by (and thanked for) anybody, especially because this is not exactly my itch X-<. True. Or is the env variable being retained as a fallback 'forever'? I'm half guessing that it may tend toward the latter as it's an easier backward compatibility decision. We do not know until this change is released to the wild, at which time we will hear noises about the lack of expected ellipses their (poorly written) scripts rely on and tell them to set the workaround environment variable. We may not hear from such people at all, in which case we may be able to remove it within a year or so, but it is too early to tell.
I was wondering if there should be a small documentation change for the env variable and states that it is a temporary measure for short term compatibility. Though I'm not sure where the 'right' place would be for it.
Re: [PATCH] git-send-email: fix get_maintainer.pl regression
From: "Eric Sunshine" On Sat, Nov 18, 2017 at 9:54 PM, Eric Sunshine wrote: On Thu, Nov 16, 2017 at 10:48 AM, Alex Bennée wrote:

+test_expect_success $PREREQ 'cc trailer with get_maintainer output' '
+ [...]
+ git send-email -1 --to=recipi...@example.com \
+ --cc-cmd="$(pwd)/expected-cc-script.sh" \
+ [...]
+'

OK I'm afraid I don't fully understand the test harness as this breaks a bunch of other tests. If anyone can offer some pointers on how to fix I'd be grateful. There are several problems: [...] * The directory in which the expected-cc-script.sh is created contains a space; this is intentional to catch bugs in tests and Git itself. In this case, your test is exposing what might be considered a bug in git-send-email itself, in which it invokes the --cc-cmd as "/path/with space/expected-cc-script.sh", which is interpreted as trying to invoke program "/path/with" with argument "space/expected-cc-script.sh". One fix (which you could submit as a preparatory patch, making this a 2-patch series) would be this:

--- 8< ---
diff --git a/git-send-email.perl b/git-send-email.perl
@@ -1724,7 +1724,7 @@ sub recipients_cmd {
-open my $fh, "-|", "$cmd \Q$file\E"
+ open my $fh, "-|", "\Q$cmd\E \Q$file\E"
--- 8< ---

However, it's possible that might break existing users who rely on --cc-cmd="myscript --option arg" working. It's not clear which behavior is correct. The more I think about this, the less I consider this a bug in git-send-email. As noted, people might legitimately use a complex command (--cc-cmd="myscript --option arg"), so changing git-send-email to treat cc-cmd as an atomic string seems like a bad idea. A while back I proposed some documentation updates https://public-inbox.org/git/1437416790-5792-1-git-send-email-philipoak...@iee.org/ regarding what is (should be) allowed in the cc-cmd etc., and at the time Junio suggested that possible existing uses of the current code would be abuses.
I didn't pursue it further, but it may be useful guidance here as to potential real world command lines.. Assuming no changes to git-send-email, to get your test working, you could try to figure out how to quote the script's path you're specifying with --cc-cmd, however, even easier would be to drop $(pwd) altogether. That is, instead of: --cc-cmd="$(pwd)/expected-cc-script.sh" just use: --cc-cmd=./expected-cc-script.sh
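[Editorial sketch] The word-splitting problem described above can be reproduced without git-send-email at all. In this minimal sketch, `sh -c` stands in for the way the `--cc-cmd` string ends up being split into words (an assumption made for illustration; the real code path is the Perl `open` shown in the diff), and the directory name, the `patch.txt` argument and the script contents are made up.

```shell
# Hypothetical demo; directory, file and script names are illustrative only.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/trash directory"
cd "$demo_dir/trash directory" || exit 1

cat >expected-cc-script.sh <<\EOF
#!/bin/sh
echo "cc@example.com"
EOF
chmod +x expected-cc-script.sh

# Absolute path containing a space: the command string is split into words,
# so the shell tries to run ".../trash" with
# "directory/expected-cc-script.sh" as its first argument.
abs_cmd="$(pwd)/expected-cc-script.sh"
abs_out=$(sh -c "$abs_cmd patch.txt" 2>/dev/null)
abs_status=$?

# Relative path without a space: the script runs as intended.
rel_cmd=./expected-cc-script.sh
rel_out=$(sh -c "$rel_cmd patch.txt")
rel_status=$?

echo "absolute: status=$abs_status output='$abs_out'"
echo "relative: status=$rel_status output='$rel_out'"
```

The relative form works only because the command is run from the directory that contains the script, which is the situation in the test being discussed.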
Re: [PATCH 6/7] builtin/describe.c: describe a blob
From: "Philip Oakley" <philipoak...@iee.org> s/with/without/ ... From: "Junio C Hamano" <gits...@pobox.com> : Friday, November 10, 2017 1:24 AM [catch up] "Philip Oakley" <philipoak...@iee.org> writes: From: "Stefan Beller" <sbel...@google.com> Rereading this discussion, there is currently no urgent thing to address? True. Then the state as announced by the last cooking email, to just cook it, seems about right and we'll wait for further feedback. A shiny new toy that is not a fix for a grave bug is rarely urgent, so with that criterion, we'd end up with hundreds of topics not in 'next' but in 'pu' waiting for the original contributor to get out of his or her procrastination, which certainly is not what I want to see, as I'd have to throw them into the Stalled bin and then eventually discard them, while having to worry about possible mismerges with remaining good topics caused by these topics appearing and disappearing from 'pu'. I'd rather see any topic that consumed reviewers' time to be polished enough to get into 'next' while we all recall the issues raised during previous reviews. I consider the process to further incrementally polish it after that happens a true "cooking". For this topic, aside from "known issues" that we decided to punt for now, my impression was that the code is in good enough shape, and we need a bit of documentation polish before I can mark it as "Will merge to 'next'". Possibly only checking the documentation aspects, so folks don't fall into the same trap as me.. ;-) Yup, so let's resolve that documentation thing while we remember that the topic has that issue, and what part of the documentation we find needs improvement. I am not sure what "trap" you fell into, though. Are you saying that giving git describe [...] git describe [...] in the synopsis is not helpful, because the user may not know what kind of object s/he has, and cannot decide from which set of options to pick?
Then an alternative would be to list (If I remember correctly) My nit pick was roughly along the lines you suggest, and that the two option lists (for commit-ish and blob) were shown in different ways, which could lead to the scenario that, with knowing the s/with/without/ ... oid object type (or knowing how to get it), the user could give an invalid option, and think the command failure was because the oid was invalid, not that the option was not appropriate, along with variations on that theme. The newer synopsis (v5) looks Ok in that it avoids digging the hole by not mentioning the blob options. Personally I'm more for manuals that tend toward instructional, rather than being expert references. I'd sneak in a line saying "The object type can be determined using `git cat-file`.", but maybe that's my work environment... git describe [...] in the synopsis, say upfront that most options are applicable only when describing a commit-ish, and when describing a blob, we do quite a different thing and a separate set of options apply, perhaps? -- Philip
Re: [PATCH 6/7] builtin/describe.c: describe a blob
From: "Junio C Hamano" <gits...@pobox.com> : Friday, November 10, 2017 1:24 AM [catch up] "Philip Oakley" <philipoak...@iee.org> writes: From: "Stefan Beller" <sbel...@google.com> Rereading this discussion, there is currently no urgent thing to address? True. Then the state as announced by the last cooking email, to just cook it, seems about right and we'll wait for further feedback. A shiny new toy that is not a fix for a grave bug is rarely urgent, so with that criterion, we'd end up with hundreds of topics not in 'next' but in 'pu' waiting for the original contributor to get out of his or her procrastination, which certainly is not what I want to see, as I'd have to throw them into the Stalled bin and then eventually discard them, while having to worry about possible mismerges with remaining good topics caused by these topics appearing and disappearing from 'pu'. I'd rather see any topic that consumed reviewers' time to be polished enough to get into 'next' while we all recall the issues raised during previous reviews. I consider the process to further incrementally polish it after that happens a true "cooking". For this topic, aside from "known issues" that we decided to punt for now, my impression was that the code is in good enough shape, and we need a bit of documentation polish before I can mark it as "Will merge to 'next'". Possibly only checking the documentation aspects, so folks don't fall into the same trap as me.. ;-) Yup, so let's resolve that documentation thing while we remember that the topic has that issue, and what part of the documentation we find needs improvement. I am not sure what "trap" you fell into, though. Are you saying that giving git describe [...] git describe [...] in the synopsis is not helpful, because the user may not know what kind of object s/he has, and cannot decide from which set of options to pick?
Then an alternative would be to list (If I remember correctly) My nit pick was roughly along the lines you suggest, and that the two option lists (for commit-ish and blob) were shown in different ways, which could lead to the scenario that, with knowing the oid object type (or knowing how to get it), the user could give an invalid option, and think the command failure was because the oid was invalid, not that the option was not appropriate, along with variations on that theme. The newer synopsis (v5) looks Ok in that it avoids digging the hole by not mentioning the blob options. Personally I'm more for manuals that tend toward instructional, rather than being expert references. I'd sneak in a line saying "The object type can be determined using `git cat-file`.", but maybe that's my work environment... git describe [...] in the synopsis, say upfront that most options are applicable only when describing a commit-ish, and when describing a blob, we do quite a different thing and a separate set of options apply, perhaps? -- Philip
Re: [PATCHv5 7/7] builtin/describe.c: describe a blob
From: "Stefan Beller"Sent: Thursday, November 16, 2017 2:00 AM [in catch up mode..] Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) When describing commits, we try to anchor them to tags or refs, as these are conceptually on a higher level than the commit. And if there is no ref or tag that matches exactly, we're out of luck. So we employ a heuristic to make up a name for the commit. These names are ambiguous, there might be different tags or refs to anchor to, and there might be different path in the DAG to travel to arrive at the commit precisely. When describing a blob, we want to describe the blob from a higher layer as well, which is a tuple of (commit, deep/path) as the tree objects involved are rather uninteresting. The same blob can be referenced by multiple commits, so how we decide which commit to use? This patch implements a rather naive approach on this: As there are no back pointers from blobs to commits in which the blob occurs, we'll start walking from any tips available, listing the blobs in-order of the commit and once we found the blob, we'll take the first commit that listed the blob. For example git describe --tags v0.99:Makefile conversion-901-g7672db20c2:Makefile tells us the Makefile as it was in v0.99 was introduced in commit 7672db20. The walking is performed in reverse order to show the introduction of a blob rather than its last occurrence. 
[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob Signed-off-by: Stefan Beller --- Documentation/git-describe.txt | 18 ++-- builtin/describe.c | 62 ++ t/t6120-describe.sh| 34 +++ 3 files changed, 107 insertions(+), 7 deletions(-) diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index c924c945ba..e027fb8c4b 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -3,14 +3,14 @@ git-describe(1) NAME -git-describe - Describe a commit using the most recent tag reachable from it - +git-describe - Give an object a human readable name based on an available ref SYNOPSIS [verse] 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...] 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=] +'git describe' DESCRIPTION --- @@ -24,6 +24,12 @@ By default (without --all or --tags) `git describe` only shows annotated tags. For more information about creating annotated tags see the -a and -s options to linkgit:git-tag[1]. +If the given object refers to a blob, it will be described +as `:`, such that the blob can be found +at `` in the ``, which itself describes the +first commit in which this blob occurs in a reverse revision walk +from HEAD. + OPTIONS --- ...:: @@ -186,6 +192,14 @@ selected and output. Here fewest commits different is defined as the number of commits which would be shown by `git log tag..input` will be the smallest number of commits possible. +BUGS + + +Tree objects as well as tag objects not pointing at commits, cannot be described. Is this true? Is it stand alone from the describing of a blob? If so should it be its own patchlet. - I thought I'd read that within the series there is now a tree / tag (of blob/trees) description capability. I'd prefer that we don't start with the "can't" view (relative to the subsequent sentences of the paragraph). It puts off the reader - we are about to say what can be described but in a limited way - the limitation being the bug. 
Maybe just swap the line to form a second paragraph. +When describing blobs, the lightweight tags pointing at blobs are ignored, +but the blob is still described as : despite the lightweight +tag being favorable. + -- Philip GIT --- Part of the linkgit:git[1] suite diff --git a/builtin/describe.c b/builtin/describe.c index 9e9a5ed5d4..5b4bfaba3f 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -3,6 +3,7 @@ #include "lockfile.h" #include "commit.h" #include "tag.h" +#include "blob.h" #include "refs.h" #include "builtin.h" #include "exec_cmd.h" @@ -11,8 +12,9 @@ #include "hashmap.h" #include "argv-array.h" #include "run-command.h" +#include "revision.h" +#include "list-objects.h" -#define SEEN (1u << 0) #define MAX_TAGS (FLAG_BITS - 1) static const char * const describe_usage[] = { @@ -434,6 +436,53 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) strbuf_addstr(dst, suffix); } +struct process_commit_data { + struct object_id current_commit; + struct object_id looking_for; + struct strbuf *dst; + struct rev_info *revs; +}; + +static void process_commit(struct commit *commit, void *data) +{ + struct process_commit_data *pcd = data; + pcd->current_commit = commit->object.oid; +} + +static void process_object(struct
Re: [PATCH v3 5/5] Testing: provide tests requiring them with ellipses after SHA-1 values
From: "Junio C Hamano"Ann T Ropea writes: *1* We are being overly generous in t4013-diff-various.sh because we do not want to destroy/take apart the here-document. Given that all this a temporary measure, we should get away with it. So, the need to reformat the test for the future post-deprecation period is being deferred to the time that the PRINT_SHA1_ELLIPSIS env variable, and all ellipis, is removed - is that the case? Maybe it just needs saying plainly. Or is the env variable being retained as a fallback 'forever'? I'm half guessing that it may tend toward the latter as it's an easier backward compatibility decision. [apologioes this is mid thread, I'm catching up on 2 weeks of emails] I do not think the patch is being particularly generous. If anything, it is being unnecessarily sloppy by not adding new checks to verify the updated behaviour. The above comment mentions "destroy/take apart" the here-document, but I do see no need to destroy anything. All you need to do is to enhance and extend. For example, you could do it like so (this is written in my e-mail client, and not an output of diff, so the indentation etc. may be all off, but should be sufficient to illustrate the idea): while read cmd do case "$cmd" in '' | '#'*) continue ;; esac test=$(echo "$cmd" | sed -e 's|[/ ][/ ]*|_|g') pfx=$(printf "%04d" $test_count) expect="$TEST_DIRECTORY/t4013/diff.$test" actual="$pfx-diff.$test" +case "$cmd" in +X*) cmd=${cmd#X}; no_ellipses=" (no ellipses)" ;; +*) no_ellipses= ;; +esac -test_expect_success "git $cmd" ' +test_expect_success "git $cmd$no_ellipses" ' { echo "\$ git $cmd" -git $cmd | +if test -n "$no_ellipses" +then +git $cmd +else +PRINT_SHA1_ELLIPSES=yes git $cmd +fi | sed -e ... done <<\EOF diff-tree initial diff-tree -r initial diff-tree -r --abbrev initial diff-tree -r --abbrev=4 initial +Xdiff-tree -r --abbrev=4 initial ... EOF There is a new and duplicated line with a prefix X for one existing test in the above. 
The idea is that the ones marked as such will test and verify the effect of this new behaviour by not setting the environment variable. The expected and actual test output for the new test will have X prefixed to it.

t4013 is arranged in such a way that it is easy to add a new test like this---you only need to add an expected output in a new file in the t/t4013/ directory. And the output with these ellipses removed will be something we would expect to see in the new world (without the escape hatch environment variable), so we would need to add a new file there to record what the expected output from the command is.

I singled out the diff-tree invocation with --abbrev=4 as an example in the above, but in a more thorough final version, we'd need to cover both "abbreviation with ellipses" and "abbreviation without ellipses" output for other lines in the test case listed in the here-document.
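The X-prefix convention suggested above can be tried in isolation, outside the test harness. The following is only a minimal sketch: parse_test_line and test_title are hypothetical helper names that do not exist in t4013, and the sketch only builds the test title; it does not run git.

```shell
# Hypothetical helper (not part of t4013): split an optional leading "X"
# marker off a line read from the here-document.
# Sets two variables: cmd (the git arguments) and no_ellipses (label or empty).
parse_test_line () {
	cmd=$1
	case "$cmd" in
	X*)
		# Marked line: drop the marker, remember to leave the env variable unset.
		cmd=${cmd#X}
		no_ellipses=" (no ellipses)"
		;;
	*)
		no_ellipses=
		;;
	esac
}

# Build the title that the loop would hand to test_expect_success.
test_title () {
	parse_test_line "$1"
	printf 'git %s%s\n' "$cmd" "$no_ellipses"
}

test_title "diff-tree -r --abbrev=4 initial"
test_title "Xdiff-tree -r --abbrev=4 initial"
```

Both lines name the same git command; only the marked one carries the " (no ellipses)" suffix, which is what keeps the two test titles distinct.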
Re: more pedantry ... what means a file "known to Git"?
From: "Robert P. J. Day"

apologies for more excruciating nitpickery, but i ask since it seems that phrase means slightly different things depending on where you read it. first, i assume that there are only two categories:

  1) files known to Git
  2) files unknown to Git

and that there is no fuzzy, grey area middle ground, yes?

sort of...

now, in "man git-clean", one reads (near the top):

    Cleans the working tree by recursively removing files that are not under version control, starting from the current directory. Normally, only files unknown to Git are removed, but if the -x option is specified, ignored files are also removed.

the way that's worded suggests that ignored files are "known" to Git, yes?

You've hit the three way binary problem of +1, 0, -1 ! The lsb is still 0 or 1, but we have the two assertions of:

  Positively known to git -- added to the index and the object store
  Negatively 'known' to git -- paths we actively ignore, thus not in the index or object store.

Unknown files are those that could be added.

that is, if, by default, "git clean" removes only files "unknown" to Git, and "-x" extends that to ignored files, the conclusion is that ignored files are *known* to Git.

but only in a negative sense ...

if, however, you check out "man git-rm", you read:

    The list given to the command can be exact pathnames, file glob patterns, or leading directory names. The command removes only the paths that are known to Git. Giving the name of a file that you have not told Git about does not remove that file.

so "git rm" removes only files "known to Git", but from the above regarding how "git clean" sees this, that should include ignored files, which of course it doesn't.

The man page description starts with the key "Remove files from the index", so this is the positive 'knowing' part. Clearly it can never remove other ignored files as they can't be in the index (but note the 'other' caveat. P->Q # Q->P).
given that this phrase occurs in a number of places:

    $ grep -ir "known to git" *
    builtin/difftool.c: /* The symlink is unknown to Git so read from the filesystem */
    dir.c: error("pathspec '%s' did not match any file(s) known to git.",
    Documentation/git-rm.txt:removes only the paths that are known to Git. Giving the name of
    Documentation/git-commit.txt: be known to Git);
    Documentation/user-manual.txt:error: pathspec '261dfac35cb99d380eb966e102c1197139f7fa24' did not match any file(s) known to git.
    Documentation/gitattributes.txt: Notice all types of potential whitespace errors known to Git.
    Documentation/git-clean.txt:Normally, only files unknown to Git are removed, but if the `-x`
    Documentation/RelNotes/1.8.2.1.txt: * The code to keep track of what directory names are known to Git on
    Documentation/RelNotes/1.8.1.6.txt: * The code to keep track of what directory names are known to Git on
    Documentation/RelNotes/2.9.0.txt: known to Git. They have been taught to do the normalization.
    Documentation/RelNotes/2.8.4.txt: known to Git. They have been taught to do the normalization.
    Documentation/RelNotes/1.8.3.txt: * The code to keep track of what directory names are known to Git on
    t/t3005-ls-files-relative.sh: echo "error: pathspec $sq$f$sq did not match any file(s) known to git."
    t/t3005-ls-files-relative.sh: echo "error: pathspec $sq$f$sq did not match any file(s) known to git."
    $

it might be useful to define precisely what it means. or is it assumed to be context dependent?

A little bit of clarification may be useful. You can't be/aren't the only one who is willing to note these subtle inconsistencies (Git knows things via the index (staging area) and the object store (repository)).

rday

-- Philip
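One way to make the three-way split discussed in this thread concrete is to ask Git directly. The sketch below is only an illustration under stated assumptions: classify_path is a hypothetical helper name, and the labels "tracked", "ignored" and "untracked" are chosen here to stand for "positively known", "negatively known" and "unknown". It relies on `git ls-files --error-unmatch` (fails when the path is not in the index) and `git check-ignore -q` (succeeds when an ignore rule matches), and must be run from inside a work tree.

```shell
# Hypothetical helper: report how Git "knows" a path.
#   tracked   - the path is in the index (positively known)
#   ignored   - an ignore rule matches it (negatively known)
#   untracked - neither; the path could still be added (unknown)
classify_path () {
	if git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
	then
		echo tracked
	elif git check-ignore -q -- "$1" 2>/dev/null
	then
		echo ignored
	else
		echo untracked
	fi
}
```

With these labels, "git rm" acts only on paths reported as tracked, a plain "git clean" only on untracked ones, and "git clean -x" on both untracked and ignored ones.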
Re: [PATCH 00/30] Add directory rename detection to git
From: "Elijah Newren" <new...@gmail.com>
Sent: Friday, November 10, 2017 11:26 PM

On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakley <philipoak...@iee.org> wrote:

From: "Elijah Newren" <new...@gmail.com>

In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as:

  * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename.

How does this cope with the case insensitive case preserving file systems on Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem than the series already covers, would the likely changes be reasonably localised? This came up recently on GfW for `git checkout` of a branch where the case changed ("Test" <-> "test"), but git didn't notice that it needed to rename the directories on such a file system.

https://github.com/git-for-windows/git/issues/1333

I wasn't aware there were problems with git on case insensitive case preserving filesystems; fixing them wasn't something I had in mind when writing this series.

I was mainly ensuring awareness of the potential issue, as it's not easy to solve.

However, the particular bug you mention is actually completely orthogonal to this series; it talks about git-checkout without the -m/--merge option, which doesn't touch any code path I modified in my series, so my series can't really fix or worsen that particular issue.

That's good.

But, if there are further issues with such filesystems that also affect merges/cherry-picks/rebases, then I don't think my series will either help or hurt there.
The recursive merge machinery already has remove_file() and update_file() wrappers that it uses whenever it needs to remove/add/update a file in the working directory and/or index, and I have simply continued using those, so the number of places you'd need to modify to fix issues would remain just as localized as before.

It's when the working directory path/filename has a case change that goes undetected (one way or another) that can cause issues. I think that part of the problem (after awareness) is not having a canonical expectation of which way is 'right', and what options there may be. E.g. if a project is wholly on a case insensitive system then the filenames in the worktree never matter, but aligning the path/filenames in the repository would still be a problem.

Also, I continue to depend on the reading of the index & trees that unpack_trees() does, which I haven't modified, so again it'd be the same number of places that someone would need to fix. (However, the whole design to have unpack_trees() do the initial work and then have recursive merge try to "fix it up" is really starting to strain.

Interesting point.

I'm starting to think, again, that merge recursive needs a redesign, and have some arguments I wanted to float out there...but I've dumped enough on the list for a day.)

It's possible that this series fixes one particular issue -- namely when merging, if the merge-base contained a "Test" directory, one side added a file to that directory, and the other side renamed "Test" to "test", and if the presence of both "Test" and "test" directories in the merge result is problematic, then at least with my fixes you wouldn't end up with both directories and could thus avoid that problem in a narrow set of cases.

I'll think on that. It may provide extra clues as to what the right solutions could be!

Sorry that I don't have any better news than that for you.

Elijah

Thanks

-- Philip
Re: [PATCH 00/30] Add directory rename detection to git
From: "Elijah Newren"

[This series is entirely independent of my rename detection limits series. However, I have a separate rename detection performance series that depends on both this series and the rename detection limits series.]

In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as:

  * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename.

How does this cope with the case insensitive case preserving file systems on Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem than the series already covers, would the likely changes be reasonably localised? This came up recently on GfW for `git checkout` of a branch where the case changed ("Test" <-> "test"), but git didn't notice that it needed to rename the directories on such a file system.

https://github.com/git-for-windows/git/issues/1333

-- Philip
Re: [PATCH 6/7] builtin/describe.c: describe a blob
From: "Stefan Beller"

Rereading this discussion, there is currently no urgent thing to address?

True.

Then the state as announced by the last cooking email, to just cook it, seems about right and we'll wait for further feedback.

Possibly only checking the documentation aspects, so folks don't fall into the same trap as me.. ;-)

-- Philip
Re: [PATCH 1/3] checkout: describe_detached_head: remove 3dots after committish
From: "Junio C Hamano" <gits...@pobox.com>
Sent: Wednesday, November 08, 2017 1:59 AM

"Philip Oakley" <philipoak...@iee.org> writes:

But... ...

This change causes quite a few tests to fall over; however, they all have truncated-something-longer-ellipses in their raw-diff-output expected sections, and removing the ellipses from there makes the tests pass again, :-)

The number of failures you report in the test suite suggests that someone somewhere will be expecting that notation, and that we may need a deprecation period, perhaps with an 'ellipsis' config variable whose default value can later be flipped, though that leaves a config value needing support forever!

Hmmm, never thought about that. I have been assuming that tools reading "--raw" output that is abbreviated would be crazy, because they have to strip the dots and the number of dots may not always be three [*1*]. But you are right. It would be very unlikely that there are no such crazy tools, so it deserves consideration if we would be breaking such tools.

On the other hand, if such a crazy tool was still written correctly (it is debatable what the definition of "correct" is, though), it would be stripping any number of dots at the end, not just insisting on seeing exactly three dots, and splitting these fields at SP. Otherwise they would already be broken as they cannot handle occasional object names that have less than three dots because they happen to be longer than the more common abbreviation length used by other objects. So in practice it might not be _too_ bad.

Thinking on this, I'd suggest that the patch series does remove the ellipsis dots immediately, but retains a config option that can be set to get back the old 'dots' display for those who have badly written scripts that maybe haven't failed yet. i.e. no deprecation period, just a fall back option; and if nobody shouts then remove the config option after a respectable period. It would also mean the existing tests can be re-used...
[Footnote]

*1* When we ask for --abbrev=7, we allocate 10 places and fill the rest with necessary number of dots after the result of find_unique_abbrev(), so if an object name turns out to require 8 hexdigits to make it unique, we'll append only two dots to it to make it 10 (so that it aligns nicely with others) and they would always be reading the full, non abbreviated output. The story does not change that much when we do not explicitly ask for a specific abbreviation length in that we add variable number of dots for aligning in that case, too.

The --abbrev=7 does cater for many smaller repos, so there is a possibility that the bad script issue hasn't been hit yet by those repos.

-- Philip
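The padding rule in the footnote, and the "strip any number of dots" behaviour a tolerant reader would need, can be sketched in a few lines of shell. This is an illustration only: pad_with_dots and strip_trailing_dots are hypothetical names, and the sketch mimics the description in the footnote (pad to the requested length plus three) rather than reproducing Git's own code.

```shell
# Hypothetical helper: pad an abbreviated object name with dots up to
# (requested abbreviation length + 3) columns, as the footnote describes.
# Names that are already that long or longer are printed unchanged.
pad_with_dots () {
	name=$1
	width=$(( $2 + 3 ))
	while [ "${#name}" -lt "$width" ]
	do
		name="$name."
	done
	printf '%s\n' "$name"
}

# Hypothetical helper: what a tolerant reader of the old output would do,
# i.e. strip however many dots trail the name instead of exactly three.
strip_trailing_dots () {
	printf '%s\n' "$1" | sed -e 's/\.*$//'
}

pad_with_dots 1234567 7     # seven hexdigits get three dots
pad_with_dots 12345678 7    # eight hexdigits get only two dots
strip_trailing_dots "12345678.."
```

A script that insists on exactly three dots would mishandle the second case, which is the breakage discussed in the message above.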
Re: [PATCH 1/3] checkout: describe_detached_head: remove 3dots after committish
From: "Ann T Ropea"

Thanks for all the feedback provided! I'd like to summarise what consensus we have reached so far and then propose a way forward:

  * we'll use the term "ellipsis (pl. ellipses)" for what's been referred to as "3dots", "n-dots", "many dots" and so forth

Using a consistent term for the *display* of shortened oids is good.

  * we would like to use ellipses when attached to SHA-1 values only for the purpose of specifying a symmetric difference (as per gitrevisions(7))

The symmetric difference (three-dots) is a specific Git *cli* notation that is distinct from the use of ellipsis for displaying oids.

  * the usage of ellipses as a "here we truncated something longer" is a relic which should be phased out.

I think that is true.

To get there, preventing describe_detached_head from appending an ellipsis to the SHA-1 values it prints is one important step. This change does not cause any test to fall over.

But...

The other important step is dealing with the "git diff --raw" output which features ellipses in the relic-fashion no longer desired. It would appear that simplifying diff.c's diff_aligned_abbrev routine to something like:

    /* Do we want all 40 hex characters? */
    if (len == GIT_SHA1_HEXSZ)
        return oid_to_hex(oid);

    /* An abbreviated value is fine. */
    return diff_abbrev_oid(oid, len);

does do the trick. This change causes quite a few tests to fall over; however, they all have truncated-something-longer-ellipses in their raw-diff-output expected sections, and removing the ellipses from there makes the tests pass again, :-)

The number of failures you report in the test suite suggests that someone somewhere will be expecting that notation, and that we may need a deprecation period, perhaps with an 'ellipsis' config variable whose default value can later be flipped, though that leaves a config value needing support forever! Junio should be able to better advise on his preferred approach.
If we can agree that this is a way forward, i'll create & send v2 of the patch series to the mailing list (it'll include the fixed tests) and we'll see where we go from there. -- Philip
Re: [PATCH 1/3] checkout: describe_detached_head: remove 3dots after committish
From: "Junio C Hamano"

Ann T Ropea writes:

This could be confusing not only for novices; in either case, no range should be insinuated by describe_detached_head.

We actually do not insinuate any range in this output. These dots denote "truncated at the end, instead of giving full length." Another place these "many dots" appear is "git diff --raw", for example.

The fancy word for the three dots is an `ellipsis` - the omission from speech or writing of a word or words that are superfluous or able to be understood from contextual clues. - from the Ancient Greek: ἔλλειψις, élleipsis, "omission" or "falling short". The user/reader confusion may still be there though.
Re: [PATCH 6/7] builtin/describe.c: describe a blob
From: "Junio C Hamano" <gits...@pobox.com>
Sent: Sunday, November 05, 2017 6:28 AM

"Philip Oakley" <philipoak...@iee.org> writes:

Is this not also an alternative case, relative to the user, for the scenario where the user has an oid/sha1 value but does not know what it is, and would like to find its source and type relative to the `describe` command.

I am not sure what you wanted to say with "source and type RELATIVE TO the describe command".

The 'relative to' was meaning the user's expectation about this particular command. For a non-expert user, who may not have come across cat-file yet, their world view may not extend beyond 'Git describe ' for me.

The first thing the combination of the user and the describe command would do when the user has a 40-hex string would be to do the equivalent of "cat-file -t" to learn if it even exists and what its type is. With Stefan's patch, that is what describe command does in order to choose quite a different codeflow from the traditional mode when it learns that it was given a blob.

I realised, after sending, that this was probably the method for non-ambiguous shortened oid's. Thanks for the reminder.

IIUC the existing `describe` command only accepts values, and here we are extending that to be even more inclusive, but at the same time the options become more restricted.

Do you mean that the command should check if it was given an option that would not be applicable to the "find a commit that has the blob" mode, once it learns that it was given a blob and needs to go in that codepath? I think that would make sense.

Correct, it was the option selection aspect.

Or have I misunderstood how the fast commit search and the slower potentially-a-blob searching are disambiguated?

I do not think so. We used to barf when we got anything but commit-ish, but Stefan's new code kicks in if the object turns out to be a blob---I think that is what you mean by the disambiguation.

Correct.
We ask to describe an object, but then the option choices may vary by type. The new [blob] synopsis only lists , while the old [commit-ish] shows specifics. It wasn't clear if the options are the same for both. I guess they are the same once the cat-file -t has done its bit. It's only the speed that's affected.

As a side note, the commit message examples don't show any pathspec that is not in the top level directory.

-- Philip
Re: [PATCH 6/7] builtin/describe.c: describe a blob
From: "Junio C Hamano"
Sent: Thursday, November 02, 2017 4:23 AM

Junio C Hamano writes:

The reason why we say "-ish" is "Yes we know v2.15.0 is *NOT* a commit object, we very well know it is a tag object, but because we allow it to be used in a context that calls for a commit object, we mark that use context as 'this accepts commit-ish, not just commit'".

Having said all that, there is a valid case in which we might want to say "blob-ish".

Is this not also an alternative case, relative to the user, for the scenario where the user has an oid/sha1 value but does not know what it is, and would like to find its source and type relative to the `describe` command.

IIUC the existing `describe` command only accepts values, and here we are extending that to be even more inclusive, but at the same time the options become more restricted. Thus the synopsis terminology would be more about suggesting the range of options available (search style/start points) that are applicable to blobs, than being exactly about the 'allow-blobs' parameter.

Or have I misunderstood how the fast commit search and the slower potentially-a-blob searching are disambiguated?

-- Philip

To review, X-ish is the word we use when the command wants to take an X, but tolerates a lazy user who gives a Y, which is *NOT* X, without bothering to add ^{X} suffix, i.e. Y^{X}. In such a case, the command takes not just X but takes X-ish because it takes a Y and converts it internally to an X to be extra nice.

When the command wants to take a blob, but tolerates something else and does "^{blob}" internally, we can say it takes "blob-ish". Technically that "something else" could be an annotated tag that points at a blob object, without any intervening commit or tree (I did not check if the "describe " code in this thread handles this, though).
But because it is not usually done to tag a blob directly, it would probably not be worth saying "blob-ish" in the document and causing readers to wonder in what situation something that is not a blob can be treated as if it were a blob. It does feel like we would be pursuing technical correctness too much and sacrificing the readability of the document, at least to me, and that is a bad trade-off.
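The Y^{X} conversion described above can be observed from the command line with the peeling syntax documented in gitrevisions(7). The snippet below is a minimal sketch: peeled_type is a hypothetical helper name, and it assumes it is run inside a repository that contains the named objects.

```shell
# Hypothetical helper: print the type an object has after peeling it to the
# requested type, i.e. the type of "$1^{$2}" in gitrevisions(7) notation.
# For an annotated tag "v1" pointing at a commit:
#   git cat-file -t v1       reports "tag"    (what the user typed, the Y)
#   peeled_type v1 commit    reports "commit" (what a commit-ish is turned into)
# Run from inside a repository.
peeled_type () {
	git cat-file -t "$1^{$2}"
}
```

The same call with "blob" as the second argument covers the unusual case mentioned above, an annotated tag that points directly at a blob.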