(i actually thought the question was more about the workflow, in which case an answer would look more like the "Exporting a patch"/"Importing patches" sections of https://git.wiki.kernel.org/index.php/QuickStart ...)
On Sun, Jan 13, 2019 at 9:30 PM Rob Landley <[email protected]> wrote:
>
> On 1/13/19 3:57 PM, scsijon wrote:
> > Any chance of a two or three page "Introduction to Creating and
> > Understanding Patches for Dummies" for those of us who either don't know
> > how to build one, or like me, have, "but don't really know what i'm doing".
> >
> > When you can make time of course, i'd really like to understand more of
> > what the group is doing with patches submitted rather than only a little.
> >
> > Please, with pure honey on crumpets.
>
> Patches are reasonably straightforward, if somewhat reverse engineered
> historically.
>
> Back in the 1980's somebody invented diff -u ("unified diff format") as a
> more human readable alternative to the <old >new lines format you get
> without the -u, and then Larry Wall whipped up a program to reverse the
> process and use saved diff -u output to modify a file (which was
> mind-blowing at the time). As far as I can tell the format wasn't really
> meant for that, and was made to work with heuristics and hitting it with a
> rock, but Larry _did_ go on to invent Perl...
>
> A patch is a series of "hunks", describing a range of lines in the "old"
> version and the corresponding range in the "new" version. Patches have 6
> different types of lines, each starting with one of "+++ ", "--- ", "@@ ",
> " ", "+", or "-".
>
> The first 2 (the --- and +++ lines) are control lines that come at the
> start and indicate we're working on a new file. They indicate the old file
> name and the new file name for the changed files. If you
> "diff -u oldfile newfile" you get a hunk starting with:
>
> --- oldfile
> +++ newfile
> @@ -oldstart,oldlines +newstart,newlines @@ comment
> and so on
>
> Those first two lines are --- or +++, one space, and the filename.
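For anyone who wants to see those six line types on a real diff, here is a minimal sketch. It assumes Python's standard difflib module as a stand-in for "diff -u"; the classify() helper and the sample lines are made up for illustration and only cope with a single-file patch. A real patch program counts lines from the @@ header instead of trusting prefixes.

```python
import difflib

def classify(patch_lines):
    """Label each line of a single-file unified diff by its leading marker."""
    kinds = []
    in_hunk = False
    for line in patch_lines:
        # "--- " and "+++ " are only file headers before the first @@ line;
        # after that, a leading "-" or "+" is just a removed or added line.
        if not in_hunk and line.startswith("--- "):
            kinds.append("old-file")
        elif not in_hunk and line.startswith("+++ "):
            kinds.append("new-file")
        elif line.startswith("@@ "):
            kinds.append("hunk-header")
            in_hunk = True
        elif line.startswith("+"):
            kinds.append("added")
        elif line.startswith("-"):
            kinds.append("removed")
        elif line.startswith(" "):
            kinds.append("context")
        else:
            kinds.append("other")
    return kinds

# Hypothetical file contents, one list entry per line.
old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n"]

patch = list(difflib.unified_diff(old, new, fromfile="oldfile", tofile="newfile"))
kinds = classify(patch)
for kind, line in zip(kinds, patch):
    print(f"{kind:12} {line}", end="")
```

With no dates passed in, difflib writes the two header lines as just the marker, a space, and the filename.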
>
> Unfortunately, the original unified diff format then followed each filename
> with a tab character and the timestamp of the file (in
> "yyyy-mm-dd hh:mm:ss tzoff" format), which means if you have a tab character
> in the filename you can't patch that file. These days this datestamp is
> optional, and most patches don't have them anymore. (I have a todo item to
> make toybox patch work backwards from the end of the line and peel off only
> a properly formatted tab+date entry and leave it alone otherwise, but right
> now it just stops at the first tab. Which is not a space or newline, and
> thus almost never occurs in filenames and nobody's complained yet (because
> if you tab in the windows gui it switches focus so windows people can't
> trivially create this breakage and then whine for us to "support" it)...
> Still, lemme do a quick commit to make that suck _slightly_ less by at least
> requiring the next character to be a digit in order to match the date and
> strip it off. It still doesn't handle filenames with a newline in them,
> but... how would you?)
>
> If this (now optional) date was the unix epoch (midnight, January 1, 1970,
> which timezone adjustments often moved to December 31, 1969), it indicated
> we were comparing against a nonexistent file. The more modern way to say
> this is to use the special filename /dev/null. So if you want patch to
> create a file, what you do is "diff -u /dev/null newfile", and if you want
> it to delete a file, "diff -u oldfile /dev/null". (Otherwise it leaves a
> zero length file when you remove all the lines, or expects an empty file to
> already be there when adding with no context lines.)
>
> The other fun thing is when you diff 2 files, the files need to have
> different names. How do you know which one you're applying the patch to?
> Historically, it tried both names and used whichever one worked...
> but if you happen to have a file with your tempname lying around in the
> directory you're applying the patch _to_ (which happens a lot when you
> habitually use the same tempfile name), the hunk may try to apply to the
> wrong file. (There were certain horrible heuristics I don't remember that
> tried to work out what you _meant_ to do, which didn't really help and I
> don't think I implemented them?)
>
> And these days files have paths. As the switch from CVS to SVN (let alone
> git) taught us: individual standalone files aren't very interesting, you're
> almost always operating on a _tree_ of files.
>
> So generally what you do _now_ (and what tools like svn or mercurial or git
> pretend to do behind the scenes) is back up one directory, have two full
> trees (the vanilla project and your modified version), and "diff -ruN" the
> two subdirectories: -r is recursive, -u is unified format instead of the old
> < and > version, and -N says pretend to compare new or removed files against
> /dev/null so the diff says to add or remove them properly. That's why tools
> like svn or mercurial or git will create diffs that start like:
>
> --- a/path/to/file
> +++ b/path/to/file
>
> Except... now you've got an extra level of directory you don't want, so you
> have to back up _out_ of your project's tree to apply the patch and it's
> STILL guessing which name you mean.
>
> So what you do is create the diffs like that, then use the "-p 1" option
> when applying them, which says "peel off one layer of directory when parsing
> the filenames". That removes the a/ and b/ from the paths, and the rest
> should be identical so it's no longer ambiguous and it doesn't matter if you
> use the +++ or the --- line as the file to apply the patch to. (No, -p1
> doesn't apply to the magic name /dev/null, absolute paths aren't modified,
> only relative ones.
> Also, you can say "-p0" to disable the above "certain horrible heuristics"
> on pathless filenames and just literally use the filenames in the patch, but
> that doesn't come up much these days. Creating a diff between two trees and
> applying it within the top level of the tree via "patch -p1" is nearly
> universal now. That's the format "git format-patch -1 $HASH" and
> "git am file.patch" are using, for example.)
>
> Ok, so all that's indicating what file hunks apply to, then you get to
> actual hunks describing what changes to make within the file. Each hunk
> starts with an @@ line, with 4 numbers, like so:
>
> @@ -start,len +start,len @@ comment
>
> Each "start" is the (decimal) line number in that file the hunk starts
> applying at, and the "len" is the (decimal) number of lines described in
> that file. These numbers measure the body of the hunk, which comes next.
>
> (The "comment" part can be anything, and doesn't even have to be there. It's
> ignored. Modern language-aware diff -u variants stick which C function
> you're modifying in there, which is nice for humans but not used by patch
> that I know of. The simple, crappy heuristic there is "last unindented
> line", which can find goto labels: ...)
>
> Each line of the rest of the body of that hunk starts with one of three
> characters:
>
> 1) + meaning this line is only in the new version (it was added).
> 2) - meaning this line is only in the old version (it was removed).
> 3) " " (space) = this line is the same in both (it's context for the
>    changes).
>
> The context lines plus + lines need to add up to the "len" in the + part of
> the @@ line, and the context lines plus - lines need to add up to the len in
> the - part. (The start is more or less a comment, used to indicate how far
> off it applies at if the hunk moved but otherwise not really mattering as
> far as I can tell. Well, toybox doesn't use it.)
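The counting rule above is easy to check mechanically. A rough sketch, not what toybox or any real patch does: the names parse_hunk_header() and counts_match() are invented here, and treating a missing ",len" as 1 is an assumption based on how GNU diff abbreviates one-line ranges, which the text above doesn't cover.

```python
import re

# "@@ -start,len +start,len @@ comment"; the ",len" parts may be omitted.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_hunk_header(line):
    """Return (old_start, old_len, new_start, new_len) from an @@ line."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = m.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,  # assumption: default 1
        int(new_start),
        int(new_len) if new_len is not None else 1,  # assumption: default 1
    )

def counts_match(header, body):
    """Check that the body's line counts agree with the lens in the header."""
    _, old_len, _, new_len = parse_hunk_header(header)
    context = sum(1 for line in body if line.startswith(" "))
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    # Old side = context + removed lines; new side = context + added lines.
    return context + removed == old_len and context + added == new_len

body = [" one", "-two", "+2", " three"]
print(parse_hunk_header("@@ -10,7 +10,8 @@ int main(void)"))
print(counts_match("@@ -1,3 +1,3 @@", body))
print(counts_match("@@ -1,3 +1,4 @@", body))
```

The trailing comment after the second @@ is simply ignored by the regex, matching the description above.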
>
> Note: if your code is tab indented, it still needs a space (ascii 32) at the
> start of it to be a context line, then it's binary identical for the
> contents (so tabs or spaces as appropriate). This causes some editors to
> flip out about mixing tabs and spaces, but the distinction is functional
> here.
>
> Patch opens files when it sees --- +++ line pairs, reads in the next @@ hunk
> and the appropriate number of lines after it (with the right number of
> context lines, additions, and removals for what the @@ line counts said),
> and then searches in the file for a place where the appropriate context
> lines and removed lines appear in the right order (removed lines are matched
> just like context, if they're not there in the file the hunk doesn't apply),
> then replaces it with the set of context lines and added lines the hunk says
> should go there instead. (Note that if you patch -R then it's the + lines
> being removed and the - lines being added, "reversing" the patch.)
>
> Each hunk generally starts with 3 leading context lines, and ends with 3
> trailing context lines, which generally provides enough context to uniquely
> identify where to apply the hunk even if you're just adding a single line
> (that's the pathological case of providing no other corroborating
> information). The exception is when your hunk applies at the start or end of
> the file: then there aren't enough context lines, and may not be _any_ if
> you're right at the end or beginning of the file.
>
> The hunk also has interstitial context lines as appropriate (between the
> additions and removals, which also have to match or the hunk won't apply),
> but not more than 6 (leading + trailing context line count) or it'd split
> into 2 hunks. (This _does_ mean you can have 4 context lines in a row
> though.)
>
> What IS important is that you have the same number of leading context lines
> as trailing context lines, unless you're at the start/end of a file.
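The search-and-replace step described above can be sketched roughly like this. It's an illustration under simplifying assumptions, not toybox's or GNU patch's code: apply_hunk() and split_hunk() are made-up names, lines are held in a plain list, and there's no start/end-of-file anchoring or fuzzy matching.

```python
def split_hunk(body):
    """Split a hunk body into its old side and new side, markers stripped."""
    old_side = [line[1:] for line in body if line[:1] in (" ", "-")]
    new_side = [line[1:] for line in body if line[:1] in (" ", "+")]
    return old_side, new_side

def apply_hunk(file_lines, body, search_from=0, reverse=False):
    """Find the hunk's old side in file_lines and swap in its new side.

    Returns (patched_lines, next_search_position). The position lets the
    caller apply later hunks in order, never re-matching consumed lines.
    """
    old_side, new_side = split_hunk(body)
    if reverse:
        # Reversing a patch: + lines get removed, - lines get added.
        old_side, new_side = new_side, old_side
    last_start = len(file_lines) - len(old_side)
    # On a mismatch, move down ONE line and try again.
    for pos in range(search_from, last_start + 1):
        if file_lines[pos:pos + len(old_side)] == old_side:
            patched = file_lines[:pos] + new_side + file_lines[pos + len(old_side):]
            return patched, pos + len(new_side)
    raise ValueError("hunk does not apply")

original = ["a", "b", "c", "d", "e"]
hunk = [" a", " b", "-c", "+C", " d", " e"]
patched, next_pos = apply_hunk(original, hunk)
restored, _ = apply_hunk(patched, hunk, reverse=True)
print(patched, next_pos, restored)
```

Removed lines are matched exactly like context here, so a file that lacks the "-" line makes the hunk fail, as described above.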
> If the leading and trailing counts don't match, it's not a valid hunk and
> patch barfs on the corrupted patch. And the number of leading/trailing
> context lines not being the same means the patch program will try to MATCH
> the start/end of the file (whichever one's got truncated context), and fail
> if it can't (hunk does not apply, context is wrong).
>
> You can have as many hunks as you want within a file, i.e. as many @@ lines
> after a given --- +++ pair, but the hunks must apply in order, and this
> INCLUDES the context lines. A line that's been "seen" as a trailing context
> line won't match against the leading context of the next hunk.
>
> Because of this, you sometimes need 3 or more interstitial context lines in
> a row in the _middle_ of a hunk (between + and - lines), if that's how your
> changes work out. A number of consecutive context lines matching the leading
> context does NOT end the hunk, only consuming the line counts from the @@
> line does that. And then you figure out if leading/trailing context counts
> match (indicating the need to match start/end of file) _after_ that. (If you
> really want to back up and modify an earlier part of the file, you need a
> new --- +++ pair to flush and reopen the file, so it can start over
> searching at the beginning.)
>
> Oh, I know I said the start numbers in the @@ line were only used for
> warnings, but you CAN use them to sanity check the leading context number if
> you want to. (Since if you're forcing a match with the beginning of the
> file, the hunk had better start at the top of that file or something is
> wrong.) Doesn't help with end of file though.
>
> So you wind up with:
>
> --- filename
> +++ filename
> @@ -start,len +start,len @@
>  context
>  context
>  context
> -blah
> +blah
>  context
>  context
>  context
> @@ -start,len +start,len @@
> ...
>
> Oh, the - lines usually come before the + lines when they change the same
> line, but I don't think that's actually required?
> The entire context is matched before applying the hunk anyway. And note that
> you don't skip what you've already looked at when a hunk didn't apply, you
> go down ONE line and try matching again. If your context lines are all
> blank, you can otherwise skip past the start of where this hunk applies. I
> hit and fixed that bug years ago in toybox. :)
>
> And of course all this is before git added a "rename" syntax that looks
> like:
>
> https://lwn.net/Articles/244448/
>
> And has copy and delete variants that allow it to be much less verbose
> (avoids including the body of the matched file(s)).
>
> It's on the todo list... :)
>
> Rob
>
> P.S. You asked.
>
> > ps and i'm looking forward to the next mkroot, I miss Aboriginal!
>
> Alas, I just landed back in Milwaukee to do another round of $DAYJOB because
> neither toybox nor mkroot pay the bills. (I'm very grateful to the
> https://patreon.com/landley subscribers, and it's great encouragement, but
> my mortgage alone is like 25 times what that brings in. Nobody with a
> significant budget wants to fund this work, and keeping the lights on gets
> scheduled higher than things that don't. But I can presumably cut a mkroot
> release with the 4.20 kernel right after I do a toybox release at the end of
> the month. All 4.20 broke that I've noticed so far was adding sha256 as a
> hard requirement to the s390x build, and I can add that to the toybox
> airlock install passthroughs for the moment...)
>
> (I had a huge todo list for my month off... and wound up going limp for most
> of it. I was doing ok until the battery in this old laptop completely died
> (as in unplug = instant off, so suspend is useless and I lose all open
> windows every time I move it).
> And alas I did NOT get Devuan working on the new System76 laptop I ordered a
> few months back (binary wifi firmware tantrum in the installer), and what
> they preinstalled on it has systemd, and given a choice between "system with
> no battery" and "system with systemd" it's no contest. But I did get the new
> She-Ra and Hilda watched, and the first season of The Good Place, so that's
> something...)
>
> Still Rob
> _______________________________________________
> Toybox mailing list
> [email protected]
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
