On Wed, Aug 24, 2022 at 2:26 AM Rob Landley <[email protected]> wrote: > > On 8/22/22 10:39, enh via Toybox wrote: > > On Sun, Aug 21, 2022 at 9:12 AM Ray Gardner <[email protected] > > <mailto:[email protected]>> wrote: > > > > I guess I'll skip writing up an explanation of these algos then; I > > thought you were looking for a less "mathematical" explanation ("why > > can nobody explain what the old stuff is doing without pretending it's > > math instead of an algorithm?" "Doug McIlroy's old diff paper from > > 1976 is still written in math-ese. It SEEMS to be describing very > > simple concepts but it's trying to explain them as if this is > > calculus, which it is not." etc.) But that's moot if you're going with > > what appears to be the old "look ahead a little for matching lines to > > re-sync on" heuristic diff method. > > > > (fwiw, i think such a document would be a useful thing to leave lying > > around on > > the internet ... someone will want it sooner or later :-) ) > > I'm also interested. I've just been a bit distracted because my air > conditioner > broke over the weekend (soonest somebody can come fix it is thursday. It's > August in Texas...) and because my diff code isn't quite ready to go back and > run this new test through yet, and I wanted to include that in my reply.
Thanks for the encouragement, guys. Check it out. I've posted it at https://www.raygard.net/2022/08/26/diff-LCS-Hunt-Szymanski-Kuo-Cross/ along with the code. C code is https://github.com/raygard/lcs_diff_demo/blob/main/src/c/diff.c ; there is also Python code in that repo, including a version of Hunt-McIlroy that may satisfy Rob's complaint "Alas, I've read that paper through twice and have not managed to turn their explanation into pseudo-code, and that's WITH an implementation in front of me", if you find that Python is acceptable as executable pseudocode, as I've heard it said. The code is written very closely to the paper's appendix. But, if you read the post linked above, you'll see that it's probably better to use Kuo/Cross (or my mod of that) instead. I did find a bug in the diff.c I posted earlier and I think this version gets everything right. I expanded it a bit to include Hunt/Szymanski but that's just for demo, it's not as good as the Kuo/Cross implementations also included. (Compile it with HS, KC, or KCMOD defined to get the different algos.) I do this for my own enjoyment but also hoping someone will benefit from it. I wrote the earlier C code I posted to the list in the hope that it would help Rob. He's decided to go his own way, so that was probably many wasted hours. But if you want to get, e.g. a unified diff with -U 0 (no context lines), I have it. I think that may be harder with Rob's approach. I will also bet that my diff.c will be faster on large-ish files and produce better diffs but that remains to be tested. And I've also back-tested it by patch-ing the files with the diff output, including context lines of 0, 1, 2, 3, 4. So, nobody reads my blog ("glob") at least not yet. I spent many many hours developing the world's fastest qsort() implementations, but I have no way to get anyone who can get them used to look at it. Now I've written what I hope is a good explanation of a couple of LCS algos. I sure hope you'll read it and give me feedback. If you don't, I may never speak to you again :-). Ray Ray _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
