Yeah, the diff viewer in Review Board is certainly larger than other differs
such as difflib, but that's because it isn't just another difflib. It has:

* Database storage of the diffs it operates on
* A complete implementation of the Myers diff algorithm + optimizations.
Python's difflib uses a much simpler, inferior algorithm that doesn't even
have a concept of "replace" lines.
* Full side-by-side diffs that can show either the entire file with the diff
applied or just the changed segments.
* Syntax highlighting
* Interdiffs (diffs between revisions of a diff)
* Move detection (showing whether or not a range of lines has moved from
one location in a file to another)
* Inter-line diffs (What parts of a line changed in a "replace" line)
* The ability to "watch" for certain lines in one of the files in the diff
and mark the location and string for later processing (such as finding where
classes/functions are and representing them).
* Intelligent whitespace handling and markup, so that you know whether a
particular line only contains whitespace and can thus be hidden if needed.

Python's difflib is something that you would use for generating a unified
diff, not something that parses an existing one. Truth be told, you don't
want to go the route of writing code that applies a diff to a buffer/file.
You want to use patch for this. The reason is that, while they seem simple,
diffs come in all shapes and sizes. I'm not talking unified vs. context, but
rather subtleties in how various implementations even generate a unified
diff, what they include in there, etc. If you ever look at the source for
GNU patch, you'll run away.

The method Review Board uses is to take the unified diff, parse it just
enough to split it into one diff per file, and then actually use GNU patch.
We fetch a file from the repository, keep it in a buffer as the original
file (left-hand side of the diff), and also write it to a temp file. We then
patch the temp file with the diff and read the result back in as the modified
file (right-hand side of the diff). You can then use either the Myers differ in Review Board
(recommended -- and standalone) or Python's difflib (won't give you the
results you'd normally expect from a differ in many cases).
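
As a rough sketch of that patch step (not Review Board's actual code), something
like the following could work. It assumes a `patch` executable is on PATH and
that the diff covers exactly one file; `apply_patch` is a hypothetical helper
name, and `-o` / `-i` are the standard patch options for the output file and
the diff file.

```python
import os
import shutil
import subprocess
import tempfile


def apply_patch(original: bytes, file_diff: bytes) -> bytes:
    """Return `original` with `file_diff` applied (hypothetical helper).

    Assumes `file_diff` is a unified diff for exactly one file.
    """
    if shutil.which("patch") is None:
        raise RuntimeError("the patch executable was not found on PATH")

    with tempfile.TemporaryDirectory() as tmpdir:
        orig_path = os.path.join(tmpdir, "orig")
        diff_path = os.path.join(tmpdir, "change.diff")
        new_path = os.path.join(tmpdir, "new")

        with open(orig_path, "wb") as f:
            f.write(original)
        with open(diff_path, "wb") as f:
            f.write(file_diff)

        # -o writes the patched result to a separate file and -i names the
        # diff, so the original copy on disk is left untouched.
        result = subprocess.run(
            ["patch", "-o", new_path, "-i", diff_path, orig_path],
            capture_output=True,
        )
        if result.returncode != 0:
            raise RuntimeError(result.stdout.decode(errors="replace"))

        with open(new_path, "rb") as f:
            return f.read()
```

The original buffer and the returned bytes are then the left-hand and
right-hand sides you would hand to whichever differ you choose.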

Now, what I've said has assumed that you're going to have the original file
to display, instead of just the diff. If it's just the diff, then the Review
Board diff viewer won't help you. Nor will Python's difflib. What you'll
need to do is custom-parse the diff. You'll have to handle the headers
indicating the files, and the lines. For the lines, you'll need to keep
records on each line and whether it's unmodified, insert, delete, or
replace. For unmodified, insert and delete, you can check if the line starts
with " ", "+" or "-". For replace, you're sort of out of luck. You can try
some heuristics, but they'll get it wrong some of the time. Review Board used
to do this before our Myers differ (as Python's difflib doesn't generate
"replace" information), and it was just bad.

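To make the line bookkeeping above concrete, here is a minimal sketch that
records each hunk line as unmodified, insert, or delete based on its first
character. It assumes a well-formed unified diff, and `classify_lines` is a
hypothetical name. It uses the counts in each "@@" header to know where a hunk
ends, so the "---"/"+++" file headers are not mistaken for deleted or inserted
lines. It makes no attempt at "replace" detection.

```python
import re

# Matches "@@ -start,count +start,count @@"; a missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")
KINDS = {" ": "unmodified", "+": "insert", "-": "delete"}


def classify_lines(diff_text):
    """Return a list of (kind, text) records for every hunk line."""
    records = []
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next "@@" line.
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
            continue
        if line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        # Some tools strip the lone space from blank context lines.
        kind = KINDS.get(line[:1] or " ")
        if kind is None:
            old_left = new_left = 0  # malformed hunk: give up on it
            continue
        if kind != "insert":
            old_left -= 1
        if kind != "delete":
            new_left -= 1
        records.append((kind, line[1:]))
    return records
```

A side-by-side view could then put "delete" records in the left column and
"insert" records in the right one, with "unmodified" records in both.
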
If you do go the route of parsing the diffs and need to split them into files,
you can look at reviewboard/diffviewer/parser.py and the various DiffParser
subclasses in reviewboard/scmtools/*.py. They won't help you get the
insert/delete/replace lines, though, as that's something we get by patching
the file and then running the Myers differ across both.
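
For a sense of what splitting into files involves, here is a much simplified
sketch. It is not the code in parser.py, and `split_by_file` is a hypothetical
name. It assumes each file's section starts with a "--- " line followed directly
by a "+++ " line and an "@@ " line, which is a heuristic that real-world diffs
can break. It also does nothing special with tool-specific preamble lines such
as "Index:" or "diff --git".

```python
def split_by_file(diff_text):
    """Split a multi-file unified diff into (new_filename, diff_text) pairs."""
    lines = diff_text.splitlines(keepends=True)

    # A file section is assumed to begin where "--- ", "+++ " and "@@ "
    # appear on three consecutive lines.
    starts = [
        i for i in range(len(lines) - 2)
        if lines[i].startswith("--- ")
        and lines[i + 1].startswith("+++ ")
        and lines[i + 2].startswith("@@ ")
    ]

    files = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        # Header form is "+++ path<TAB>timestamp"; keep only the path.
        name = lines[start + 1][4:].split("\t")[0].strip()
        files.append((name, "".join(lines[start:end])))
    return files
```

Each returned chunk is a single-file diff that could be handed to patch on
its own.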


Christian Hammond - chip...@chipx86.com
Review Board - http://www.reviewboard.org
VMware, Inc. - http://www.vmware.com

On Sat, Jan 9, 2010 at 2:06 PM, Anirudh Sanjeev wrote:

> Hi!
> Excerpts from Christian Hammond's message of Sun Jan 10 03:18:17 +0530
> 2010:
> > There are no plans to take the diff viewer and fully separate it out
> > as an independent, reusable component, as it's a lot of work and
> > increases our maintenance burden.
> I see your rationale behind this. However my needs are quite simple - I
> have a patch in the unified diff format which needs to be converted into
> the side-by-side HTML view which shouldn't be too hard to implement from
> scratch.
> I was hoping someone could merely point me at where to look at the
> relatively large ReviewBoard codebase which I am using only as
> reference.
> I looked in the diffviewer app, but that seems to be rather large and
> daunting. Most other implementations seem to be quite inferior to
> ReviewBoard's implementation of the same. Python's difflib's HTML table
> generator is enough for my needs, but unfortunately, I couldn't find a
> good enough way to convert unified diff to the difflib object structure.
> Again, any pointers on where to start looking at the code would be
> awesome!
> Thank you,
> Anirudh
> --
> Senior Undergraduate student, Indian Institute of Technology, Kharagpur
> http://anirudhsanjeev.org