Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 1359: Diff fails for text file with non-ASCII characters

What version are you running?

Review Board 1.0 on Python 2.4

What's the URL of the page containing the problem?


What steps will reproduce the problem?
1. Create a text file containing the byte 0xED (in Windows, Alt+0237).
   This byte corresponds to "latin small letter i with acute" in the
   Windows Western encoding (Windows-1252).
2. Check the file into source control
3. Edit the file to replace this character with a lowercase 'i'
4. Post the change to Review Board
5. Attempt to view the diff

(This actually happened to me today -- I was trying to fix a source file
that had this non-ASCII, non-UTF-8 byte in a docstring.)
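To make the reproduction steps concrete, here is a minimal sketch of the bytes involved. It is written for a current Python 3 interpreter, not the Python 2.4 that Review Board 1.0 runs on, and the sample content and the `classify` helper are made up for illustration (the real docstring is not shown in this report).

```python
def classify(data: bytes) -> str:
    """Return the first codec (ASCII, then UTF-8) that decodes the bytes, else 'neither'."""
    for codec in ("ascii", "utf-8"):
        try:
            data.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return "neither"


# Hypothetical file content containing the single byte 0xED (step 1).
original = b"d\xedff\n"

# Step 3: replace the offending byte with a plain lowercase 'i'.
edited = original.replace(b"\xed", b"i")

print(classify(original))         # neither: 0xED alone is not valid ASCII or UTF-8
print(classify(edited))           # ascii
print(original.decode("cp1252"))  # the byte is i-acute in Windows-1252
```

So the "before" side of the diff is a file that is valid in Windows-1252 but in neither ASCII nor UTF-8, while the "after" side is plain ASCII.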

What is the expected output? What do you see instead?

I would expect to see a diff, possibly with invalid characters replaced by
hexadecimal representations.
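As a rough illustration of the behaviour I would expect, the sketch below decodes a line and falls back to a hexadecimal escape for any byte that is not valid in the assumed encoding. It uses the standard `backslashreplace` error handler, which supports decoding in Python 3.5 and later. The function name `decode_for_display` and the UTF-8 default are my own assumptions, not anything from the Review Board code.

```python
def decode_for_display(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes, rendering undecodable bytes as \\xNN escapes."""
    return raw.decode(encoding, errors="backslashreplace")


# A lone 0xED byte is not valid UTF-8, so it is shown as an escape.
print(decode_for_display(b"d\xedff"))

# Valid UTF-8 (0xC3 0xAD is i-acute) is decoded normally.
print(decode_for_display(b"d\xc3\xadff"))
```

With something like this, the diff viewer could still render both sides of the change instead of failing on the first unexpected byte.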

Instead I get the following error and traceback:

'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)

Traceback (most recent call last):
line 152, in view_diff
     interdiffset, highlighting, True)
line 623, in get_diff_files
line 143, in cache_memoize
     data = lookup_callable()
line 622, in <lambda>
line 434, in get_chunks
     a[i1:i2], b[j1:j2], oldlines, newlines)
line 268, in diff_line
     if oldline and newline and oldline != newline:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)
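One plausible reading of the traceback, which I have not confirmed against the Review Board source: on Python 2.4, comparing a byte string with a unicode string implicitly decodes the byte string as ASCII, and that raises if the bytes are not ASCII. The sketch below reproduces the same message on Python 3 by doing the ASCII decode explicitly. The 26-character prefix is invented just to match the reported position. Note that 0xC3 0xAD is the UTF-8 encoding of i-acute, which suggests some stage may have re-encoded the text as UTF-8, though that is only a guess.

```python
# Hypothetical line: 26 ASCII characters followed by UTF-8 i-acute (0xC3 0xAD).
line = b"x" * 26 + b"\xc3\xad"

message = None
try:
    # Python 2.4 would attempt this decode implicitly when comparing
    # a byte string with a unicode string; here it is done explicitly.
    line.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If that reading is right, decoding both lines explicitly (with a tolerant error handler) before comparing them would avoid the implicit ASCII decode.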

What operating system are you using? What browser?

Firefox 3.5.3 on Windows XP SP3

Please provide any additional information below.

