Bugs item #1937966, was opened at 2008-04-08 11:43
Message generated for change (Settings changed) made by sborho
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=968354&aid=1937966&group_id=199155

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: dialogs
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Wagner Bruna (wagnerbruna)
Assigned to: Nobody/Anonymous (nobody)
Summary: Changelog: utf8 messages handled incorrectly

Initial Comment:
TortoiseHg version: 0.4 RC1
hg version: 0.9.5
system: Debian Etch

Commit messages with utf8 characters are handled incorrectly by the
changeset browser. The commit message is reported correctly on the
changeset viewer, but it is wrong on the changeset list. The diff itself
is also incorrect. See screenshot and repository bundle attached.

The repository was made like this:

/tmp$ hg init test
/tmp$ cd test
/tmp/test$ echo 'áéíóú' > test.txt
/tmp/test$ hg ci -Am 'áéíóú'
adding test.txt
/tmp/test$ hg log -p
changeset:   0:7b5953603fc0
tag:         tip
user:        [email protected]
date:        Tue Apr 08 13:09:50 2008 -0300
summary:     áéíóú

diff -r 000000000000 -r 7b5953603fc0 test.txt
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/test.txt  Tue Apr 08 13:09:50 2008 -0300
@@ -0,0 +1,1 @@
+áéíóú

/tmp/test$ echo $LANG
pt_BR.UTF-8

I'm running TortoiseHg from sources, via hgtk script.

Thanks,
Wagner

----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-12 05:02

Message:
Logged In: YES
user_id=411637
Originator: NO

Before I say anything more, I'd like to declare that this whole
encoding/locale topic is really getting over my head now.

The patches look fine to me, and they certainly improve the handling of
string encoding. However, with your patches, the diff chunk is displayed
as "áéí­óú". I am not certain this should be considered wrong, though;
it's most probably the result of the default locale (?) setting on my
system.

I am getting a feeling we are trying too hard to get the right encoding
for the diff blocks. Mercurial only encodes and decodes the metadata,
while the files are checked in 'as-is'. In theory, a user can check in
files of any encoding, so there's probably no way to get the encoding
right on the diff blocks. As long as we can successfully convert them to
pygtk's required encoding of utf-8, we should be fine.

So, unless anything bad comes up between now and the 0.4 release, I'm
accepting the patches. Thanks.

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-11 19:38

Message:
Logged In: YES
user_id=2057939
Originator: YES

Attaching a patch (also against 0.4RC1) for the diff display. It falls
back to mercurial.util._fallbackencoding too, which isn't a good solution
(Mercurial uses it only for metadata decoding), but at least it is user
configurable. A better solution would be displaying it in non-strict
mode, and allowing an override via context or toolbar menu, but that's a
bit too much for my pygtk skills...

File Added: platform_encoding_on_diff.patch

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-11 15:02

Message:
Logged In: YES
user_id=2057939
Originator: YES

The previous patch has a flaw when displaying non-utf8 file diffs. I'm
uploading a new patch that fixes only the commit message displays; it
also deals with non-utf8 commit messages (which could appear in old
repositories). I'll send a fix for the diff display afterwards.

File Added: dont_override_encoding_commitmsg.patch

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-11 12:24

Message:
Logged In: YES
user_id=2057939
Originator: YES

Same diff uploaded as a patch.
Thanks a lot,
Wagner

File Added: dont_override_encoding.patch

----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-10 20:20

Message:
Logged In: YES
user_id=411637
Originator: NO

Can you upload your patch?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-04-10 14:36

Message:
Logged In: NO

Interesting. Well, please take a look at the following diff. I simply
removed all locale overrides in hggtk/changeset.py and
hggtk/vis/treemodel.py, and on my system it fixed both the summary and
the diff displays.

Thinking about it, the summary encoding will always depend on the
locale / encoding at the changeset author's machine. So, if an override
is needed, maybe a configuration option would be better (Mercurial itself
uses the environment variable HGENCODING for that).

====
diff -r 5a5341bda4c5 hggtk/changeset.py
--- a/hggtk/changeset.py        Mon Apr 07 12:13:14 2008 -0500
+++ b/hggtk/changeset.py        Thu Apr 10 15:48:37 2008 -0300
@@ -124,7 +124,7 @@ class ChangeSet(GDialog):
         for p in parents:
             pctx = self.repo.changectx(p)
             summary = pctx.description().splitlines()[0]
-            summary = unicode(summary, 'latin-1', 'replace')
+            summary = unicode(summary)
             change = str(p) + ':' + short(self.repo.changelog.node(p))
             title = 'parent:'
             title += ' ' * (12 - len(title))
@@ -135,7 +135,7 @@ class ChangeSet(GDialog):
         for n in self.repo.changelog.children(ctx.node()):
             cctx = self.repo.changectx(n)
             summary = cctx.description().splitlines()[0]
-            summary = unicode(summary, 'latin-1', 'replace')
+            summary = unicode(summary)
             childrev = self.repo.changelog.rev(n)
             change = str(childrev) + ':' + short(n)
             title = 'child:'
@@ -176,7 +176,7 @@ class ChangeSet(GDialog):
         except StopIteration:
             return False
 
-        lines = unicode(txt, 'latin-1', 'replace').splitlines()
+        lines = unicode(txt).splitlines()
         eob = buf.get_end_iter()
         offset = eob.get_offset()
         fileoffs, tags, lines, statmax = self.prepare_diff(lines, offset, file)
diff -r 5a5341bda4c5 hggtk/vis/treemodel.py
--- a/hggtk/vis/treemodel.py    Mon Apr 07 12:13:14 2008 -0500
+++ b/hggtk/vis/treemodel.py    Thu Apr 10 15:48:37 2008 -0300
@@ -85,7 +85,7 @@ class TreeModel(gtk.GenericTreeModel):
         ctx = self.repo.changectx(revid)
 
         summary = ctx.description().replace('\0', '')
-        summary = unicode(summary.split('\n')[0], 'latin-1', 'replace')
+        summary = unicode(summary.split('\n')[0])
         summary = gobject.markup_escape_text(summary)
         node = self.repo.lookup(revid)
         tags = ', '.join(self.repo.nodetags(node))

----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-09 20:33

Message:
Logged In: YES
user_id=411637
Originator: NO

On my XP box (see attached snapshot, tksoh_utf8_log.png), both the
summary text and the description field of the diff window are displayed
correctly as "áéíóú". Only the content of test.txt in the diff chunks is
displayed wrongly, which requires fixing.

I wonder what's causing the discrepancy between your system and mine.

File Added: tksoh_utf8_log.PNG

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-09 13:44

Message:
Logged In: YES
user_id=2057939
Originator: YES

FYI, on my system this bug is actually a regression. The summary column
is displayed correctly on TortoiseHg revision 20d428b65e03, but wrongly
on 65a7ba2dffc3 (this revision fixed bug 1914550). The file diff displays
correctly at 3d24edb9ab2d, and wrongly at 737c58ecc790 (also an encoding
fix).

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-09 12:25

Message:
Logged In: YES
user_id=2057939
Originator: YES

The test.hg file is a Mercurial full bundle.

The diff viewer (lower right panel) displays the commit message
correctly, as "áéíóú".
But the "summary" column on the changeset list (upper panel) displays
something like "áéí­óú" (it looks much like utf8 data read as iso-8859-1
characters). The same "áéí­óú" characters are shown on the test.txt diff
itself.

----------------------------------------------------------------------

Comment By: . (qwelnor)
Date: 2008-04-09 03:13

Message:
Logged In: YES
user_id=764770
Originator: NO

The same problem as described at the top appears on Windows XP SP2.

----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-08 21:42

Message:
Logged In: YES
user_id=411637
Originator: NO

I am confused by "The commit message is reported correctly on the
changeset viewer, but it is wrong on the changeset list". Can you please
elaborate?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-04-08 21:27

Message:
Logged In: NO

What's the format of the attached repo.hg file? Bundle?

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-08 11:44

Message:
Logged In: YES
user_id=2057939
Originator: YES

File Added: test.hg

----------------------------------------------------------------------
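
The symptom reported in the 2008-04-09 12:25 comment (utf8 data read as
iso-8859-1 characters) can be reproduced outside TortoiseHg. The snippet
below is only a minimal sketch in Python 3; the thread itself concerns
Python 2 code using unicode(), and the variable names here are
illustrative, not taken from hggtk.

```python
# Python 3 sketch; variable names are illustrative, not from TortoiseHg.
original = "áéíóú"

# Assume the commit message was written as UTF-8 bytes, as happens
# under a pt_BR.UTF-8 locale.
raw = original.encode("utf-8")

# latin-1 maps every byte to a character, so decoding never fails;
# each accented letter simply turns into two unrelated characters.
garbled = raw.decode("latin-1")

# Decoding with the codec that produced the bytes round-trips cleanly.
restored = raw.decode("utf-8")

print(len(original), len(garbled))
print(restored == original)
```

Because the latin-1 decode cannot raise an error, the 'replace' argument
in the removed unicode(summary, 'latin-1', 'replace') calls would never
have had anything to replace, and the garbling passes silently.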

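
The approach discussed in the 2008-04-11 and 2008-04-12 comments (try to
get valid text for pygtk, fall back to a user-configurable encoding, and
otherwise display in non-strict mode) could look roughly like the sketch
below. This is not the code from the attached patches: the function name
to_display_text and its fallback parameter are hypothetical, and it is
written for Python 3 rather than the Python 2 of the original code.

```python
def to_display_text(data: bytes, fallback: str = "latin-1") -> str:
    """Decode raw repository bytes into text for display.

    Hypothetical helper: 'fallback' stands in for a user-configurable
    encoding such as the one Mercurial reads from HGENCODING.
    """
    # Try strict UTF-8 first, then the configured fallback encoding.
    for encoding in ("utf-8", fallback):
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Non-strict mode: keep what decodes, mark the rest with U+FFFD.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(to_display_text("áéíóú".encode("utf-8")))
    print(to_display_text("áéíóú".encode("latin-1")))
```

Trying strict UTF-8 first is a design choice of this sketch: bytes that
are valid UTF-8 are rarely meant as anything else, whereas latin-1
accepts any byte sequence and so is only useful as a last named
candidate.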