Bugs item #1937966, was opened at 2008-04-08 11:43
Message generated for change (Settings changed) made by sborho
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=968354&aid=1937966&group_id=199155

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: dialogs
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Wagner Bruna (wagnerbruna)
Assigned to: Nobody/Anonymous (nobody)
Summary: Changelog: utf8 messages handled incorrectly

Initial Comment:
TortoiseHg version: 0.4 RC1
hg version: 0.9.5
system: Debian Etch

Commit messages with utf8 characters are handled incorrectly by the changeset 
browser. The commit message is reported correctly on the changeset viewer, but 
it is wrong on the changeset list. The diff itself is also incorrect.

See screenshot and repository bundle attached. The repository was made like 
this:

/tmp$ hg init test
/tmp$ cd test
/tmp/test$ echo 'áéíóú' > test.txt
/tmp/test$ hg ci -Am 'áéíóú'
adding test.txt
/tmp/test$ hg log -p
changeset:   0:7b5953603fc0
tag:         tip
user:        [email protected]
date:        Tue Apr 08 13:09:50 2008 -0300
summary:     áéíóú

diff -r 000000000000 -r 7b5953603fc0 test.txt
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/test.txt  Tue Apr 08 13:09:50 2008 -0300
@@ -0,0 +1,1 @@
+áéíóú

/tmp/test$ echo $LANG
pt_BR.UTF-8

I'm running TortoiseHg from sources, via hgtk script.

Thanks,
Wagner


----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-12 05:02

Message:
Logged In: YES 
user_id=411637
Originator: NO

Before I say anything more, I'd like to declare that this whole
encoding/locale topic is really getting over my head now.

The patches look fine to me, and they certainly improve the handling of
string encoding. However, with your patches, the diff chunk is displayed as
"áéíóú". I am not certain this should be considered wrong, though; it's
most probably the result of the default locale (?) setting on my system.

I am getting a feeling we are trying too hard to get the right encoding for
the diff blocks. Mercurial only encodes and decodes the metadata, while the
files are checked in 'as-is'. In theory, a user can check in files of any
encoding, so there's probably no way to get the encoding right on the diff
blocks. As long as we can successfully convert them to pygtk's required
encoding of utf-8, we should be fine.
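
A minimal sketch of the "convert whatever we get into something displayable"
idea above, written in Python 3 syntax (the code in this thread is Python 2).
The function name and the candidate list are hypothetical and not taken from
the patches; the sketch only shows that file content of unknown encoding can
always be turned into text that re-encodes cleanly as utf-8.

```python
def to_display_text(raw, candidates=("utf-8", "latin-1")):
    """Decode file content of unknown encoding into text for display.

    `candidates` is a hypothetical, ordered list of encodings to try.
    """
    for enc in candidates:
        try:
            # Strict decode: raises on byte sequences invalid in `enc`.
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", "replace")


# The same text stored in two different encodings, as a user might commit it.
utf8_bytes = "áéíóú".encode("utf-8")
latin1_bytes = "áéíóú".encode("latin-1")

shown_utf8 = to_display_text(utf8_bytes)
shown_latin1 = to_display_text(latin1_bytes)
```

Trying utf-8 first matters: latin-1 accepts every byte sequence, so it only
makes sense as a later candidate.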

So, unless anything bad comes up from now to 0.4 release, I'm accepting
the patches. Thanks.

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-11 19:38

Message:
Logged In: YES 
user_id=2057939
Originator: YES

Attaching a patch (also against 0.4RC1) for the diff display.

It falls back to mercurial.util._fallbackencoding too, which isn't a good
solution (Mercurial uses it only for metadata decoding), but at least is
user configurable.

A better solution would be displaying it in non-strict mode, and allowing
an override via context or toolbar menu, but that's a bit too much for my
pygtk skills...
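
For illustration, the "non-strict mode with an override" idea could look
roughly like the sketch below (Python 3 syntax). The function and parameter
names are hypothetical; in a real dialog the override would come from the
context or toolbar menu mentioned above, which is not shown here.

```python
def decode_diff(raw, default="utf-8", override=None):
    """Decode diff text without ever raising.

    `override` stands in for an encoding picked by the user; when it is
    not given, `default` is used. Both names are assumptions.
    """
    encoding = override or default
    # "replace" is the non-strict part: undecodable bytes become U+FFFD
    # instead of aborting the whole diff display.
    return raw.decode(encoding, "replace")


latin1_diff = "+áéíóú\n".encode("latin-1")

guessed = decode_diff(latin1_diff)                     # wrong guess, still displayable
chosen = decode_diff(latin1_diff, override="latin-1")  # user picked the right encoding
```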
File Added: platform_encoding_on_diff.patch

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-11 15:02

Message:
Logged In: YES 
user_id=2057939
Originator: YES

The previous patch has a flaw displaying non-utf8 file diffs. 

I'm uploading a new patch fixing only the commit message displays; it also
deals with non-utf8 commit messages (that could appear on old
repositories). I'll send a fix for the diff display afterwards.

File Added: dont_override_encoding_commitmsg.patch

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-11 12:24

Message:
Logged In: YES 
user_id=2057939
Originator: YES

Same diff uploaded as a patch.

Thanks a lot,
Wagner

File Added: dont_override_encoding.patch

----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-10 20:20

Message:
Logged In: YES 
user_id=411637
Originator: NO

Can you upload your patch?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-04-10 14:36

Message:
Logged In: NO 

Interesting.

Well, please take a look at the following diff. I simply removed all
locale overrides, on hggtk/changeset.py and hggtk/vis/treemodel.py, and on
my system it fixed both the summary and the diff displays.

Thinking about it, the summary encoding will always depend on the locale /
encoding at the changeset author's machine. So, if an override is needed,
maybe a configuration option would be better (Mercurial itself uses the
environment variable HGENCODING for that).
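
A rough sketch of such an override, in Python 3 syntax. HGENCODING is the
variable named above; falling back to the locale's preferred encoding, and
then to utf-8 for unknown codec names, are assumptions made for this
example, and the function name is hypothetical.

```python
import codecs
import locale
import os


def display_encoding(environ=os.environ):
    """Pick the encoding used to decode changeset metadata for display."""
    # An explicit override wins; otherwise ask the locale.
    name = environ.get("HGENCODING") or locale.getpreferredencoding()
    try:
        codecs.lookup(name)  # make sure Python knows this codec
    except LookupError:
        return "utf-8"       # assumed last-resort default
    return name
```

Passing the environment as a parameter keeps the function easy to try out:
display_encoding({"HGENCODING": "latin-1"}) selects latin-1 regardless of the
locale of the machine running it.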

====

diff -r 5a5341bda4c5 hggtk/changeset.py
--- a/hggtk/changeset.py        Mon Apr 07 12:13:14 2008 -0500
+++ b/hggtk/changeset.py        Thu Apr 10 15:48:37 2008 -0300
@@ -124,7 +124,7 @@ class ChangeSet(GDialog):
         for p in parents:
             pctx = self.repo.changectx(p)
             summary = pctx.description().splitlines()[0]
-            summary = unicode(summary, 'latin-1', 'replace')
+            summary = unicode(summary)
             change = str(p) + ':' + short(self.repo.changelog.node(p))
             title = 'parent:'
             title += ' ' * (12 - len(title))
@@ -135,7 +135,7 @@ class ChangeSet(GDialog):
         for n in self.repo.changelog.children(ctx.node()):
             cctx = self.repo.changectx(n)
             summary = cctx.description().splitlines()[0]
-            summary = unicode(summary, 'latin-1', 'replace')
+            summary = unicode(summary)
             childrev = self.repo.changelog.rev(n)
             change = str(childrev) + ':' + short(n)
             title = 'child:'
@@ -176,7 +176,7 @@ class ChangeSet(GDialog):
         except StopIteration:
             return False

-        lines = unicode(txt, 'latin-1', 'replace').splitlines()
+        lines = unicode(txt).splitlines()
         eob = buf.get_end_iter()
         offset = eob.get_offset()
         fileoffs, tags, lines, statmax = self.prepare_diff(lines, offset, file)
diff -r 5a5341bda4c5 hggtk/vis/treemodel.py
--- a/hggtk/vis/treemodel.py    Mon Apr 07 12:13:14 2008 -0500
+++ b/hggtk/vis/treemodel.py    Thu Apr 10 15:48:37 2008 -0300
@@ -85,7 +85,7 @@ class TreeModel(gtk.GenericTreeModel):
             ctx = self.repo.changectx(revid)

             summary = ctx.description().replace('\0', '')
-            summary = unicode(summary.split('\n')[0], 'latin-1', 'replace')
+            summary = unicode(summary.split('\n')[0])
             summary = gobject.markup_escape_text(summary)
             node = self.repo.lookup(revid)
             tags = ', '.join(self.repo.nodetags(node))


----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-09 20:33

Message:
Logged In: YES 
user_id=411637
Originator: NO

On my XP box (see attached snapshot, tksoh_utf8_log.png), both the summary
text and description field of the diff window are displayed correctly as
"áéíóú". Only the content of test.txt in the diff chunks is displayed
wrongly, which requires fixing.

I wonder what's causing the discrepancy between your system and mine.
File Added: tksoh_utf8_log.PNG

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-09 13:44

Message:
Logged In: YES 
user_id=2057939
Originator: YES

FYI, on my system this bug is actually a regression.

The summary column is displayed correctly on TortoiseHg revision
20d428b65e03, but wrongly on 65a7ba2dffc3 (this revision fixed bug
1914550).

The file diff displays correctly at 3d24edb9ab2d, and wrongly at
737c58ecc790 (also an encoding fix). 


----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-09 12:25

Message:
Logged In: YES 
user_id=2057939
Originator: YES

The test.hg file is a Mercurial full bundle.

The diff viewer (lower right panel) displays the commit message correctly,
as "áéíóú". But the "summary" column on the changeset list (upper
panel) displays something like "Ã¡Ã©Ã­Ã³Ãº" (looks much like utf8
data read as iso-8859-1 characters). The same "Ã¡Ã©Ã­Ã³Ãº"
characters are shown on the test.txt diff itself.
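
The "utf8 data read as iso-8859-1" symptom can be reproduced outside
TortoiseHg with a few lines of Python 3. This is only an illustration of the
effect, not code from the project; the variable names are made up.

```python
original = "áéíóú"

# What ends up on disk / in the changelog on a UTF-8 system: 2 bytes per letter.
stored = original.encode("utf-8")

# Reading those bytes as iso-8859-1 turns every byte into its own character,
# so each accented letter shows up as a pair of unrelated characters.
misread = stored.decode("latin-1")

# The damage is reversible as long as nothing was replaced along the way.
repaired = misread.encode("latin-1").decode("utf-8")
```

Here misread has ten characters instead of five, which matches the doubled
look of the garbled summary column.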


----------------------------------------------------------------------

Comment By: . (qwelnor)
Date: 2008-04-09 03:13

Message:
Logged In: YES 
user_id=764770
Originator: NO

The same problem as described on top appears on windows xp sp2.

----------------------------------------------------------------------

Comment By: TK Soh (tksoh)
Date: 2008-04-08 21:42

Message:
Logged In: YES 
user_id=411637
Originator: NO

I am confused by "The commit message is reported correctly on the
changeset viewer, but it is wrong on the changeset list". Can you please
help elaborate?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-04-08 21:27

Message:
Logged In: NO 

What's the format of the attached repo.hg file? Bundle?

----------------------------------------------------------------------

Comment By: Wagner Bruna (wagnerbruna)
Date: 2008-04-08 11:44

Message:
Logged In: YES 
user_id=2057939
Originator: YES

File Added: test.hg

----------------------------------------------------------------------


_______________________________________________
Tortoisehg-develop mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tortoisehg-develop
