Launchpad has imported 19 comments from the remote bug at http://bugs.gentoo.org/show_bug.cgi?id=96376.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2005-06-17T06:39:14+00:00 World-root wrote:

I've had a few RTF documents to convert to text, and I noticed that unrtf outputs an exclamation mark instead of accented characters.

Here's a patch that makes it produce valid UTF-8 text for any ANSI RTF input file. Please test :-)

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/0

------------------------------------------------------------------------
On 2005-06-17T06:40:38+00:00 World-root wrote:

Created attachment 61385
Patch to output ANSI RTF characters correctly

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/1

------------------------------------------------------------------------
On 2005-06-17T06:42:22+00:00 World-root wrote:

Created attachment 61386
Patch for the ebuild

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/2

------------------------------------------------------------------------
On 2005-06-20T13:25:38+00:00 Tove wrote:

Robin, do you want to take this bug?

Joël, did you send the patch to the upstream developers?

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/3

------------------------------------------------------------------------
On 2005-06-20T14:54:12+00:00 World-root wrote:

No, not yet. Should I send it?

(I suppose unrtf was written before a common encoding, UTF-8, was created. So now that many people use UTF-8, I guess it's nice to put the extended characters to good use.)

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/5

------------------------------------------------------------------------
On 2005-06-20T15:26:31+00:00 Tove wrote:

Let's wait for robbat2's comment. He's travelling for the next 2 weeks.

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/6

------------------------------------------------------------------------
On 2005-07-02T14:05:04+00:00 Robin H. Johnson wrote:

Please send this to upstream. If they are unresponsive, then I'll just patch our ebuild, but I'd prefer it if they took it first.

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/7

------------------------------------------------------------------------
On 2005-07-03T02:31:49+00:00 World-root wrote:

Robin,

Thanks for your response! I'm trying to do it. Two remarks though:

- I've just found a newer version: http://ftp.gnu.org/gnu/unrtf/0.19.7/
- [email protected] does not work
- there is a patch (text_french.patch) in the 0.19.7 package, which is similar to mine, but only handles a few accents. I'll try to contact its author.

I'll let you know when I get something!

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/8

------------------------------------------------------------------------
On 2006-01-11T02:09:35+00:00 Gentoo-bugger wrote:

Any news on this? I'm just trying the 3rd party kat ebuilds and they contain an ebuild with this patch.
Would be cool if I needed one ebuild less in my overlay :)

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/9

------------------------------------------------------------------------
On 2006-01-11T02:14:24+00:00 Gentoo-bugger wrote:

I just saw that there's a new version 0.19.9 from last week. From the changelog:

| 0.19.4: added unicode support
| 0.19.5: removed defective PS support and non-free text files
|         more unicode support
|         improved symbol font support - no longer puts entities in latex output
|         Bug#266020 concerning double slashes fixed
|         Bug#269054 concerning Doctype fixed
|         Bug#287038 security breach fixed
|         (thanks to Joey Hess <[email protected]>)
| 0.19.6: fix some latex problems
| 0.19.7: updated FSF address
| 0.19.8: minor fixes
| 0.19.9: included verbose mode

So it might be fixed in that version...

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/10

------------------------------------------------------------------------
On 2006-01-11T02:33:28+00:00 World-root wrote:

Hi,

Actually (before I made the patch) the authors did put an _unused_ "text_french.patch" file in unrtf 0.19.7 -- but their patch is incomplete (see comment #7).

I sent an email containing the information, as well as a link to this bugzilla page, to the upstream developers on 3rd July 2005:

TO: [email protected], [email protected]
CC: [email protected]

I have had no response so far.

I haven't looked at (or tried) unrtf 0.19.9 -- could you have a quick look at the text.c file, to see what characters they added in the tables?

Best Regards

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/11

------------------------------------------------------------------------
On 2006-01-11T03:03:28+00:00 Gentoo-bugger wrote:

unrtf has a project page at savannah, here [1]. There's both a bug and a patch tracker, maybe you'll have more luck there.
[1] http://savannah.gnu.org/projects/unrtf/

It seems like they added a few but not all characters, and in a different way from your solution:

mss@otherland ~/tmp $ diff -u unrtf-0.19.3/text.c unrtf_0.19.9/text.c
--- unrtf-0.19.3/text.c 2004-02-19 00:35:04.000000000 +0100
+++ unrtf_0.19.9/text.c 2006-01-06 22:56:06.000000000 +0100
@@ -1,7 +1,6 @@
-
 /*=============================================================================
    GNU UnRTF, a command-line program to convert RTF documents to other formats.
-   Copyright (C) 2000,2001 Zachary Thayer Smith
+   Copyright (C) 2000,2001,2004 by Zachary Smith

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -15,20 +14,25 @@

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
-   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

-   The author is reachable by electronic mail at [email protected].
+   The maintainer is reachable by electronic mail at [email protected]
 =============================================================================*/


 /*----------------------------------------------------------------------
  * Module name:    text
- * Author name:    Zach Smith
+ * Author name:    Zachary Smith
  * Create date:    19 Sep 01
  * Purpose:        Plain text output module
  *----------------------------------------------------------------------
  * Changes:
  * 22 Sep 01, [email protected]: added function-level comment blocks
+ * 29 Mar 05, [email protected]: changes requested by ZT Smith
+ * 14 Jun 05, [email protected]: higher Iso-Latin-1 characters
+ *            added - thanks to [email protected] and
+ *            [email protected]
+ * 23 Jul 05, [email protected]: added endash, emdash and bullet
  *--------------------------------------------------------------------*/

@@ -59,22 +63,24 @@

 static char* upper_translation_table [128] =
 {
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
-       "?", "?", "?", "?", "?", "?", "?", "?",
+/*        0    1    2    3    4    5    6    7   */
+/* 80 */ "?", "?", "?", "?", "?", "?", "?", "?",
+/* 88 */ "?", "?", "?", "?", "?", "?", "?", "?",
+/* 90 */ "?", "?", "?", "?", "?", "?", "?", "?",
+/* 98 */ "?", "?", "?", "?", "?", "?", "?", "?",
+/* A0 */ " ", "¡", "¢", "£", "¤", "¥", "¦", "§",
+/* A8 */ "¨", "©", "ª", "«", "¬", "", "®", "¯",
+/* B0 */ "°", "±", "²", "³", "´", "µ", "¶", "·",
+/* B8 */ "¸", "¹", "º", "»", "¼", "½", "¾", "¿",
+/* C0 */ "À", "Á", "Â", "Ã", "Ä", "Å", "Æ", "Ç",
+/* C8 */ "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "Ï",
+/* D0 */ "Ð", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö", "×",
+/* D8 */ "Ø", "Ù", "Ú", "Û", "Ü", "Ý", "Þ", "ß",
+/* E0 */ "à", "á", "â", "ã", "ä", "å", "æ", "ç",
+/* E8 */ "è", "é", "ê", "ë", "ì", "í", "î", "ï",
+/* F0 */ "ð", "ñ", "ò", "ó", "ô", "õ", "ö", "÷",
+/* F8 */ "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ",
+/*        8    9    A    B    C    D    E    F   */
 };

@@ -255,6 +261,11 @@
        text_op->chars.left_quote = "`";
        text_op->chars.right_dbl_quote = "''";
        text_op->chars.left_dbl_quote = "``";
+#if 1 /* daved - 0.19.8 */
+       text_op->chars.endash = ""; /* not ASCII */
+       text_op->chars.emdash = "-";
+       text_op->chars.bullet = "·"; /* not ASCII */
+#endif

        return text_op;
 }

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/13

------------------------------------------------------------------------
On 2006-01-11T07:10:37+00:00 World-root wrote:

Ah, this new patch looks good :-)

It handles everything, excluding values 0x80..0x9F. That may be because that range of values is forbidden/reserved and cannot be found in ANSI RTF anyway (I have no idea what the deal is with these 0x80..0x9F values).

My only concern: filling the array in a C file with literal characters (instead of hex values) could be a bit dangerous, depending on the compiler's character set support (?)

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/14

------------------------------------------------------------------------
On 2006-02-16T17:38:41+00:00 Robin H. Johnson wrote:

I've just committed 0.19.9 to the tree. Is the patch from this bug still needed?

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/15

------------------------------------------------------------------------
On 2006-02-17T04:30:35+00:00 World-root wrote:

I've just tried the 0.19.9 version.
Indeed, the patch I posted is not needed anymore, *but* please note that unrtf will always output ISO-8859-1 text, regardless of the user's $LANG setting. Not very good for pure UTF-8 users IMHO.

Ideal workaround: unrtf should iconv() the whole text at runtime, so the output obeys the user's preferred encoding.

In the meantime, I suggest adding this as a first line in src_compile():

src_compile() {
    iconv -f ISO-8859-15 text.c >text.c.new && mv text.c.new text.c

This would detect the user's encoding at emerge time, which is better than ignoring it completely. With this line added, unrtf outputs proper UTF-8 text for me.

Since iconv is called without a '-t' (target encoding) argument, it *should* convert to the user's preferred encoding. It works for UTF-8 -- can someone please test with an ISO-8859 $LANG/$LC_ALL? I have userlocales and only UTF-8 locales built.

Thanks

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/16

------------------------------------------------------------------------
On 2006-02-20T01:06:24+00:00 Robin H. Johnson wrote:

I don't agree with using iconv like that. My root user runs in a different $LANG than my regular user. unrtf really must be made encoding-aware.

I'm going to close this for now, and I'd ask you to take it upstream again. If you diff the old release with the new one, you'll see there is a new maintainer, and hopefully he can be more responsive.

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/17

------------------------------------------------------------------------
On 2006-02-20T08:13:41+00:00 World-root wrote:

He's from Australia, right? OK, e-mail is sent (including, of course, a link to this page) :-)

When something happens I'll report it here.

Reply at: https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/comments/18


** Changed in: gentoo
       Status: Unknown => Fix Released

** Changed in: gentoo
   Importance: Unknown => Wishlist

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/290503

Title:
  Unrtf does not handle UTF-8 correctly. The version is rather old

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unrtf/+bug/290503/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
