** Description changed:

  In the RTF format, the \u command is used to specify Unicode characters,
  and the value of the \uc command specifies how many characters to ignore
  after each \u command.  The RTF specification says that the default
  value of \uc should be 1.
  
  Some RTF files explicitly set \uc1, even though this is not required as
  1 is already the default.  This bug activates on RTF files that set \uc1
  explicitly.
  
  In rtfread.c, the loop on line 265 says while((--i)>0), which means it
  iterates i-1 times, where i has just been set to groups[group_count].uc.
  Therefore, if the uc value is 1, this loop iterates zero times and no
  characters are ignored after the \u command, whereas one fallback
  character should have been ignored.  This usually manifests itself as
  affected RTF files showing a question mark after every non-ASCII
  character.
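  
  The off-by-one can be checked in isolation.  The following is a minimal
  sketch, not code taken from catdoc: both helper functions are
  hypothetical and only count how many times each loop form runs for a
  given starting value.

```c
/* Counts iterations of the loop form quoted from rtfread.c:
 * while((--i)>0).  The counter is decremented before the test. */
int count_pre_decrement(int uc)
{
    int i = uc;
    int iterations = 0;

    while ((--i) > 0)
        iterations++;
    return iterations;
}

/* Counts iterations of the post-decrement form: while((i--)>0).
 * The counter is tested first and decremented afterwards. */
int count_post_decrement(int uc)
{
    int i = uc;
    int iterations = 0;

    while ((i--) > 0)
        iterations++;
    return iterations;
}
```

  For a starting value of 1, the first form runs zero times and the
  second runs once, which is the one-character difference described
  above.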
  
  Line 205 of the same file sets groups[0].uc = 2 with the comment Default
  uc = 2.  This comment is incorrect according to the RTF specification,
  which says that the default value of uc is 1.  However, setting it to 2
  does work around the fact that the loop starting on line 265 iterates
  only i-1 times instead of i times.
  
  Therefore, if the RTF file does not contain any \uc commands, the code
  behaves correctly: it initialises uc to 2 instead of the specified
  default of 1, and the pre-decrement in the loop test effectively
  subtracts 1 again, so one character is skipped.  But if \uc1 is
- set explicity by the RTF file, then line 253 comes into play, which says
- groups[group_count].uc=com.numarg setting the uc variable to 1 instead
- of 2, and then the loop on line 265 iterates zero times and the fallback
- character gets included.
+ set explicitly by the RTF file, then line 253 comes into play, which
+ says groups[group_count].uc=com.numarg setting the uc variable to 1
+ instead of 2, and then the loop on line 265 iterates zero times and the
+ fallback character gets included.
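  
  The two paths can be illustrated with a simplified model of the skip
  loop.  This is a sketch under assumptions rather than the real code:
  the function name is made up, and it skips plain characters in a
  string, whereas the real parser reads from its own input.

```c
/* Hypothetical model of the current behaviour.  stored_uc stands for
 * groups[group_count].uc: 2 when the built-in default is in effect,
 * 1 after the file has set \uc1 explicitly.  p points just past a \u
 * command.  Returns a pointer to the first character not skipped. */
const char *skip_fallback_current(const char *p, int stored_uc)
{
    int i = stored_uc;

    /* Same loop test as quoted above; also stops at end of string. */
    while ((--i) > 0 && *p != '\0')
        p++;
    return p;
}
```

  With a stored value of 2 the model skips the "?" in "?abc"; with a
  stored value of 1 it skips nothing and the "?" remains in the output.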
  
  The quickest way to fix this bug would be to add a +1 before the
  semicolon at the end of line 253, but I think the code would be clearer
  if the two instances of the number 2 on line 205 were changed to 1 and
  line 265 were changed from while((--i)>0) to while((i--)>0), so that
  the uc variable contains the actual value of the \uc command and the
  loop iterates the correct number of times.
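  
  The second suggestion might look roughly like the following.  This is
  an untested sketch, not a patch against catdoc: DEFAULT_UC and
  skip_fallback_fixed are hypothetical names, and the function works on
  a plain string instead of the parser's real input.

```c
/* Default taken from the RTF specification as described above. */
enum { DEFAULT_UC = 1 };

/* Hypothetical model of the proposed fix.  uc holds the actual \uc
 * value (DEFAULT_UC when the file never sets it), and p points just
 * past a \u command.  Skips exactly uc characters, or fewer if the
 * string ends first, and returns a pointer to what follows. */
const char *skip_fallback_fixed(const char *p, int uc)
{
    int i = uc;

    /* Post-decrement: test first, then count down, so uc iterations. */
    while ((i--) > 0 && *p != '\0')
        p++;
    return p;
}
```

  Here the default case and an explicit \uc1 both pass 1 and both skip
  one fallback character, so the two paths no longer diverge.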
  
  Meanwhile, the bug can be worked around in most cases by deleting any
  instance of \uc1 in the input RTF before feeding it to catdoc.  That
  workaround applies only for RTF files that never use any \uc value other
  than 1.  For such RTF files, it is sufficient to use the command:
  
  sed -e 's/\\uc1//g' < file.rtf | catdoc
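  
  The same preprocessing could also be done in C, for example when the
  RTF is already held in memory.  The helper below is a hypothetical
  sketch and not part of catdoc.  Unlike the sed command, it leaves
  longer control words such as \uc10 alone; like the sed command, it is
  only suitable for files that never use a \uc value other than 1.

```c
#include <ctype.h>
#include <string.h>

/* Removes every "\uc1" control word from the NUL-terminated buffer
 * buf, in place.  A match that is followed by another digit (for
 * example "\uc10") is a different control word and is kept. */
void strip_uc1(char *buf)
{
    static const char word[] = "\\uc1";
    const size_t len = sizeof word - 1;
    char *src = buf;
    char *dst = buf;

    while (*src != '\0') {
        if (strncmp(src, word, len) == 0 &&
            !isdigit((unsigned char)src[len])) {
            src += len;        /* drop the control word */
        } else {
            *dst++ = *src++;   /* copy everything else unchanged */
        }
    }
    *dst = '\0';
}
```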
  
  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: catdoc 1:0.95-5
  ProcVersionSignature: Ubuntu 6.5.0-1027.27~22.04.1-oracle 6.5.13
  Uname: Linux 6.5.0-1027-oracle x86_64
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: oracle
  CloudName: oracle
  CloudPlatform: oracle
  CloudSubPlatform: metadata (http://169.254.169.254/opc/v2/)
  Date: Wed Aug  7 14:36:50 2024
  ProcEnviron:
-  TERM=xterm-256color
-  PATH=(custom, no user)
-  XDG_RUNTIME_DIR=<set>
-  LANG=en_GB.UTF8
-  SHELL=/bin/bash
+  TERM=xterm-256color
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=<set>
+  LANG=en_GB.UTF8
+  SHELL=/bin/bash
  SourcePackage: catdoc
  UpgradeStatus: No upgrade log present (probably fresh install)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076244

Title:
  RTF files containing \uc1 show fallback characters after Unicode

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/catdoc/+bug/2076244/+subscriptions


