Bug#307647: tex4ht: unicode used when it is not needed
Dear Professor Gurari,

Thank you very much. It works like a charm.

Rani

On 5/8/05, Eitan Gurari [EMAIL PROTECTED] wrote:

I modified the bugfixes distribution to provide reduced usage of unicode values in iso-8859-1 output. The requests are to be made through commands similar to

  htlatex file iso8859/1/charset/less/!

or by modifying the charset paths in tex4ht.env accordingly.

Currently the only cases addressed are the ligatures 'ff' and 'fi' and a few non-ligature values. Additional cases will be addressed in response to bug reports.

-eitan

tex4ht makes use of unicode letter when this is not needed. This happens when the latex code contains the sequence ff or fi and maybe other sequences. For example, here is a latex code and the html code generated by tex4ht and htlatex. Note how the sequence fi was translated to &#xFB01;
Bug#307647: tex4ht: unicode used when it is not needed
Dear Kapil,

My main goal in using tex4ht is to share documents with people who do not use TeX, or to process the documents with other programs. For this purpose, the problem I have reported on is important as it prevents such use. However, for the sake of publishing a document in html format, this is of no major concern. Thus, I accept your opinion that this should be counted in the wish list.

Thank you for the great assistance,
Rani

On 5/5/05, Kapil Hari Paranjape [EMAIL PROTECTED] wrote:

Dear Ran Gilad-Bachrach,

Please see the enclosed mail from the author Eitan Gurari. He is planning to provide a fix in the next version.

For the time being I think I will agree with Vassilii that this is really wishlist rather than important (at least as a bug for tex4ht---I do think it is up to text viewers/browsers that render unicode to do this job as correctly as possible).

On Wed, May 04, 2005 at 03:04:48PM -0400, Eitan Gurari wrote:

Unfortunately, too many people complain about this and other similar lack of font support problems by browsers for unicode symbols. I'll try to `fix' the problem the coming weekend.

-eitan

Perhaps the fix will take the form of an option for mk4ht/htlatex that selects non-unicode glyph substitution.

I hope I have your permission. I am re-tagging this as a wishlist item.

Thanks and regards,
Kapil.
Bug#307647: tex4ht: unicode used when it is not needed
Package: tex4ht
Version: 20050402.1817-1
Severity: important

tex4ht makes use of unicode letter when this is not needed. This happens when the latex code contains the sequence ff or fi and maybe other sequences. For example, here is a latex code and the html code generated by tex4ht and htlatex. Note how the sequence fi was translated to &#xFB01;

--- newfile1.tex ---
%% LyX 1.3 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[english]{article}
\usepackage[latin1]{inputenc}

\makeatletter
\usepackage{babel}
\makeatother
\begin{document}
efficient classifier
\end{document}

--- newfile1.html (tex4ht) ---
efficient classi&#xFB01;er

--- newfile1.html (htlatex) ---
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="generator" content="TeX4ht (http://www.cse.ohio-state.edu/~gurari/TeX4ht/mn.html)">
<meta name="originator" content="TeX4ht (http://www.cse.ohio-state.edu/~gurari/TeX4ht/mn.html)">
<!-- html -->
<meta name="src" content="newfile1.tex">
<meta name="date" content="2005-04-21 09:30:00">
<link rel="stylesheet" type="text/css" href="newfile1.css">
</head><body>
<!--l. 10--><p class="noindent">efficient classi&#xFB01;er
</body></html>

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8
Locale: LANG=he_IL, LC_CTYPE=he_IL (charmap=ISO-8859-8)

Versions of packages tex4ht depends on:
ii  libc6         2.3.2.ds1-20  GNU C Library: Shared libraries an
ii  libkpathsea3  2.0.2-28      path search library for teTeX (run
ii  tetex-bin     2.0.2-28      The teTeX binary files

-- no debconf information
Bug#307647: tex4ht: unicode used when it is not needed
Dear Kapil,

Thank you for the prompt answer.

I am not an expert in typesetting; however, I have noticed two things which make the conversion tex4ht does problematic. First, when you open the html file using a browser, the ff, fi, ... combinations look different from the rest of the text (blurred). Second, the funny conversion makes it hard to apply post-processors, such as spell checkers and syntax checkers, to the html file.

You are probably right that this can be done by creating an htf file. However, I would expect this to be the default behavior, hence I do not think that the user should be bothered with doing that. Nevertheless, I might be wrong ...

Thank you once again,
Rani

On 5/4/05, Kapil Hari Paranjape [EMAIL PROTECTED] wrote:

Dear Ran Gilad-Bachrach,

Thanks for your report.

On Wed, May 04, 2005 at 03:58:36PM +0300, Ran Gilad-Bachrach wrote:

tex4ht makes use of unicode letter when this is not needed. This happens when the latex code contains the sequence ff or fi and maybe other sequences. For example, here is a latex code and the html code generated by tex4ht and htlatex. Note how the sequence fi was translated to &#xFB01;

Could you please tell me why you think this is a bug? Please keep the following in mind.

1. TeX4HT tries as much as possible to be *like* TeX except that it outputs hypertext.

2. TeX uses ligatures whenever it encounters ff, fi, fl and so on.

3. It *is* possible for you to define an alternate mechanism to avoid ligatures---create your own htf files which skip the ligatures.

Thanks and best regards,
Kapil.
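
For readers who cannot change the tex4ht configuration, the post-processing concern raised in this message can also be handled on the generated html itself. The following is only a minimal sketch and not part of tex4ht: it assumes the ligatures appear as hexadecimal character references such as &#xFB01;, and the function name expand_ligature_refs is made up for illustration. It relies on Unicode compatibility decomposition (NFKD) to turn each ligature back into plain letters.

```python
import re
import unicodedata

# Hexadecimal character references for the Latin ligatures U+FB00..U+FB06
# (ff, fi, fl, ffi, ffl, long-s-t, st). Assumption: the converter emits the
# ligatures in this "&#x...;" form, as in the report above.
_LIGATURE_REF = re.compile(r"&#x(FB0[0-6]);", re.IGNORECASE)


def expand_ligature_refs(html: str) -> str:
    """Replace ligature character references with their plain-letter spelling."""

    def _expand(match: re.Match) -> str:
        ligature = chr(int(match.group(1), 16))
        # NFKD maps a compatibility ligature to its constituent letters.
        return unicodedata.normalize("NFKD", ligature)

    return _LIGATURE_REF.sub(_expand, html)


if __name__ == "__main__":
    print(expand_ligature_refs("efficient classi&#xFB01;er"))
```

Other character references are left untouched, so the result can still be passed to a spell checker or any other tool that expects the html as generated.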