Bug#307647: tex4ht: unicode used when it is not needed

2005-05-13 Thread Ran Gilad-Bachrach
Dear Professor Gurari,

  Thank you very much. It works like a charm.

Rani

On 5/8/05, Eitan Gurari wrote:
 
 
 I modified the bugfixes distribution to provide reduced usage of
 unicode values in iso-8859-1 output. The requests are to be made
 through commands similar to
 
    htlatex file "" "iso8859/1/charset/less/!"
 
 or by modifying the charset paths in tex4ht.env accordingly.
 Currently the only cases addressed are the ligatures 'ff' and 'fi' and
 a few non-ligature values.  Additional cases will be addressed in
 response to bug reports.
 
 -eitan
 
   tex4ht uses Unicode ligature characters where they are not needed. This
   happens when the LaTeX source contains the sequence ff or fi, and maybe
   other sequences. For example, here is a LaTeX file and the HTML generated
   by tex4ht and htlatex. Note how the sequence fi was translated to &#xFB01;




Bug#307647: tex4ht: unicode used when it is not needed

2005-05-05 Thread Ran Gilad-Bachrach
Dear Kapil,

  My main goal in using tex4ht is to share documents with people who
do not use TeX, or to process the documents with other programs. For
this purpose, the problem I reported is important, as it prevents
such use. However, for simply publishing a document in HTML format,
it is of no major concern. Thus, I accept your opinion that this
should be classified as a wishlist item.

Thank you for the great assistance, 

   Rani

On 5/5/05, Kapil Hari Paranjape wrote:
 Dear Ran Gilad-Bachrach,
 
 Please see the enclosed mail from the author Eitan Gurari.
 He is planning to provide a fix in the next version. For
 the time being I think I will agree with Vassilii that this is
 really wishlist rather than important (at least as a bug for
 tex4ht---I do think it is up to text viewers/browsers that
 render unicode to do this job as correctly as possible).
 
 On Wed, May 04, 2005 at 03:04:48PM -0400, Eitan Gurari wrote:
  Unfortunately, too many people complain about this and other similar
  lack of font support problems by browsers for unicode symbols.  I'll
  try to `fix' the problem the coming weekend.  -eitan
 
 Perhaps the fix will take the form of an option for mk4ht/htlatex that
 selects non-unicode glyph substitution.
 
 I hope I have your permission. I am re-tagging this as a wishlist item.
 
 Thanks and regards,
 
 Kapil.
 --
 




Bug#307647: tex4ht: unicode used when it is not needed

2005-05-04 Thread Ran Gilad-Bachrach

Package: tex4ht
Version: 20050402.1817-1
Severity: important

tex4ht uses Unicode ligature characters where they are not needed. This
happens when the LaTeX source contains the sequence ff or fi, and maybe
other sequences. For example, here is a LaTeX file and the HTML generated
by tex4ht and htlatex. Note how the sequence fi was translated to &#xFB01;


--- newfile1.tex ---

%% LyX 1.3 created this file.  For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[english]{article}
\usepackage[latin1]{inputenc}

\makeatletter
\usepackage{babel}
\makeatother
\begin{document}
efficient classifier
\end{document}

--- newfile1.html (tex4ht) ---

efficient classi&#xFB01;er


--- newfile1.html (htlatex) ---

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="generator" content="TeX4ht
(http://www.cse.ohio-state.edu/~gurari/TeX4ht/mn.html)">
<meta name="originator" content="TeX4ht
(http://www.cse.ohio-state.edu/~gurari/TeX4ht/mn.html)">
<!-- html -->
<meta name="src" content="newfile1.tex">
<meta name="date" content="2005-04-21 09:30:00">
<link rel="stylesheet" type="text/css" href="newfile1.css">
</head><body>

<!--l. 10--><p class="noindent">efficient classi&#xFB01;er
</body></html>




-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8
Locale: LANG=he_IL, LC_CTYPE=he_IL (charmap=ISO-8859-8)

Versions of packages tex4ht depends on:
ii  libc6         2.3.2.ds1-20  GNU C Library: Shared libraries an
ii  libkpathsea3  2.0.2-28      path search library for teTeX (run
ii  tetex-bin     2.0.2-28      The teTeX binary files

-- no debconf information






Bug#307647: tex4ht: unicode used when it is not needed

2005-05-04 Thread Ran Gilad-Bachrach
Dear Kapil,

  Thank you for the prompt answer. I am not an expert in typesetting,
but I have noticed two things that make the conversion tex4ht does
problematic. First, when you open the HTML file in a browser, the
ff, fi, ... combinations look different from the rest of the text
(blurred). Second, the unusual conversion makes it hard to apply
post-processors, such as spell checkers and syntax checkers, to the
HTML file.
  You are probably right that this can be done by creating an htf
file. However, I would expect this to be the default behavior, so I
do not think the user should be bothered with doing that.
Nevertheless, I might be wrong ...
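
  Until the output itself changes, the post-processing problem above
can be worked around by expanding the ligature references back to
plain letters after conversion. What follows is only a minimal sketch,
assuming Python 3; the function name expand_ligatures and the mapping
table are illustrative and are not part of tex4ht. It covers the
Latin ligatures U+FB00 to U+FB06, written either as numeric character
references (such as &#xFB01;) or as literal characters.

```python
import re

# Latin ligature code points (U+FB00..U+FB06) and their plain-letter
# expansions. This table is an assumption made for illustration; it is
# not taken from tex4ht.
LIGATURES = {
    0xFB00: "ff",
    0xFB01: "fi",
    0xFB02: "fl",
    0xFB03: "ffi",
    0xFB04: "ffl",
    0xFB05: "st",  # long s + t
    0xFB06: "st",
}

# Matches hexadecimal (&#xFB01;) and decimal (&#64257;) references.
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_ligatures(html: str) -> str:
    """Replace ligature references and characters with plain letters."""

    def replace_ref(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # References to anything other than a ligature stay untouched.
        return LIGATURES.get(code_point, match.group(0))

    text = _CHAR_REF.sub(replace_ref, html)
    # Also expand ligatures that appear as literal characters.
    return text.translate(LIGATURES)


if __name__ == "__main__":
    print(expand_ligatures("efficient classi&#xFB01;er"))
```

  Running such a filter over the generated file before spell checking
leaves all other character references as they are.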

  thank you once again,

   Rani

On 5/4/05, Kapil Hari Paranjape wrote:
 Dear Ran Gilad-Bachrach,
 
 Thanks for your report.
 
 On Wed, May 04, 2005 at 03:58:36PM +0300, Ran Gilad-Bachrach wrote:
  tex4ht uses Unicode ligature characters where they are not needed. This
  happens when the LaTeX source contains the sequence ff or fi, and maybe
  other sequences. For example, here is a LaTeX file and the HTML generated
  by tex4ht and htlatex. Note how the sequence fi was translated to &#xFB01;
 
 Could you please tell me why you think this is a bug? Please keep the
 following in mind.
 
 1. TeX4HT tries as much as possible to be *like* TeX except that it
outputs hypertext.
 
 2. TeX uses ligatures whenever it encounters ff, fi, fl and so on.
 
 3. It *is* possible for you to define an alternate mechanism to avoid
ligatures---create your own htf files which skip the ligatures.
 
 Thanks and best regards,
 
 Kapil.
 --