Re: [l2h] Converting emdashs and endashs?

2003-08-14 Thread James Howison
On Monday, August 11, 2003, at 11:08  pm, Ross Moore wrote:

Hello James,

On Mon, 11 Aug 2003, James Howison wrote:

Now I have curly quotes happening (yay!) I am wondering about the other
special characters.  I realize that this will break backwards
compatibility but that is not an issue for my needs.

I would like --- to be converted to &#8212; as defined in the
unicode.pl file at line 799 - but this doesn't seem to happen - instead it
is converted to --.  This is also what happens if I change --- to
{---}.

That is definitely a lot harder; particularly since -- and --- are
rarely used correctly in LaTeX manuscripts.
So general rules may easily result in something that the author
never intended.

I use -- and --- often.

I'm still wondering, though, how to tell which conversions specified in  
the unicode.pl file actually happen and which do not---and how those  
are controlled ... I guess I'll spend some more time with the source ;)

Also I see from the source that converting single quotes is
tough---perhaps I'm naive but it would seem to me that this sequence
would work...
s/``/&#8220;/og
s/`/&#8216;/og # once the `` is gone then the ` is only used for open single quote, right?

Not at all.  \`  is used as an accent, and in some language variants,
the ` is made active to remove the need to use the \ .
With this active character, overloading can occur for generating
other special characters or ligatures.

Right - well, I see the difficulty now.  Quite an important distinction,
language compatibility being very important.  The use of ` rather
than \ is not something that I'm familiar with - out of interest, why is
this done?  Is it because the \ character is not easily accessible on
the keyboard?

Perhaps if these conversions are done _after_ the conversions from
LaTeX to Unicode, this would work (i.e. the international
characters would already have been converted to their Unicode expressions ...).

s/''/&#8221;/og
s/'/&#8217;/og # Will also replace apostrophes with close curly single - not a bad thing.

Sorry; I cannot agree.
Every Latin-based charset encoding has an apostrophe character.
A curly-quote is most definitely *not* logically an apostrophe, even
though it may look like one.

I acknowledge that this is a matter of style---but the Unicode standard
discusses this and generally prefers the curly single quote
(U+2019, &#8217;) to the straight mark (U+0027).

http://www.unicode.org/unicode/reports/tr8/#Apostrophe%20Semantics%20Errata
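The thread mixes decimal references such as &#8217; with hexadecimal code points such as U+2019. As an illustrative check (a small Python script, not part of LaTeX2HTML), decoding each decimal reference mentioned in this thread shows which character it actually names:

```python
import html
import unicodedata

# Decimal numeric character references mentioned in this thread.
REFERENCES = {
    "&#8220;": "open double quote (for ``)",
    "&#8221;": "close double quote (for '')",
    "&#8216;": "open single quote (for `)",
    "&#8217;": "close single quote (for ')",
    "&#8211;": "en dash (for --)",
    "&#8212;": "em dash (for ---)",
    "&#381;": "Z with caron (for \\v{Z})",
}

for ref, meaning in REFERENCES.items():
    char = html.unescape(ref)          # decode the HTML reference
    code_point = f"U+{ord(char):04X}"  # the same character in hex notation
    print(f"{ref:9} {code_point}  {unicodedata.name(char)}  ({meaning})")
```

Read as a decimal reference, &#2019; would instead be code point U+07E3, an unrelated character, so the two notations should not be mixed.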

snip

The aim of an HTML translation should not be appearance.
It should be ensuring that meaning is preserved, and that no symbol
is rendered with the 'missing character' glyph.

I think one might reasonably argue that appearance is
important---HTML is, intentions notwithstanding, a format used for
presentation.  Your point about taking care over the 'missing character'
glyph is well taken; the warnings are very useful for this.

The 'div' request for CSS in Hakan's email also reflects the use of  
HTML as an appearance format.

Thanks,
James

Hope this helps,

	Ross Moore

Thanks,
James

On Saturday, August 9, 2003, at 02:53  am, Ross Moore wrote:

On Sat, 9 Aug 2003, James Howison wrote:

Hi all,

I'd really like to convert the LaTeX quotation marks, `` and '', to the
recommended HTML curly quotes: &#8220; instead of `` and &#8221; instead
of '' - standard codes that render the curly quotes beautifully.

Set
 $USE_CURLY_QUOTES = 1;
in an initialisation file.
This is not the default, because not all browsers actually render
these characters. (At least, that was the situation 3-4 years ago when
the LaTeX2HTML code was written.)

Hope this helps,

	Ross Moore


I'm sure that this is possible through latex2html - the codes are
listed around unicode.pl:722 - but either I can't find the magic
incantation to have latex2html do the conversion or there is a bug
preventing this from working in my version (1.70) or set-up.
I've tried:

latex2html -html_version 4.0,unicode test.tex

What is strange is that this does work for, say, \v{Z}, which converts to
the code &#381; (and that is definitely happening through unicode.pl -
I changed the translation and it worked fine).
So why doesn't the translation work for `` (which is correctly listed in
unicode.pl as \`\`) and for '' (which is correctly listed as \'\')?
I've had a good hunt around for this - but I can't see why the other
codes are converted but not the quotes.
Cheers,
James
ps.  minimal test.tex follows

--

\documentclass[11pt]{article}
\begin{document}
``Why are these quotes not converted to unicode''  (they are in the
unicode.pl file)
While this symbol (also in the unicode.pl file) is? - \v{Z}
\end{document}
___
latex2html mailing list
[EMAIL PROTECTED]
http://tug.org/mailman/listinfo/latex2html



[Fwd: Re: [l2h] Converting emdashs and endashs?]

2003-08-14 Thread Daniel Taupin


 Original Message 
Subject: Re: [l2h] Converting emdashs and endashs?
Date: Tue, 12 Aug 2003 19:35:11 +0200
From: Daniel Taupin [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: James Howison [EMAIL PROTECTED]
References: [EMAIL PROTECTED]
Please do not confuse the shapes of quotes (single, double), which are a character
problem, with the handling of -- and ---. The latter are standard ligatures
in TeX fonts, while the former are a question of typographic taste.
Therefore, since it is a TeX/LaTeX standard, I ask for a standard conversion
(except in math mode) from -- to en dash and a FURTHER conversion of en dash
followed by a - to em dash.
On the other hand, I would disagree with a change in the behaviour of double
quotes, mainly because it would be tricky for people performing copy/paste from
latex2html-generated screens.
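Daniel's two-pass rule can be sketched in Python. This is an illustration of the rule as stated, not LaTeX2HTML's code; it assumes math appears only as $...$ (no \( \), display math or escaped \$), and the name convert_dashes is hypothetical:

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Capturing split: odd-indexed pieces are inline math, even-indexed pieces are text.
# Assumption: math is delimited only by single, unescaped dollar signs.
INLINE_MATH = re.compile(r"(\$[^$]*\$)")


def convert_dashes(text: str) -> str:
    """Hypothetical helper: -- becomes an en dash, then en dash + '-' an em dash."""
    pieces = INLINE_MATH.split(text)
    for i in range(0, len(pieces), 2):  # even indices are outside math mode
        piece = pieces[i].replace("--", EN_DASH)           # first pass
        pieces[i] = piece.replace(EN_DASH + "-", EM_DASH)  # second pass
    return "".join(pieces)


print(convert_dashes("pages 10--20, a pause---like this, and $x--y$"))
```

Because --- first becomes an en dash followed by -, the second pass turns it into an em dash. The sketch would still rewrite -- inside \verb or \texttt{--help}, which illustrates the concern that a blanket rule can produce something the author never intended.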

[l2h] Converting emdashs and endashs?

2003-08-14 Thread James Howison
Now I have curly quotes happening (yay!) I am wondering about the other
special characters.  I realize that this will break backwards
compatibility but that is not an issue for my needs.

I would like --- to be converted to &#8212; as defined in the
unicode.pl file at line 799 - but this doesn't seem to happen - instead it
is converted to --.  This is also what happens if I change --- to
{---}.

I'm not sure why some of the conversions in the unicode.pl file happen, 
while others do not.  I can't find an equivalent of the 
$USE_CURLY_QUOTES in the source code that seems relevant to mdash ...

Any ideas on how to get a maximal set of the conversions in unicode.pl 
actually happening?  I notice that there is no do_cmd_textemdash in 
unicode.pl - is that why?
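One rough way to see which conversions actually happened is to inspect the generated HTML: list the numeric character references it contains and flag TeX input sequences (``, '', ---, --) that survived unconverted. The Python sketch below is a hypothetical helper working on an HTML string; it does not read unicode.pl, and the sample input is made up:

```python
import re
from collections import Counter

# Generated HTML often contains comments, which would otherwise be counted as "--".
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Decimal or hexadecimal numeric character references, e.g. &#8212; or &#x2014;
NUMERIC_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")
# TeX input sequences that suggest a conversion did not happen, longest first.
LEFTOVERS = ("``", "''", "---", "--")


def audit_html(html_text: str) -> tuple[Counter, Counter]:
    """Hypothetical helper: count converted references and leftover TeX sequences."""
    text = HTML_COMMENT.sub("", html_text)

    converted = Counter()
    for value in NUMERIC_REF.findall(text):
        code = int(value[1:], 16) if value.startswith("x") else int(value)
        converted[f"U+{code:04X}"] += 1

    leftovers = Counter()
    for seq in LEFTOVERS:
        leftovers[seq] = text.count(seq)
        text = text.replace(seq, " ")  # so --- is not counted again as --
    return converted, +leftovers  # unary + drops zero counts


sample = "<!-- generated --><P>&#8220;Quoted&#8221; text, -- and &#381;</P>"
converted, leftovers = audit_html(sample)
print(converted)
print(leftovers)
```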

Also I see from the source that converting single quotes is 
tough---perhaps I'm naive but it would seem to me that this sequence 
would work...

s/``/&#8220;/og
s/`/&#8216;/og  # once the `` is gone then the ` is only used for open single quote, right?
s/''/&#8221;/og
s/'/&#8217;/og  # will also replace apostrophes with close curly single - not a bad thing

i.e. ensure that one does the singles after the doubles ...

But there is probably a better algorithm in the source code for 'quoter'

http://www.dwheeler.com/quoter/
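The proposed sequence can be tried out with a short Python sketch (curl_quotes is a hypothetical name, and str.replace stands in for the Perl s///g substitutions). It replaces the doubles before the singles, as proposed, and makes no attempt to handle accents or apostrophes:

```python
# Order matters: the two-character sequences must be replaced before the
# single characters, otherwise `` would become two open single quotes.
SUBSTITUTIONS = [
    ("``", "&#8220;"),
    ("`", "&#8216;"),
    ("''", "&#8221;"),
    ("'", "&#8217;"),
]


def curl_quotes(text: str) -> str:
    """Hypothetical helper: naive TeX-quote to HTML-reference conversion."""
    for tex, ref in SUBSTITUTIONS:
        text = text.replace(tex, ref)
    return text


print(curl_quotes("``Double'' and `single' quotes, and it's"))
# prints: &#8220;Double&#8221; and &#8216;single&#8217; quotes, and it&#8217;s
```

The apostrophe in "it's" also becomes &#8217;, which is the behaviour Ross objects to in his reply.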

Thanks,
James


Re: [l2h] Converting emdashs and endashs?

2003-08-11 Thread Ross Moore

Hello James,

On Mon, 11 Aug 2003, James Howison wrote:

 Now I have curly quotes happening (yay!) I am wondering about the other
 special characters.  I realize that this will break backwards
 compatibility but that is not an issue for my needs.

 I would like --- to be converted to &#8212; as defined in the
 unicode.pl file at line 799 - but this doesn't seem to happen - instead it
 is converted to --.  This is also what happens if I change --- to
 {---}.

That is definitely a lot harder; particularly since -- and --- are
rarely used correctly in LaTeX manuscripts.
So general rules may easily result in something that the author
never intended.

 I'm not sure why some of the conversions in the unicode.pl file happen,
 while others do not.  I can't find an equivalent of the
 $USE_CURLY_QUOTES in the source code that seems relevant to mdash ...

 Any ideas on how to get a maximal set of the conversions in unicode.pl
 actually happening?  I notice that there is no do_cmd_textemdash in
 unicode.pl - is that why?

 Also I see from the source that converting single quotes is
 tough---perhaps I'm naive but it would seem to me that this sequence
 would work...

 s/``/&#8220;/og
 s/`/&#8216;/og # once the `` is gone then the ` is only used for open single quote, right?

Not at all.  \`  is used as an accent, and in some language variants,
the ` is made active to remove the need to use the \ .
With this active character, overloading can occur for generating
other special characters or ligatures.
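Ross's objection can be made concrete. In the Python sketch below (hypothetical helper names), the naive pass from James's proposal turns the accent command \`e into a backslash followed by a curly quote, while a guarded version skips any backtick preceded by a backslash:

```python
import re

# Negative lookbehind: skip a backtick that follows a backslash, because
# \` is TeX's grave-accent command (pi\`ece typesets as "pièce").
UNESCAPED_BACKTICK = re.compile(r"(?<!\\)`")


def naive_open_single(text: str) -> str:
    """James's rule as proposed: every remaining ` becomes an open single quote."""
    return text.replace("`", "&#8216;")


def guarded_open_single(text: str) -> str:
    """Hypothetical variant that leaves \\` accent commands alone."""
    return UNESCAPED_BACKTICK.sub("&#8216;", text)


source = r"pi\`ece and `quoted'"
print(naive_open_single(source))    # pi\&#8216;ece and &#8216;quoted'
print(guarded_open_single(source))  # pi\`ece and &#8216;quoted'
```

The lookbehind also skips a quote that directly follows a \\ line break, and no pattern can detect a ` that a language package has made active, so this narrows the problem rather than solving it.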


 s/''/&#8221;/og
 s/'/&#8217;/og # Will also replace apostrophes with close curly single - not a bad thing.

Sorry; I cannot agree.
Every Latin-based charset encoding has an apostrophe character.
A curly-quote is most definitely *not* logically an apostrophe, even
though it may look like one.

Try cutting and pasting from a web page into LaTeX source.
Simply finding curly quotes to replace with apostrophes is *very* tedious
indeed. When quotes occur in pairs you at least expect to have
to do something with the environment delimiters --- it should *not*
be necessary to have to search/replace apostrophes.
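The clean-up described here can be partly automated when pasting. In TeX input both the apostrophe and the closing single quote are typed as ', so curly quotes (and Unicode dashes) can be mapped straight back. A minimal Python sketch, where pasted_to_tex is a hypothetical helper and not part of LaTeX2HTML:

```python
# Map Unicode curly quotes and dashes back to the ASCII input TeX expects.
TO_TEX = str.maketrans({
    "\u201c": "``",   # left double quotation mark
    "\u201d": "''",   # right double quotation mark
    "\u2018": "`",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
})


def pasted_to_tex(text: str) -> str:
    """Hypothetical helper for cleaning text copied from a web page."""
    return text.translate(TO_TEX)


print(pasted_to_tex("\u201cIt\u2019s fine,\u201d she said."))
# prints: ``It's fine,'' she said.
```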


 i.e. ensure that one does the singles after the doubles ...

 But there is probably a better algorithm in the source code for 'quoter'

 http://www.dwheeler.com/quoter/

The aim of an HTML translation should not be appearance.
It should be ensuring that meaning is preserved, and that no symbol
is rendered with the 'missing character' glyph.


Hope this helps,

Ross Moore


 Thanks,
 James

