Re: converting UTF-8 to HTML

2012-04-22 Thread Lars Eighner
On Sun, 22 Apr 2012, Matthew Seaman wrote: On 22/04/2012 10:17, Erik Nørgaard wrote: UTF-8 is variable width, ASCII characters are stored as single bytes (not sure about iso-8859-1) while other characters are stored as two byte chars. ASCII uses the low 128 values that you can assign to an uns

Re: converting UTF-8 to HTML

2012-04-22 Thread Erik Nørgaard
On 22/04/2012 13:06, Polytropon wrote: How about the "extended ASCII character set" that has a mixture of "non-US glyphs" and semi-graphic symbols? http://asciiset.com/extended.gif I can't even write my name in that character set. As long as there are multiple character sets you will

Re: converting UTF-8 to HTML

2012-04-22 Thread Matthew Seaman
On 22/04/2012 12:06, Polytropon wrote: > How about the "extended ASCII character set" that has a mixture > of "non-US glyphs" and semi-graphic symbols? > > http://asciiset.com/extended.gif > > This default layout isn't tied to a specific encoding, if I > remember correctly, or is it? Access

Re: converting UTF-8 to HTML

2012-04-22 Thread Polytropon
On Sun, 22 Apr 2012 11:45:45 +0100, Matthew Seaman wrote: > On 22/04/2012 10:17, Erik Nørgaard wrote: > > UTF-8 is variable width, ASCII characters are stored as single bytes (not > > sure about iso-8859-1) while other characters are stored as two byte chars. > > ASCII uses the low 128 values that

Re: converting UTF-8 to HTML

2012-04-22 Thread Matthew Seaman
On 22/04/2012 10:17, Erik Nørgaard wrote: > UTF-8 is variable width, ASCII characters are stored as single bytes (not > sure about iso-8859-1) while other characters are stored as two byte chars. ASCII uses the low 128 values that you can assign to an unsigned char, ie. those where the high-order b
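
A small Python sketch related to the byte widths under discussion in this message; it is an illustration added for reference, not code from the thread. The helper name utf8_width is made up for the example. It shows that ASCII takes one byte in UTF-8, while other characters take two, three, or four bytes rather than always two, and that ISO-8859-1 stores ø in a single byte with the high-order bit set.

```python
def utf8_width(ch):
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per UTF-8 width: ASCII, Latin-1 range, BMP symbol, astral plane.
samples = ["A", "ø", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Print only ASCII so the output is safe on any terminal encoding.
    print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s), hex {encoded.hex()}")

# The same character in ISO-8859-1 is a single byte above 0x7F.
latin1_bytes = "ø".encode("iso-8859-1")   # b'\xf8'
utf8_bytes = "ø".encode("utf-8")          # b'\xc3\xb8'
```

Every byte of a multi-byte UTF-8 sequence has its high-order bit set, which is why ASCII-only text is byte-for-byte identical in ASCII, ISO-8859-1, and UTF-8.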

Re: converting UTF-8 to HTML

2012-04-22 Thread Erik Nørgaard
On 21/04/2012 16:10, Lars Eighner wrote: UTF-8 is a waste of storage for most people and is incompatible with text-mode tools: it's simply another bid to make it impossible to run without a GUI. UTF-8 is variable width, ASCII characters are stored as single bytes (not sure about iso-8859-1) wh

Re: converting UTF-8 to HTML

2012-04-21 Thread Robert Bonomi
Polytropon wrote: > On Sat, 21 Apr 2012 09:10:03 -0500 (CDT), Lars Eighner wrote: > > On Sat, 21 Apr 2012, Erik Nørgaard wrote: > > > > > When characters show up wrong in the user's browser it's usually > > > because the browser is set to use a non-UTF-8 charset by default > > > such as windows-1

Re: converting UTF-8 to HTML

2012-04-21 Thread Polytropon
On Sat, 21 Apr 2012 09:10:03 -0500 (CDT), Lars Eighner wrote: > On Sat, 21 Apr 2012, Erik Nørgaard wrote: > > > When characters show up wrong in the user's browser it's usually because the > > browser is set to use a non-UTF-8 charset by default such as windows-1252, > > the web server sends the

Re: converting UTF-8 to HTML

2012-04-21 Thread Lars Eighner
On Sat, 21 Apr 2012, Erik Nørgaard wrote: When characters show up wrong in the user's browser it's usually because the browser is set to use a non-UTF-8 charset by default such as windows-1252, the web server sends the charset=ascii in the HTTP header and there is no or incorrect meta tag to re
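
A short Python sketch of the failure mode described in this message, added as an illustration and not taken from the thread. It assumes the page bytes are UTF-8 and the reader decodes them as windows-1252 (Python codec name cp1252); the variable names are made up for the example.

```python
original = "ü"

# What the server actually sends when the page is stored as UTF-8.
raw = original.encode("utf-8")          # b'\xc3\xbc'

# What a browser shows if it assumes windows-1252 instead of UTF-8:
# each of the two bytes is rendered as its own character.
misread = raw.decode("cp1252")          # 'Ã¼'

# The damage is reversible as long as nothing was lost in between:
# re-encode with the wrong charset, then decode with the right one.
repaired = misread.encode("cp1252").decode("utf-8")
```

Declaring the real encoding, in the HTTP Content-Type header or in the page itself, avoids the misreading in the first place.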

Re: converting UTF-8 to HTML

2012-04-21 Thread Matthias Apitz
On Saturday, April 21, 2012 at 11:06:42AM +0200, Erik Nørgaard wrote: > On 21/04/2012 08:29, Erik Nørgaard wrote: > > Browsers understand UTF-8 perfectly, simply add > > to the html header. > > Obviously I can't know what your project is, but you'll save yourself > heaps of problems s

Re: converting UTF-8 to HTML

2012-04-21 Thread Erik Nørgaard
On 21/04/2012 08:29, Erik Nørgaard wrote: Browsers understand UTF-8 perfectly, simply add to the html header. Obviously I can't know what your project is, but you'll save yourself heaps of problems sticking to UTF-8, in particular if you plan on implementing any search functionality or have

Re: converting UTF-8 to HTML

2012-04-21 Thread Matthias Apitz
On Saturday, April 21, 2012 at 07:34:44AM +0100, Matthew Seaman wrote: > www/tidy-devel > > (which is effectively a fork of the original www/tidy project, and has > quite a lot of new functionality) > > If you specify 'ascii' for the output format, it should generate > appropriate char

Re: converting UTF-8 to HTML

2012-04-20 Thread Erik Nørgaard
On 21/04/2012 07:58, Matthias Apitz wrote: Is there something in the port to convert UTF-8 text to HTML encodings, like: $ echo ü | iconv -f utf-8 -t html &uuml; or the encodings in hex based on the codepoint? AFAIK it's not possible. Browsers understand UTF-8 perfectly, simply add to the ht

Re: converting UTF-8 to HTML

2012-04-20 Thread Matthew Seaman
On 21/04/2012 06:58, Matthias Apitz wrote: > Is there something in the port to convert UTF-8 text to HTML encodings, > like: > > $ echo ü | iconv -f utf-8 -t html > &uuml; > > or the encodings in hex based on the codepoint? www/tidy-devel (which is effectively a fork of the original www/tidy proje

converting UTF-8 to HTML

2012-04-20 Thread Matthias Apitz
Hello, Is there something in the port to convert UTF-8 text to HTML encodings, like: $ echo ü | iconv -f utf-8 -t html &uuml; or the encodings in hex based on the codepoint? Thanks matthias -- Matthias Apitz t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211 e - w http://www.u
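
A minimal Python sketch of the conversion asked about in this message, offered as an illustration rather than as anything proposed in the thread. The function name to_html_entities and its named parameter are made up for the example; html.entities.codepoint2name is part of the Python standard library. Non-ASCII characters become named entities where HTML defines one and hexadecimal numeric references otherwise. ASCII passes through unchanged, so markup characters such as & and < are not escaped here.

```python
import html.entities


def to_html_entities(text, named=True):
    """Replace every non-ASCII character with an HTML character reference.

    named=True  -> prefer named entities such as &uuml; when one exists
    named=False -> always emit hex references based on the code point
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII is left untouched (including & and <).
            out.append(ch)
        elif named and cp in html.entities.codepoint2name:
            out.append("&" + html.entities.codepoint2name[cp] + ";")
        else:
            out.append("&#x{:X};".format(cp))
    return "".join(out)


named_form = to_html_entities("ü")               # '&uuml;'
hex_form = to_html_entities("ü", named=False)    # '&#xFC;'
```

If decimal references are enough, the built-in codec error handler does the same job without a helper: "ü".encode("ascii", "xmlcharrefreplace") yields b"&#252;".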