Re: plucker & unicode

Radovan Garabik Thu, 12 Feb 2004 00:11:37 -0800

Just to clarify what I had in mind: I was not talking about "full"
Unicode support (whatever you mean by "full"), but rather about UCS2
without BIDI and combining characters, i.e. support that is already
in plucker but is not used because some implementation details are
missing. And it is all about Unicode with grayscale fonts, not with the
system ones.

On Wed, Feb 11, 2004 at 09:54:15AM -0500, Alexander R. Pruss wrote:
> On Wed, 11 Feb 2004, Radovan Garabik wrote:
> > I know, but that does not solve the original problem: the page
> > I want to pluck uses e.g. the iso-8859-2 repertoire and has a link to a page that
> > uses iso-8859-1 (or koi8-r or anything else, or even worse, characters from several
> > codepages). As I said, support in plucker is _almost_ there and I am willing
> > to work on it, but only if there is interest in the developers' community
> > in including the support in plucker.
> 
> One concern is that if we roll our own solution for this, then along may
> come a future OS version with native Unicode support.  We could just wait
> for that.  In fact, I think that if a future OS version with native
> Unicode support comes along, we're basically ready.  Plucker has full
> multi-byte char support.  If the OS uses that mechanism for unicode, it'll
> be just a matter of updating the parser.

Yes, but multibyte text storage (and btw the one in plucker is crippled and
won't be usable because of its SHIFT-JIS (mis)handling) is terribly inefficient
for non-CJK languages. You want UTF-8 (or SCSU), and UTF-8 at least is very
easy to implement.
Besides, the grayscale font renderer already supports Unicode (UCS2), so
any future OS support is irrelevant (if you are using grayscale fonts, not
the system ones).
Anyway, even if some future OS adds some sort of Unicode support,
1) users of older devices are not going to flash their ROMs en masse
2) grayscale fonts still have nothing to do with the OS, so they would
   need this implementation (what I am talking about) anyway
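To illustrate the point that UTF-8 is easy to implement and compact for non-CJK text, here is a minimal sketch of an encoder from one UCS2 code unit to UTF-8. It does not model plucker's existing multibyte scheme, and the names `ucs2_to_utf8` and `utf8_length` are hypothetical. ASCII stays at one byte, Latin and Cyrillic letters take two, and the rest of the BMP takes three.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS2 code unit as UTF-8 into out[] (room for 3 bytes).
 * Returns the number of bytes written: 1, 2 or 3.
 * UCS2 has no surrogate pairs, so no 4-byte case is needed here. */
static size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {                 /* U+0000..U+007F: 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                /* U+0080..U+07FF: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Total UTF-8 size in bytes of a UCS2 string of n code units. */
static size_t utf8_length(const uint16_t *s, size_t n)
{
    unsigned char buf[3];
    size_t i, total = 0;

    for (i = 0; i < n; i++)
        total += ucs2_to_utf8(s[i], buf);
    return total;
}
```

For the Slovak word "čo" (U+010D, U+006F) this gives 3 bytes of UTF-8 against 4 bytes of UCS2, and plain ASCII text stays at one byte per character.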

> 
> The number of users who need to pluck sites that use multiple code pages
> in ways that matter is small.  For most English pages, for instance, it is no
> disaster if you pretend the page is in some other encoding--at most a few
> characters will be mixed up--and one expects that most, though not all,
> multilanguage plucks are going to be Some Other Language plus English,
> rather than two non-English languages. 

I agree. However, it renders plucker unusable for me :-)
And there is a non-negligible number of users who want the
combination "their language" + "some other language"
(basically anyone who is studying a foreign language that does not
fall into the same ISO 8859 group as their native one).
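To make the mixed-codepage case concrete, here is a minimal sketch, assuming the parser decodes every page from its own declared charset into UCS2 before storing it, so pages from different ISO 8859 groups can coexist in one document. The function names are hypothetical, and the ISO-8859-2 table is deliberately partial (a handful of Czech/Slovak letters); a real table would cover all 96 code points from 0xA0 to 0xFF.

```c
#include <stddef.h>
#include <stdint.h>

/* UCS2 code unit, as consumed by the grayscale font renderer (assumption). */
typedef uint16_t ucs2_t;

/* ISO-8859-1: every byte maps to the code point with the same value. */
static ucs2_t latin1_to_ucs2(unsigned char b)
{
    return (ucs2_t)b;
}

/* ISO-8859-2: bytes below 0xA0 (ASCII and C1 controls) map directly.
 * Only a few upper-half entries are filled in for illustration;
 * anything missing from this partial table becomes U+FFFD. */
static ucs2_t latin2_to_ucs2(unsigned char b)
{
    static const struct { unsigned char byte; ucs2_t cp; } partial[] = {
        { 0xA9, 0x0160 },   /* S with caron */
        { 0xAE, 0x017D },   /* Z with caron */
        { 0xB9, 0x0161 },   /* s with caron */
        { 0xBE, 0x017E },   /* z with caron */
        { 0xC8, 0x010C },   /* C with caron */
        { 0xE1, 0x00E1 },   /* a with acute */
        { 0xE8, 0x010D },   /* c with caron */
    };
    size_t i;

    if (b < 0xA0)
        return (ucs2_t)b;
    for (i = 0; i < sizeof partial / sizeof partial[0]; i++)
        if (partial[i].byte == b)
            return partial[i].cp;
    return 0xFFFD;
}

/* Decode n bytes of one page with that page's mapping into out[]. */
static size_t decode_page(const unsigned char *in, size_t n,
                          ucs2_t (*map)(unsigned char), ucs2_t *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = map(in[i]);
    return n;
}
```

The same byte 0xE8 comes out as U+010D (c with caron) from an iso-8859-2 page and as U+00E8 (e with grave) from an iso-8859-1 page; once both are UCS2 they can sit side by side in one document.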

> So I am not sure that it is worth
> supporting this.  It may slow down rendering.  It will make maintenance
> more work for all the developers.

Note that rendering is already Unicode (UCS2). A UTF-8 implementation
would slow down only text parsing (and a UTF-8 parser is something like
6 lines of C code consisting of bit shifts and AND operations).
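A minimal sketch of such a parser, assuming the document text is stored as UTF-8 and converted to UCS2 for the grayscale renderer. The names `utf8_next` and `utf8_to_ucs2` are hypothetical. The core really is a few shifts and masks; the rest is validation. Malformed input and characters outside the BMP (which UCS2 cannot hold) become U+FFFD, and overlong forms are not rejected, to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* Decode one UTF-8 sequence at s (n > 0 bytes available) into a UCS2 unit.
 * *used receives the number of bytes consumed (always >= 1, so a caller
 * looping over the buffer always makes progress). */
static uint16_t utf8_next(const unsigned char *s, size_t n, size_t *used)
{
    unsigned char b = s[0];

    if (b < 0x80) {                              /* 0xxxxxxx */
        *used = 1;
        return b;
    }
    if ((b & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
        *used = 2;                               /* 110xxxxx 10xxxxxx */
        return (uint16_t)(((b & 0x1F) << 6) | (s[1] & 0x3F));
    }
    if ((b & 0xF0) == 0xE0 && n >= 3 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *used = 3;                               /* 1110xxxx 10xxxxxx 10xxxxxx */
        return (uint16_t)(((b & 0x0F) << 12) | ((s[1] & 0x3F) << 6) |
                          (s[2] & 0x3F));
    }
    if ((b & 0xF8) == 0xF0 && n >= 4 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *used = 4;                               /* outside the BMP: no UCS2 form */
        return REPLACEMENT_CHAR;
    }
    *used = 1;                                   /* malformed: skip one byte */
    return REPLACEMENT_CHAR;
}

/* Decode a whole buffer; out must hold at least n units.
 * Returns the number of UCS2 units written. */
static size_t utf8_to_ucs2(const unsigned char *s, size_t n, uint16_t *out)
{
    size_t i = 0, k = 0, used;

    while (i < n) {
        out[k++] = utf8_next(s + i, n - i, &used);
        i += used;
    }
    return k;
}
```

Skipping a single byte on malformed input lets the decoder resynchronise on the next lead byte instead of discarding the rest of the record.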

> It will make maintenance more work for all the developers.

Yes, as every new feature does.

Well, OK, I have clarified my position. If you feel that plucker is
not ready for this, I am not going to start a fork :-)
It would just be a pity, since I really miss a good
multilanguage reader (there is the commercial UReader, but it is only
a simple text reader, and a rather expensive one).

-- 
 -----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
