Just to clarify what I had in mind - I was not speaking about "full"
Unicode support (whatever you mean by "full"), but rather about UCS2
without BIDI and combining characters - i.e. support that is already in
Plucker, but is not used because of some missing implementation details.
And it is all about Unicode with grayscale fonts, not with the system
ones.

On Wed, Feb 11, 2004 at 09:54:15AM -0500, Alexander R. Pruss wrote:
> On Wed, 11 Feb 2004, Radovan Garabik wrote:
> > I know, but that does not solve the original problem - if the page
> > I want to pluck uses e.g. iso-8859-2 repertoire and it has a link to
> > a page that uses iso-8859-1 (or koi8-r or anything, or even worse,
> > characters from more codepages). As I said, support in plucker is
> > _almost_ there and I am willing to work on it - but only if there is
> > an interest in the developers' community to include the support in
> > plucker.
>
> One concern is that if we roll our own solution for this, then along may
> come a future OS version with native Unicode support. We could just wait
> for that. In fact, I think that if a future OS version with native
> Unicode support comes along, we're basically ready. Plucker has full
> multi-byte char support. If the OS uses that mechanism for unicode, it'll
> be just a matter of updating the parser.

Yes, but multibyte text storage is terribly inefficient for non-CJK
languages (and btw the one in Plucker is crippled and won't be usable
because of Shift-JIS (mis)handling). You want UTF-8 (or SCSU), and at
least UTF-8 is very easy to implement.

Besides, the grayscale font renderer already supports Unicode (UCS2), so
eventual OS support is irrelevant (if you are using grayscale fonts, not
the system ones).

Anyway, even if some future OS adds some sort of Unicode support:
1) users of older devices are not going to flash their ROMs en masse
2) grayscale fonts still have nothing to do with the OS, so they would
   need this implementation (the one I am talking about) anyway

>
> The number of users who need to pluck sites that use multiple code pages
> in ways that matter.
> For most English pages, for instance, it is no
> disaster if you pretend the page is in some other encoding--at most a few
> characters will be mixed up--and one expects that most, though not all,
> multilanguage plucks are going to be Some Other Language plus English,
> rather than two non-English languages.

I agree. However, it renders Plucker unusable for me :-)
And there is a non-negligible number of users who want the combination
"their language" + "some other language" (basically anyone who is
studying foreign languages that do not fall into the same ISO group as
their native one).

> So I am not sure that it is worth
> supporting this. It may slow down rendering. It will make maintenance
> more work for all the developers.

Note that rendering is already Unicode (UCS2). A UTF-8 implementation
would slow down only text parsing (and a UTF-8 parser is something like
6 lines of C code consisting of bit shifts and AND operations).

> It will make maintenance more work for all the developers.

Yes, as every new feature does.

Well, OK, I clarified my position. If you feel that Plucker is not ready
for this, I am not going to start a fork :-) It would just be a pity,
since I am really missing a good multilanguage reader (there is the
commercial UReader, but it is only a simple text reader, and a rather
expensive one).

-- 
 -----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
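
P.S. To make the "6 lines of C" remark about the UTF-8 parser concrete,
here is a minimal sketch of the kind of decoder meant. It is only an
illustration, not code from the Plucker sources: the name utf8_to_ucs2
and its signature are made up for this example, it handles only the BMP
(which is all that UCS2 can hold), and it does not reject overlong
encodings or surrogate code points, which a production parser probably
should.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not Plucker code).
 * Decodes one UTF-8 sequence from s, with n bytes available, into *out.
 * Returns the number of bytes consumed, or 0 if the input is truncated,
 * malformed, or encodes a character outside the BMP (4-byte sequences).
 * Overlong forms and surrogates are NOT rejected in this sketch. */
static size_t utf8_to_ucs2(const unsigned char *s, size_t n, uint16_t *out)
{
    if (n >= 1 && s[0] < 0x80) {                      /* 0xxxxxxx */
        *out = s[0];
        return 1;
    }
    if (n >= 2 && (s[0] & 0xE0) == 0xC0               /* 110xxxxx 10xxxxxx */
               && (s[1] & 0xC0) == 0x80) {
        *out = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if (n >= 3 && (s[0] & 0xF0) == 0xE0               /* 1110xxxx 10xxxxxx 10xxxxxx */
               && (s[1] & 0xC0) == 0x80
               && (s[2] & 0xC0) == 0x80) {
        *out = (uint16_t)(((s[0] & 0x0F) << 12)
                        | ((s[1] & 0x3F) << 6)
                        |  (s[2] & 0x3F));
        return 3;
    }
    return 0;                                         /* not handled here */
}
```

A caller would loop over the text buffer, advance by the returned byte
count, and hand each 16-bit value to the renderer; a return value of 0
would be the place to substitute a replacement glyph and skip one byte.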
