> I propose that we make the soft hyphen character into a function rather
> than an actual character, and remove support for 0xA5 as a soft
> hyphen.  Or am I misunderstanding something in the code?

The current code mainly just tries to pass bytes along, so that it
will work for various character set encodings (like BIG-5) that it
doesn't understand (which are basically all of them).  It assumes that
the viewer will have the same character set encoding.  It also stashes
the character set in use in the metadata record, if it can figure out
what it is.  So a viewer could check that field for each Plucker record,
and try to do something smart if the record's charset didn't match the
viewer's or the Palm's charset (if you can figure out what the Palm
charset is).
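
To make that concrete, here is a rough sketch of what such a viewer-side
check might look like.  It is written in present-day Python and is not
taken from the Plucker code; the function name and its three parameters
are made up for illustration, and it assumes the viewer somehow knows
its own charset.

```python
import codecs


def text_for_viewer(record_bytes, record_charset, viewer_charset):
    """Return bytes the viewer can display, transcoding only when needed.

    Hypothetical helper: the name and parameters are illustrative and do
    not correspond to anything in the Plucker sources.
    """
    if record_charset is None:
        # No charset was recorded: pass the bytes along untouched.
        return record_bytes
    try:
        # Normalise aliases, e.g. "latin-1" and "ISO-8859-1" name one codec.
        source = codecs.lookup(record_charset).name
        target = codecs.lookup(viewer_charset).name
    except LookupError:
        # Python doesn't know one of the charsets: pass the bytes along.
        return record_bytes
    if source == target:
        return record_bytes
    # Charsets differ: decode with the record's charset and re-encode for
    # the viewer, substituting a placeholder for anything unrepresentable.
    text = record_bytes.decode(source, errors="replace")
    return text.encode(target, errors="replace")
```

When the charset is missing or unknown this falls back to the current
behaviour of passing bytes through, so it should do no worse than what
we have today.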

So there is no "support for 0xA5" in the code (and nothing to remove);
what you've got is a mismatch between the charset in the original and
the charset in the viewer.

Again, if we switched to Python 2.0, which has good charset handling
features, and if we can figure out what charset is in use in the
source page, which with HTML is always problematic, and if we knew
more about the charset models actually used by various viewers, we
could do more.
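
As a rough idea of what "more" could mean, the sketch below decodes a
page using its declared charset and then turns soft hyphens into a list
of break positions instead of leaving them as characters in the text.
It is written in present-day Python, not 2.0, and none of it comes from
the Plucker code: decode_page, split_soft_hyphens, and the ISO-8859-1
fallback are all assumptions made for the example.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN, byte 0xAD in ISO-8859-1


def decode_page(raw, declared_charset, fallback="iso-8859-1"):
    """Decode raw page bytes, falling back if the declared charset fails.

    Hypothetical helper; the fallback charset is an assumption.
    """
    try:
        return raw.decode(declared_charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that are invalid in that charset.
        return raw.decode(fallback, errors="replace")


def split_soft_hyphens(text):
    """Return (visible_text, offsets) with soft hyphens removed.

    Each offset is an index into visible_text where a line break, and
    therefore a displayed hyphen, would be allowed.
    """
    parts = text.split(SOFT_HYPHEN)
    offsets = []
    position = 0
    for part in parts[:-1]:
        position += len(part)
        offsets.append(position)
    return "".join(parts), offsets
```

Once the text is decoded, the soft hyphen is unambiguous whatever byte
the source charset used for it, which is the part we can't do while we
are only passing bytes along.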

Bill
_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev