Unicode patch 3

Radovan Garabik Wed, 17 Mar 2004 08:45:52 -0800

On Sat, Mar 13, 2004 at 08:15:43PM -0500, Alexander R. Pruss wrote:
> > you can select anything as --output-charset, characters from
> > --input-charset that cannot be represented in --output-charset
> > are included as unicode values - this is why default --output-charset
> > is ascii, rather than palmos.
> 
> Having default output charset not be palmos will make searching
> significantly less efficient, especially on ARM units.  Searching is
> optimized for 8-bit text.  In fact, it currently doesn't work for unicode at
> all (unless your patch fixes that).


It does not, and this is a valid point. So it is back to palmos.
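
For reference, the lookup order the attached patch uses for the charset options (command line first, then .pluckerrc, then a built-in default) can be sketched as below. This is only an illustration in present-day Python 3, not code from the patch: resolve_charset is a hypothetical helper name, and utf-8 as the input fallback is what the patch appears to intend.

```python
def resolve_charset(cli_value, config_value, default):
    """Return the first charset that was actually supplied.

    cli_value:    value from the command line, or None if the option was absent
    config_value: value from the config file, or None if not set there
    default:      built-in fallback used when neither source provides a value
    """
    for candidate in (cli_value, config_value):
        if candidate is not None:
            return candidate
    return default


# Nothing given anywhere: the output charset falls back to palmos.
output_charset = resolve_charset(None, None, "palmos")

# A config file entry wins over the built-in default.
input_charset = resolve_charset(None, "iso-8859-2", "utf-8")
```

Keeping palmos as the last resort means documents stay 8-bit unless the user explicitly asks for something else, which is what the search code needs.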

On Sun, Mar 14, 2004 at 12:10:19PM +0100, Michael Nordstrom wrote:
> On Thu, Mar 11, 2004, Radovan Garabik wrote:
> 
> > New version of plucker unicode patch.
> 
> It is great that you add support for this, but in what way is this
> related to the libraryform enhancements? Please don't use a message
> from a completely different thread as a "template". Usually, it is a
> common feature to create a new message in a mail client, so I doubt
> it can be that hard for you to create a new message with a correct
> *subject* ;-)

Sorry, I messed up (I was already late for lunch
and trying to finish the mail *quickly* :-))


A new version of the unicode patch is attached. This is a combined patch,
for both the parser and the viewer.
Changes:
- the default output charset for the parser is palmos
- FindPalmCharForUnicodeChar is back if you are not using
  gray fonts - good for backward compatibility
- the unicode patch is now compatible with word lookup (as far as it can
  be), by using TxtGlue* where appropriate - so you can use word
  lookup regardless of unicode mode or legacy encoding of the document,
  as long as the characters fit into your palm charset (that is, the
  PalmOS cp1252 hybrid; in theory it should work with multibyte
  Japanese devices as well, I just have no way of testing it).
  If a character does _not_ fit into the palm encoding, you get
  whatever TxtGlue* gives - i.e. a mess.
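
The parser-side idea behind these changes (encode each character into the output charset, and keep it as a unicode value when it does not fit) can be sketched roughly as follows. This is an illustration in present-day Python 3, not the patch code, which targets Python 2. split_for_output is a hypothetical name, and cp1252 is used as a stand-in because palmos is not a standard Python codec.

```python
def split_for_output(text, output_charset="cp1252"):
    """Classify each character of text for the output document.

    Returns a list of tuples:
      ("bytes", b"...")      the character fits the output charset
      ("unicode", codepoint) the character does not fit and must be
                             stored as a unicode value instead
    """
    items = []
    for ch in text:
        try:
            items.append(("bytes", ch.encode(output_charset)))
        except UnicodeError:
            # Not representable in the 8-bit charset: keep the code point.
            items.append(("unicode", ord(ch)))
    return items


# 'a' and e-acute fit cp1252; Greek alpha does not.
result = split_for_output("a\u00e9\u03b1")
```

Only the characters that fall through to the "unicode" branch need special handling in the viewer; everything else stays plain 8-bit text, so searching keeps working on it.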


To test the patch:
- get unicode font from 
http://kassiopeia.juls.savba.sk/~garabik/plucker/unicode_test.prc.gz
(warning: it is 402KB ungzipped). The font has glyphs up to U+11F9 and it
is a hires font.
- get test document from 
http://kassiopeia.juls.savba.sk/~garabik/plucker/sklo.pdb


The text uses some combining characters (I could not make a
font with glyphs going up to the range of the extended Greek chars, so I
decomposed the text into NFD normalization form, which also nicely shows the
use of combining characters). As long as the source font actually
contains combining characters with negative kerning, they are
displayed OK (not perfectly, but acceptably), provided the preceding
character does not have an unusual width. In the example I am giving,
iota is obviously too thin and the following perispomeni hangs over the previous
epsilon. Also, psili and oxia do not look good at all when both are stacked
over one letter.
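
To illustrate the NFD trick: a precomposed extended Greek letter lies above the range the test font covers, but its decomposition uses only a basic Greek letter plus combining marks, all of which are below U+11F9. A minimal sketch in present-day Python 3, using the standard unicodedata module (the sample character is my own choice, not taken from the test document):

```python
import unicodedata

FONT_LIMIT = 0x11F9  # highest code point covered by the test font

# GREEK SMALL LETTER IOTA WITH PSILI AND OXIA, from the Greek Extended block
original = "\u1f34"
decomposed = unicodedata.normalize("NFD", original)

# List the base letter and the combining marks it decomposes into.
for ch in decomposed:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# True when every decomposed code point is within the font's range.
fits_font = all(ord(ch) <= FONT_LIMIT for ch in decomposed)
```

This is also the stacked psili plus oxia case mentioned above, so it is a reasonable character to check rendering with.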





-- 
 -----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

diff -ur plucker-original/configure.in plucker-new/configure.in
--- plucker-original/configure.in       2004-03-07 13:00:33.000000000 +0100
+++ plucker-new/configure.in    2004-03-08 17:09:20.000000000 +0100
@@ -320,6 +320,7 @@
                           to get the function names included in POSE's
                           profiling output ])
 AC_ARG_ENABLE(imode, [  --enable-imode          to enable i-mode support (also requires the imodeicons.pdb database)])
+AC_ARG_ENABLE(unicode, [  --enable-unicode        to enable unicode support])
 AC_ARG_ENABLE(scroll_to_bottom, [  --disable-scroll-to-bottom
                           always scroll even pages instead of stopping when
                           the end of the page is reached (will add some extra
diff -ur plucker-original/parser/python/PyPlucker/Spider.py plucker-new/parser/python/PyPlucker/Spider.py
--- plucker-original/parser/python/PyPlucker/Spider.py  2004-02-02 03:31:58.000000000 +0100
+++ plucker-new/parser/python/PyPlucker/Spider.py       2004-03-10 19:52:53.000000000 +0100
@@ -1276,8 +1276,10 @@
         message(0, "                   Set or clear the backup bit in the output file.")
         message(0, "    --beamable, --not-beamable:")
         message(0, "                   Set or clear the beamable bit in the output file.")
-        message(0, "    --charset=<name>:")
-        message(0, "                   Set the default charset to that specified by <name>.")
+        message(0, "    --output-charset=<name>:")
+        message(0, "                   Set the output charset of generated document to that specified by <name>.")
+        message(0, "    --input-charset=<name>:")
+        message(0, "                   Assume input charset to that specified by <name>.")
         message(0, "    --owner-id=<name>:")
         message(0, "                   Set owner-id of the output document to <name>.")
         message(0, "    --url-pattern=<regexp-pattern>:")
@@ -1350,7 +1352,8 @@
         backup = None
         copy_protect = None
         iconfile = None
-        default_charset = None
+        output_charset = None
+        input_charset = None
         owner_id = None
         url_pattern = None
         referrer = None
@@ -1376,7 +1379,7 @@
                                         "maxheight=", "maxwidth=", "alt-maxheight=", "alt-maxwidth=",
                                         "compression=", "home-url=", "update-cache", "launchable",
                                         "not-launchable", "backup", "no-backup", "beamable", "not-beamable",
-                                        "icon=", "charset=", "owner-id=", "url-pattern=", "referrer=",
+                                        "icon=", "output-charset=", "input-charset=", "owner-id=", "url-pattern=", "referrer=",
                                         "user-agent=", "title=", "author=", "status-file=", "version",
                                         "tables", "depth-first", "http-proxy=", "http-proxy-user=", "http-proxy-pass=",
                                         "fragments=", "creator-id="])
@@ -1494,8 +1497,10 @@
                 copy_protect = 1
             elif opt == "--icon":
                 iconfile = arg
-            elif opt == "--charset":
-                default_charset = arg
+            elif opt == "--output-charset":
+                output_charset = arg
+            elif opt == "--input-charset":
+                input_charset = arg
             elif opt == "--owner-id":
                 owner_id = arg
             elif opt == "--referrer":
@@ -1602,21 +1607,19 @@
         if zlib_compression == 'false':
             message('Specification of an owner-id forces use of zlib compression...')
         zlib_compression = 'true'
-        
-    mibenum = None
-    # if not specified on command line, look in .pluckerrc
-    if default_charset is None:
-        default_charset = config.get_string("default_charset")
-    # if we have one, validate it
-    if default_charset is not None:
-        from PyPlucker.helper.CharsetMapping import charset_name_to_mibenum, charset_known_names
-        import string, re
-        mibenum = charset_name_to_mibenum(default_charset)
-        if mibenum:
-            config.set('default_charset', mibenum)
-        else:
-            usage ("Error:  Unsupported charset '" + default_charset + "' specified as default charset.\n"
-                   "        Charset must be either a decimal MIBenum value, or one of " + str(charset_known_names()))
+
+    if output_charset is None:
+        output_charset = config.get_string("output_charset")
+    if output_charset is None:
+        output_charset = 'palmos'
+    config.set ('output_charset', output_charset)
+
+    if input_charset is None:
+        input_charset = config.get_string("input_charset")
+    if input_charset is None:
+        input_charset = 'utf-8'        
+    config.set ('input_charset', input_charset)
+
 
     # update the config with the user options
     if use_file is not None:
@@ -1696,8 +1699,6 @@
         config.set ('author_md', author)
     if title is not None:
         config.set ('title_md', title)
-    if mibenum is not None:
-        config.set ('default_charset', mibenum)
     if statusfile is not None:
         config.set ('status_file', statusfile)
     if depthfirst is not None:
diff -ur plucker-original/parser/python/PyPlucker/TextParser.py plucker-new/parser/python/PyPlucker/TextParser.py
--- plucker-original/parser/python/PyPlucker/TextParser.py      2004-02-27 23:51:08.000000000 +0100
+++ plucker-new/parser/python/PyPlucker/TextParser.py   2004-03-11 19:46:50.000000000 +0100
@@ -31,6 +31,8 @@
 ## Now PyPlucker things should generally be importable
 ##
 
+NBSP = u'\u00a0' # non-breaking space
+
 import string
 import re
 try:
@@ -336,6 +338,33 @@
 _entitycharref = re.compile('^(.*)&([#a-zA-Z][-.a-zA-Z0-9]*);(.*)$')
 _html_char_ref_pattern = re.compile('^&#([0-9]+);$')
 
+# this needs to be rewritten
+def text_alternative (uchar):
+    "get text alternative to unicode character uchar"
+    val = ord(uchar)
+    if val == 8211:
+        return "-"
+    elif val == 8212:
+        return "--"
+    elif val == 8216:
+        return "`"
+    elif val == 8217:
+        return "'"
+    elif val == 8220:
+        return "\""
+    elif val == 8230:
+        return "..."
+    elif val == 8221:
+        return "\""
+    elif val == 8226:
+        return "o"
+    elif val == 8482:
+        return "(tm)"
+    else:
+        return "?"
+        return "&#%d;" % val
+
+
 # These junk "alt" attribute values are not worth showing.
 junk_alt_attributes = ("img", "[img]", "spacer", "")
 
@@ -374,8 +403,6 @@
     return text
 
 
-
-
 class AttributeStack:
     """A data structure to maintain information about the current
     text attributes.
@@ -525,12 +552,11 @@
         return self._tags[self._stack[-1]]
 
 
-
-
 class TextDocBuilder:
     """Encapsulate the knowledge of when to change styles, add paragraphs, etc."""
 
     def __init__ (self, url, config, **keyword_args):
+        message(2,"initializing textdocbuilder")
         self._doc = PluckerDocs.PluckerTextDocument (url)
         self._config = config
         self._attributes = AttributeStack ()
@@ -582,19 +608,12 @@
             # see if we can supply a default charset
             url = self._doc.get_url()
             if self._config:
-                userspec = self._config.get_int('default_charset', 0)
+                userspec = self._config.get_int('output_charset_mibenum', 0)
             else:
                 userspec = None
             locale_default = charset_name_to_mibenum(DEFAULT_LOCALE_CHARSET_ENCODING)
-            # the userspec will take precedence
-            if userspec:
+            if userspec is not None:
                 self._doc.set_charset(userspec)
-            # OK, so we have no idea.  Use the HTTP default of ISO-8859-1 (4) for
-            # http: URLs, and the environment default (if any) for others
-            elif (string.lower(url[:5]) == 'http:' or string.lower(url[:6]) == 'https:'):
-                self._doc.set_charset(4)
-            elif locale_default:
-                self._doc.set_charset(locale_default)
 
     def add_name (self, name):
         """Give name to the current paragraph"""
@@ -875,7 +894,28 @@
         
 
     def add_text (self, text):
-        """Add some text, maybe even many lines."""
+        """Add some text, maybe even many lines.
+            Text can be either a string or a unicode string.
+        """
+
+        def add_unicode_text(paragraph, text):
+            if type(text)==type(""): # non-unicode string, shortcut
+                message(4, "Adding 8-bit text")
+                paragraph.add_text(text)
+            elif type(text)==type(u""):
+                message(4, "Adding Unicode text")
+                for c in text:
+                    if ord(c)<128:
+                        paragraph.add_text(str(c))
+                    else:
+                        try:
+                            outc = c.encode(self._config.get_string("output_charset"))
+                            paragraph.add_text(outc)
+                        except UnicodeError:
+                            paragraph.add_unicode_char(ord(c), text_alternative(c))
+            else:
+                raise "Unexpected text type"
+
         lines = string.split (text, "\n")
         for i in range (len (lines)):
             line = lines[i]
@@ -891,7 +931,7 @@
                 if rest_size < 0:
                     rest_size = 0
                 (first, rest) = self._find_text_split (line, rest_size)
-                self._paragraph.add_text (first)
+                add_unicode_text(self._paragraph, first)
                 self._approximate_size = self._approximate_size + len (first)
                 self._is_new_paragraph = 0
                 self._is_new_line = 0
@@ -901,7 +941,7 @@
                     break
             
             if line:
-                self._paragraph.add_text (line)
+                add_unicode_text(self._paragraph, line)
                 self._approximate_size = self._approximate_size + len (line)
                 self._is_new_paragraph = 0
                 self._is_new_line = 0
@@ -963,12 +1003,17 @@
 
     def __init__ (self, url, text, headers, config, attribs):
         text = _clean_newlines (text)
+        textcharset = config.get_string("input_charset")
         # This we use to build the document
         self._doc = TextDocBuilder (url, config)
         if headers.has_key("charset"):
-            self._doc.set_charset (headers["charset"])
+            textcharset = headers["charset"]
         elif attribs.has_key("charset"):
-            self._doc.set_charset (attribs["charset"])
+            textcharset = attribs["charset"]
+        if not textcharset: # we have no idea, so we use locale
+            textcharset = DEFAULT_LOCALE_CHARSET_ENCODING 
+        text = unicode(text, textcharset)
+        message(4, "PlainTextParser: converting into unicode from "+textcharset)
         self._url = url
         self._text = text
         # In these two lists we store tuples of (url, attributes) for encountered anchors
@@ -1060,9 +1105,11 @@
         # javascript:document.write("<div>") turns it back on, because
         # it only recognizes the div, not the javascript.
         self._visible = 1
-        self._charset = headers.has_key('charset') and charset_name_to_mibenum(headers['charset'])
-        if self._charset:
-            self._doc.set_charset(headers['charset'])
+        # charset (python name of it) of current document - first: default
+        self.html_charset = config.get_string("input_charset")
+        # second: from headers
+        if headers.has_key('charset'):
+            self.html_charset = headers['charset']
         # Since some users are really stupid and use HTML wrong, we need a
         # stack of these values
         self._visibility_stack = []
@@ -1153,8 +1200,8 @@
         # we can only check the charset specified in the attribs after parsing
         # the document for <META> tags.  Seems kind of backward, but that's the
         # HTML spec.
-        if not self._charset and self._attribs.has_key('charset'):
-            self._set_charset(self._attribs['charset'])
+        #if not self._charset and self._attribs.has_key('charset'):
+        #    self._set_charset(self._attribs['charset'])
         self._doc.close ()
 
     def get_plucker_doc (self):
@@ -1300,7 +1347,8 @@
         _add_vspace() to do that explicitly if you want to."""
         if self._visible:
             if self.atable is not None and self.in_cell:
-                self.atable.add_cell_text (text)
+                if type(text)==type(""):
+                    self.atable.add_cell_text (text)
             else:
                 self._doc.add_text (text)
                 self._element_beginning = 0
@@ -1392,9 +1440,8 @@
             self._visible = 1
 
     def _set_charset (self, charset):
-        if charset_name_to_mibenum(charset):
-            self._charset = charset
-            self._doc.set_charset(charset)
+        message(4, "Setting html charset to "+charset)
+        self.html_charset = charset
 
     ################################################################################
     ######## HTML specifics
@@ -1430,9 +1477,10 @@
 
 
     def do_meta (self, data):
-        # if the charset is not already assigned (from the HTTP headers, presumably)
-        # and it's available here, then use it
-        if not self._charset and string.lower(data[0][0]) == 'http-equiv' and string.lower(data[0][1]) == 'content-type':
+        # if the charset is specified here, use it
+        # this is against html specs (headers have precedence), but
+        # conforms to common usage and is easier to program :-)
+        if string.lower(data[0][0]) == 'http-equiv' and string.lower(data[0][1]) == 'content-type':
             from PyPlucker.Retriever import parse_http_header_value
             ctype, parameters = parse_http_header_value(data[1][1])
             for parameter in parameters:
@@ -1446,10 +1494,7 @@
         except ValueError:
             self.unknown_entityref(name)
             return
-        if not 0 <= n <= 255:
-            self.unknown_charref(name)
-            return
-        self.handle_data(chr(n))
+        self.handle_data(unichr(n))
 
 
     def handle_special (self, name):
@@ -1478,7 +1523,8 @@
             data = string.translate (data, _CLEANUP_TRANSTABLE)
             data = string.replace (data, "\t", "  ")
 
-
+        if type(data)==type(""):
+            data = unicode(data, self.html_charset or 'iso8859_1')
         #stripped_data = string.strip(data)
         if data:
             # not just blank or empty text (e.g. from comments), so we
@@ -1522,8 +1568,8 @@
                         style_str = struct.pack (">BB", 0, 0x78)
                     self.atable.add_cell_text(style_str)
                     self.last_table_strike = new_strike
-
-            self._add_text (data)
+            self._add_text(data)
+            message(4, "handling data "+`data`)
 
 
     def start_body (self, attributes):
@@ -1886,7 +1932,8 @@
     def do_p (self, attributes):
         if self._needs_newpara ():
             if self._indent_paragraphs:
-                self._add_text('\xa0\xa0\xa0\xa0\xa0\xa0')
+                #self._add_text('\xa0\xa0\xa0\xa0\xa0\xa0')
+                self._add_text(6*NBSP)
             else:
                 self._add_vspace (2)
 
@@ -2049,7 +2096,7 @@
                 text = ((0x2022, "o"), " ")
                 indent = 7
             elif self._ul_list_depth == 2:
-                text = chr(0xbb) + " "
+                text = unichr(0xbb) + " "
                 indent = 6
             elif self._ul_list_depth == 3:
                 text = "+ "
@@ -2063,15 +2110,15 @@
 
         self._doc.set_style ("")  # make sure we render the 'bullet' marker in normal style
         if self.atable is not None and self.in_cell:
-            self._add_text('\xa0\xa0' * table_margin)
+            self._add_text((2*NBSP) * table_margin)
             style_str = struct.pack (">BBBBB", 0, 0x53, 0, 0, 0) # black
             self.atable.add_cell_text(style_str)
 
-        if type(text) == type(""):
+        if type(text) == type("") or type(text) == type(u""):
             self._add_text (text)
         elif type(text) == type(()):
             for element in text:
-                if type(element) == type(""):
+                if type(element) == type("") or type(element) == type(u""):
                     self._add_text(element)
                 elif type(element) == type(()) and len(element) == 2:
                     self._add_unicode_char(element[0], element[1])
@@ -2367,31 +2414,6 @@
             if not self._unhandled_tags.has_key (tag):
                 self._unknown["</%s>"%tag] = 1
 
-    def unknown_charref (self, ref):
-        if self._visible:
-            val = int(ref)
-            if val == 8211:
-                self._add_unicode_char (val, "-")
-            elif val == 8212:
-                self._add_unicode_char (val, "--")
-            elif val == 8216:
-                self._add_unicode_char (val, "`")
-            elif val == 8217:
-                self._add_unicode_char (val, "´")
-            elif val == 8220:
-                self._add_unicode_char (val, "\"")
-            elif val == 8230:
-                        self._add_unicode_char (val, "...")
-            elif val == 8221:
-                self._add_unicode_char (val, "\"")
-            elif val == 8226:
-                # what's this?  Unbreakable space?
-                self._add_unicode_char (val, " ")
-            elif val == 8482:
-                self._add_unicode_char (val, "(tm)")
-            else:
-                self._unknown["charref-%s" % ref] = 1
-                self._add_unicode_char (val, "&#%d;" % val)
 
     def unknown_entityref (self, ref):
         if self._visible:
@@ -2399,14 +2421,11 @@
                 s = htmlentitydefs.entitydefs[ref]
                 if len(s) == 1:
                     val = ord(s)
-                    if (val >= 0xa0 and val < 0x100) or (val >= 0x00 and val < 0xFF):
-                        self.handle_data (s)
-                    else:
-                        self._add_unicode_char(val, "&#%d;" % val)
+                    self.handle_data(unichr(val))
                 else:
                     m = _html_char_ref_pattern.match(s)
                     if m:
-                        self.unknown_charref(m.group(1))
+                        self.handle_data(unichr(int(m.group(1))))
             else:
                 self._unknown["entityref-%s"%ref] = 1
                 self.handle_data('?')
diff -ur plucker-original/viewer/config.h.in plucker-new/viewer/config.h.in
--- plucker-original/viewer/config.h.in 2004-03-10 18:01:26.000000000 +0100
+++ plucker-new/viewer/config.h.in      2004-03-10 18:51:34.000000000 +0100
@@ -116,3 +116,6 @@
 
 /* Define if supporting word lookup */
 #undef SUPPORT_WORD_LOOKUP
+
+/* Define if using unicode mode support */
+#undef UNICODE_MODE
diff -ur plucker-original/viewer/configure.in plucker-new/viewer/configure.in
--- plucker-original/viewer/configure.in        2004-02-28 16:28:21.000000000 +0100
+++ plucker-new/viewer/configure.in     2004-03-08 17:09:20.000000000 +0100
@@ -31,6 +31,7 @@
 DEFAULT_SKINS=no
 DEFAULT_ARMLET=no
 DEFAULT_IMODE=no
+DEFAULT_UNICODE=no
 DEFAULT_CATEGORY=""
 DEFAULT_WAIT_ICON=bubble
 DEFAULT_LANG="en de cs it fr ja fo da zh_CN pl ru es tr th ca no"
@@ -418,6 +419,17 @@
     AC_DEFINE(HAVE_IMODE,, [ Define if using i-mode support])
 fi
 
+AC_MSG_CHECKING(--enable-unicode argument)
+AC_ARG_ENABLE(unicode, [  --enable-unicode          to enable unicode grayfont support],
+    UNICODE=yes, UNICODE=$DEFAULT_UNICODE)
+AC_MSG_RESULT($UNICODE)
+
+if test "$UNICODE" != "no"; then
+    AC_DEFINE(UNICODE_MODE,, [ Define if using unicode mode support])
+fi
+
+
+
 AC_ARG_DISABLE(scroll_to_bottom, [  --disable-scroll-to-bottom
                           always scroll even pages instead of stopping when
                           the end of the page is reached (will add some extra
@@ -784,6 +796,11 @@
 else
     echo "  I-mode Support:             disabled"
 fi
+if test "$UNICODE" != "no" ; then
+    echo "  Unicode Support:            enabled"
+else
+    echo "  Unicode Support:            disabled"
+fi
 if test "$AXXPAC" != "no" ; then
     echo "  AxxPac Support:             enabled"
 else
diff -ur plucker-original/viewer/const.h plucker-new/viewer/const.h
--- plucker-original/viewer/const.h     2004-02-28 16:28:21.000000000 +0100
+++ plucker-new/viewer/const.h  2004-03-08 17:09:20.000000000 +0100
@@ -101,3 +101,5 @@
 /* 3B 22 is a single character in JIS and Kuten */
 #define testDoubleByteJISKuten            0x3B22
 
+/* 04 00 is a single character in UTF-8 */
+#define testDoubleByteUTF8            0x0400
diff -ur plucker-original/viewer/grayfont.c plucker-new/viewer/grayfont.c
--- plucker-original/viewer/grayfont.c  2004-03-05 15:48:27.000000000 +0100
+++ plucker-new/viewer/grayfont.c       2004-03-09 20:01:28.000000000 +0100
@@ -26,6 +26,7 @@
 #include "prefsdata.h"
 #include "palmbitmap.h"
 #include "font.h"
+#include "debug.h"
 #define NO_GRAY_FONT_SUBSTITUTION
 #include "grayfont.h"
 
@@ -141,7 +142,6 @@
 
 
 
-
 /***********************************************************************
  *
  *      Private variables
@@ -167,6 +167,11 @@
     0x632c, 0x52aa, 0x4228, 0x3186, 0x2104, 0x1082, 0x0000
 };
 
+Boolean UsingGrayFont() 
+{
+    return currentFontPtr != NULL;
+}
+
 
 
 /* Set a map for colorizing a bitmap */
@@ -519,6 +524,9 @@
         uses8BitChars = ( charEncoding <= charEncodingPalmLatin );
     else
         uses8BitChars = true;
+#ifdef UNICODE_MODE
+    uses8BitChars = false;
+#endif
     err = FtrGet( sysFtrCreator, sysFtrNumWinVersion, &version );
     havePalmHiRes = ( HIGH_DENSITY_FEATURE_SET_VERSION <= version );
     resource.string[ RESOURCE_NAME_IDLETTER ] = RESOURCE_NAME_ID;
@@ -810,7 +818,7 @@
         inOffset = 0;
         while ( inOffset < length ) {
             WChar  ch;
-            inOffset += TxtGlueGetNextChar( chars, inOffset, &ch );
+            inOffset += MyTxtGlueGetNextChar( chars, inOffset, &ch );
             if ( length < inOffset )
                 break;
             width += GetGlyph( ch )->advance;
@@ -910,6 +918,7 @@
     WinDrawOperation  oldOperation = winPaint;
     Boolean           doKern;
 
+
     if ( currentFontPtr == NULL ) {
         if ( invert )
             WinDrawInvertedChars( chars, length, x, y );
@@ -945,7 +954,7 @@
     bitmapTopLeftX = 0;
     bitmapTopLeftY = 0;
 
-    TxtGlueGetNextChar( chars, 0, &ch );
+    MyTxtGlueGetNextChar( chars, 0, &ch );
     firstKern = GetGlyph( ch )->leftKerning;
 
     switch ( resource.string[ RESOURCE_NAME_ORIENTATION ] )
@@ -1039,7 +1048,7 @@
         GrayFontGlyphInfo*     glyph;
         UInt16                 resourceIndex;
 
-        inOffset += TxtGlueGetNextChar( chars, inOffset, &ch );
+        inOffset += MyTxtGlueGetNextChar( chars, inOffset, &ch );
         if ( length < inOffset )
             break;
         glyph = GetGlyph( ch );
@@ -1174,7 +1183,7 @@
         WinDrawChar( ch, x, y );
         return;
     }
-    length = TxtGlueSetNextChar( line, 0, ch );
+    length = MyTxtGlueSetNextChar( line, 0, ch );
     GrayWinDrawChars( line, length, x, y );
 }
 
diff -ur plucker-original/viewer/grayfont.h plucker-new/viewer/grayfont.h
--- plucker-original/viewer/grayfont.h  2004-02-10 03:10:44.000000000 +0100
+++ plucker-new/viewer/grayfont.h       2004-03-09 20:00:23.000000000 +0100
@@ -29,6 +29,7 @@
 #include "config.h"
 #include "viewer.h"
 #include "hires.h"
+#include "unicode.h"
 
 #define GRAY_FONT_LEFT   'L'
 #define GRAY_FONT_RIGHT  'R'
@@ -45,6 +46,8 @@
 /* Stop them and clear memory */
 void GrayFntStop( void ) GRAYFONT_SECTION;
 
+Boolean UsingGrayFont() GRAYFONT_SECTION;
+
 Err GrayFntDefineFont ( FontID font, void*  fontP ) GRAYFONT_SECTION;
 
 FontID GrayFntGetFont( void ) GRAYFONT_SECTION;
diff -ur plucker-original/viewer/Makefile.in plucker-new/viewer/Makefile.in
--- plucker-original/viewer/Makefile.in 2004-02-28 22:09:01.000000000 +0100
+++ plucker-new/viewer/Makefile.in      2004-03-08 17:09:20.000000000 +0100
@@ -93,7 +93,7 @@
                     detailsform.c searchform.c categoryform.c fontform.c \
                     bookmark.c session.c document.c image.c history.c \
                     search8.c search.c prefsdata.c anchor.c \
-                    paragraph.c uncompress.c keyboard.c keyboardform.c \
+                    paragraph.c unicode.c uncompress.c keyboard.c keyboardform.c \
                     list.c link.c renamedocform.c hardcopyform.c font.c \
                     table.c fullscreenform.c @OS_EXTRA_SRC@
 
diff -ur plucker-original/viewer/os.c plucker-new/viewer/os.c
--- plucker-original/viewer/os.c        2004-01-04 13:02:09.000000000 +0100
+++ plucker-new/viewer/os.c     2004-03-08 17:09:20.000000000 +0100
@@ -38,6 +38,7 @@
 #include "image.h"
 #include "axxpacimp.h"
 #include "skins.h"
+#include "unicode.h"
 
 #include "os.h"
 
@@ -161,7 +162,7 @@
     MemSet( s, MAX_CHARACTER_LENGTH, 0 );
     s[ 0 ] = word >> 8;
     s[ 1 ] = word & 0xFF;
-    return 1 < TxtGlueGetNextChar( s, 0, NULL );
+    return 1 < MyTxtGlueGetNextChar( s, 0, NULL );
 }
 
 
@@ -371,16 +372,22 @@
     if ( charEncoding != charEncodingPalmLatin )
         return 0;
 
+
     entries = sizeof(Latin1Mapping)/sizeof(CharMapping);
 
     for ( i = 0 ;  i < entries;  i++ ) {
         if ( Latin1Mapping [ i ].unicodeValue == 0 )
             return 0;
-        else if ( charValue < Latin1Mapping [ i ].unicodeValue )
+/*        else if ( charValue < Latin1Mapping [ i ].unicodeValue )
             return 0;
+*/
         else if ( Latin1Mapping [ i ].unicodeValue == charValue )
             return Latin1Mapping[ i ].palmCharValue;
     }
+
+    if (charValue <= 255)
+        return charValue;
+
     return 0;
 }
 
@@ -432,6 +439,7 @@
 #endif
     if ( IsDoubleByteSingleChar( testDoubleByteBig5GB2312EUCJPKR ) ||
          IsDoubleByteSingleChar( testDoubleByteShiftJIS ) ||
+         IsDoubleByteSingleChar( testDoubleByteUTF8 ) ||
          IsDoubleByteSingleChar( testDoubleByteJISKuten ) ) {
         uses8BitChars              = false;
     }
diff -ur plucker-original/viewer/paragraph.c plucker-new/viewer/paragraph.c
--- plucker-original/viewer/paragraph.c 2004-02-20 17:19:19.000000000 +0100
+++ plucker-new/viewer/paragraph.c      2004-03-16 21:10:07.000000000 +0100
@@ -340,7 +340,7 @@
 static Int16 littleSpace;   /* Extra pixels in each */
 
 /* A one-character pushback for character tokens */
-static Char  pushedChar     = 0;
+static WChar  pushedChar     = 0;
 
 /* Used to see if the current font is the fixed with font */
 static Boolean fixedWidthFont = false;
@@ -396,8 +396,8 @@
            tapped position */
         while ( offset < len ) {
             WChar ch;
-
-            offset   += TxtGlueGetNextChar( chars, offset, &ch );
+            UseLegacyEncoding(!UsingGrayFont());        
+            offset   += MyTxtGlueGetNextChar( chars, offset, &ch );
             charWidth = TxtGlueCharWidth( ch );
             if ( CharIsSpace( ch ) ) {
                 x += charWidth;
@@ -517,7 +517,10 @@
                                 selectedWordBounds[ i ].extent.y )
             bottomY = selectedWordBounds[ i ].topLeft.y +
                           selectedWordBounds[ i ].extent.y;
-        stringSize += TxtGlueSetNextChar( selectedWord, stringSize, ch );
+        /* we are using legacy encoding here since no lookup plugin can handle utf-8 strings */
+        UseLegacyEncoding(true);
+        stringSize += MyTxtGlueSetNextChar( selectedWord, stringSize, ch );
+
     }
     selectedWord[ stringSize ] = '\0';
     if ( bounds != NULL ) {
@@ -1118,6 +1121,8 @@
     }
     else {
         if ( tContext->writeMode == WRITEMODE_COPY_CHAR || ! goodTable ) {
+            UseLegacyEncoding(!UsingGrayFont());
+
             DrawText( name, length, tContext );
             *width = FntCharsWidth( name, length );
         }
@@ -1555,7 +1560,7 @@
     UInt8*  functionArgs;
     UInt32  charValue;
     UInt8   charsToSkip;
-    UInt16  palmChar;
+    WChar   palmChar;
 #ifdef HAVE_IMODE
     DmOpenRef  plkrImodeDB;
 #endif
@@ -1595,9 +1600,12 @@
         }
     }
 #endif
+    if (UsingGrayFont())
+        palmChar = charValue;
+    else
+        palmChar = FindPalmCharForUnicodeChar( charValue );
 
-    palmChar = FindPalmCharForUnicodeChar( charValue );
-    if ( 0 < palmChar && PutNextToken( palmChar ) ) {
+    if ( PutNextToken( palmChar ) ) {
         pContext->position += charsToSkip;
     }
     return UNICODE;
@@ -1694,15 +1702,15 @@
     Int16   offset;
 
     if ( pushedChar != 0 ) {
-        *nextToken = ( UInt8 )pushedChar;
+        *nextToken = ( WChar )pushedChar;
         pushedChar = 0;
         return TOKEN_CHARACTER;
     }
 
     if ( pContext->last <= pContext->position )
         return TOKEN_PARAGRAPH_END;
-
-    pContext->position += TxtGlueGetNextChar( pContext->position, 0,
+    UseLegacyEncoding(1);
+    pContext->position += MyTxtGlueGetNextChar( pContext->position, 0,
                             &nextChar );
 
     if ( nextChar != '\0' ) {
@@ -1726,11 +1734,11 @@
 {
     if ( pushedChar != 0 )
         return false;
-
+/*
     if ( 256 <= nextToken )
         return false;
-
-    pushedChar = (Char) nextToken;
+*/
+    pushedChar = (WChar) nextToken;
 
     return true;
 }
@@ -2227,8 +2235,9 @@
         Char*     prevPosition;
 
         prevPosition  = pContext->position;
-        nextTokenType = GetNextToken( pContext, &nextChar );
-
+        
+        nextTokenType = GetNextToken( pContext, &nextChar ); 
+        
         if ( nextTokenType == TOKEN_PARAGRAPH_END ) {
             break;
         }
@@ -2248,6 +2257,7 @@
             }
             continue;
         }
+
         addMarginToCurrent = false;
 
         if ( skipLeadingSpace && CharIsSpace( nextChar ) && ! fixedWidthFont ) {
@@ -2287,7 +2297,10 @@
                 tContext->cursorX += FntCharsWidth( chars, len );
                 len = 0;
             }
-            len          += TxtGlueSetNextChar( chars, len, nextChar );
+            UseLegacyEncoding(!UsingGrayFont());
+            len          += MyTxtGlueSetNextChar( chars, len, nextChar );
+
         }
 
         if ( pContext->type == ALIGNMENT_JUSTIFY && nextChar == ' ' ) {
@@ -2318,6 +2331,8 @@
 
     if ( 0 < len ) {
         DrawText( chars, len, tContext );
         tContext->cursorX += FntCharsWidth( chars, len );
     }
     if ( invertPattern && tContext->writeMode == WRITEMODE_DRAW_CHAR )
@@ -2610,8 +2625,10 @@
         yPos += currentHeight / 2;
     else if ( GetCurrentStyle() == SUPSTYLE )
         yPos += currentHeight / 2 - GetPrevFontHeight();
-
+    UseLegacyEncoding(0);
     RotDrawChars(chars, len, tContext->cursorX, (Coord)yPos);
 }
 
 
@@ -2623,6 +2640,7 @@
     const TextContext* tContext
     )
 {
+    UseLegacyEncoding(0);
     RotDrawInvertedChars( chars, len, tContext->cursorX,
         tContext->cursorY - FntCharHeight() );
 }
diff -ur plucker-original/viewer/paragraph.h plucker-new/viewer/paragraph.h
--- plucker-original/viewer/paragraph.h 2004-02-01 12:26:33.000000000 +0100
+++ plucker-new/viewer/paragraph.h      2004-03-08 17:09:20.000000000 +0100
@@ -26,6 +26,7 @@
 #include "viewer.h"
 #include "document.h"
 #include "util.h"
+#include "unicode.h"
 
 /*
     A paragraph as it appears in the input data stream. The height of the
diff -ur plucker-original/viewer/rotate.c plucker-new/viewer/rotate.c
--- plucker-original/viewer/rotate.c    2004-01-04 01:21:36.000000000 +0100
+++ plucker-new/viewer/rotate.c 2004-03-08 17:09:20.000000000 +0100
@@ -472,7 +472,7 @@
     while ( 0 < length ) {
         Boolean missing;
 
-        charWidth  = TxtGlueGetNextChar( string, 0, &ch );
+        charWidth  = MyTxtGlueGetNextChar( string, 0, &ch );
         string    += charWidth;
         length    -= charWidth;
 
diff -ur plucker-original/viewer/rotate.h plucker-new/viewer/rotate.h
--- plucker-original/viewer/rotate.h    2003-08-11 04:31:57.000000000 +0200
+++ plucker-new/viewer/rotate.h 2004-03-08 17:09:20.000000000 +0100
@@ -30,6 +30,7 @@
 #include "jogdial.h"
 #endif
 #include "grayfont.h"
+#include "unicode.h"
 
 #ifdef HAVE_ROTATE
