On Sat, Mar 13, 2004 at 08:15:43PM -0500, Alexander R. Pruss wrote:
> > you can select anything as --output-charset, characters from
> > --input-charset that cannot be represented in --output-charset
> > are included as unicode values - this is why default --output-charset
> > is ascii, rather than palmos.
>
> Having default output charset not be palmos will make searching
> significantly less efficient, especially on ARM units. Searching is
> optimized for 8-bit text. In fact, it currently doesn't work for unicode at
> all (unless your patch fixes that).
It does not, and this is a valid point. So it is back to palmos.
On Sun, Mar 14, 2004 at 12:10:19PM +0100, Michael Nordstrom wrote:
> On Thu, Mar 11, 2004, Radovan Garabik wrote:
>
> > New version of plucker unicode patch.
>
> It is great that you add support for this, but in what way is this
> related to the libraryform enhancements? Please don't use a message
> from a completely different thread as a "template". Usually, it is a
> common feature to create a new message in a mail client, so I doubt
> it can be that hard for you to create a new message with a correct
> *subject* ;-)
sorry, I messed up (like, going to lunch and being already late
and trying to finish up the mail *quickly* :-))
New version of unicode patch is attached. This is combined patch,
for both parser and viewer.
Changes:
- deafult output charset for parser is palmos
- FindPalmCharForUnicodeChar is back if you are not using
gray fonts - good for backward compatibility
- unicode patch is now compatible with word lookup (as far as it can
be), by using TxtGlue* where appropriate - so you can use word
lookup regardless of unicode mode or legacy encoding of the document,
as long as the characters fit into your palm charset (that is,
PalmOS cp1252 hybrid, but in theory it should work with multibyte
japanese devices as well, I just have no way of testing it).
If the character does _not_ fit into palm encoding, you get
whatever TxtGlue* gives - i.e. mess-up.
To test the patch:
- get unicode font from
http://kassiopeia.juls.savba.sk/~garabik/plucker/unicode_test.prc.gz
(warning: it is 402KB ungzipped). The font has glyphs up to U+11F9 and it
is a hires font.
- get test document from
http://kassiopeia.juls.savba.sk/~garabik/plucker/sklo.pdb
The text uses some combining characters (I could not make
font with glyphs going up to the range of extended greek chars, so I
broke it down into NFD normalization, but it nicely shows the usage of
combining characters as well). As long as source font actually
contains combining characters with negative kerning, they are going
to be displayed OK (not perfectly, but acceptably) if the preceeding
character has not abnormal width. In the example I am giving,
iota is obviously too thin and following perispomeni hangs over previous
epsilon. Also, psili and oxia do not look good at all when stacked both
over one letter.
--
-----------------------------------------------------------
| Radovan Garab�k http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ melkor.dnp.fmph.uniba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
diff -ur plucker-original/configure.in plucker-new/configure.in
--- plucker-original/configure.in 2004-03-07 13:00:33.000000000 +0100
+++ plucker-new/configure.in 2004-03-08 17:09:20.000000000 +0100
@@ -320,6 +320,7 @@
to get the function names included in POSE's
profiling output ])
AC_ARG_ENABLE(imode, [ --enable-imode to enable i-mode support (also
requires the imodeicons.pdb database)])
+AC_ARG_ENABLE(unicode, [ --enable-unicode to enable unicode support])
AC_ARG_ENABLE(scroll_to_bottom, [ --disable-scroll-to-bottom
always scroll even pages instead of stopping when
the end of the page is reached (will add some extra
diff -ur plucker-original/parser/python/PyPlucker/Spider.py
plucker-new/parser/python/PyPlucker/Spider.py
--- plucker-original/parser/python/PyPlucker/Spider.py 2004-02-02 03:31:58.000000000
+0100
+++ plucker-new/parser/python/PyPlucker/Spider.py 2004-03-10 19:52:53.000000000
+0100
@@ -1276,8 +1276,10 @@
message(0, " Set or clear the backup bit in the output
file.")
message(0, " --beamable, --not-beamable:")
message(0, " Set or clear the beamable bit in the output
file.")
- message(0, " --charset=<name>:")
- message(0, " Set the default charset to that specified by
<name>.")
+ message(0, " --output-charset=<name>:")
+ message(0, " Set the output charset of generated document
to that specified by <name>.")
+ message(0, " --input-charset=<name>:")
+ message(0, " Assume input charset to that specified by
<name>.")
message(0, " --owner-id=<name>:")
message(0, " Set owner-id of the output document to
<name>.")
message(0, " --url-pattern=<regexp-pattern>:")
@@ -1350,7 +1352,8 @@
backup = None
copy_protect = None
iconfile = None
- default_charset = None
+ output_charset = None
+ input_charset = None
owner_id = None
url_pattern = None
referrer = None
@@ -1376,7 +1379,7 @@
"maxheight=", "maxwidth=", "alt-maxheight=",
"alt-maxwidth=",
"compression=", "home-url=", "update-cache",
"launchable",
"not-launchable", "backup", "no-backup",
"beamable", "not-beamable",
- "icon=", "charset=", "owner-id=",
"url-pattern=", "referrer=",
+ "icon=", "output-charset=", "input-charset=",
"owner-id=", "url-pattern=", "referrer=",
"user-agent=", "title=", "author=",
"status-file=", "version",
"tables", "depth-first", "http-proxy=",
"http-proxy-user=", "http-proxy-pass=",
"fragments=", "creator-id="])
@@ -1494,8 +1497,10 @@
copy_protect = 1
elif opt == "--icon":
iconfile = arg
- elif opt == "--charset":
- default_charset = arg
+ elif opt == "--output-charset":
+ output_charset = arg
+ elif opt == "--input-charset":
+ input_charset = arg
elif opt == "--owner-id":
owner_id = arg
elif opt == "--referrer":
@@ -1602,21 +1607,19 @@
if zlib_compression == 'false':
message('Specification of an owner-id forces use of zlib compression...')
zlib_compression = 'true'
-
- mibenum = None
- # if not specified on command line, look in .pluckerrc
- if default_charset is None:
- default_charset = config.get_string("default_charset")
- # if we have one, validate it
- if default_charset is not None:
- from PyPlucker.helper.CharsetMapping import charset_name_to_mibenum,
charset_known_names
- import string, re
- mibenum = charset_name_to_mibenum(default_charset)
- if mibenum:
- config.set('default_charset', mibenum)
- else:
- usage ("Error: Unsupported charset '" + default_charset + "' specified
as default charset.\n"
- " Charset must be either a decimal MIBenum value, or one of
" + str(charset_known_names()))
+
+ if output_charset is None:
+ output_charset = config.get_string("output_charset")
+ if output_charset is None:
+ output_charset = 'palmos'
+ config.set ('output_charset', output_charset)
+
+ if input_charset is None:
+ input_charset = config.get_string("input_charset")
+ if output_charset is None:
+ input_charset = 'utf-8'
+ config.set ('input_charset', input_charset)
+
# update the config with the user options
if use_file is not None:
@@ -1696,8 +1699,6 @@
config.set ('author_md', author)
if title is not None:
config.set ('title_md', title)
- if mibenum is not None:
- config.set ('default_charset', mibenum)
if statusfile is not None:
config.set ('status_file', statusfile)
if depthfirst is not None:
diff -ur plucker-original/parser/python/PyPlucker/TextParser.py
plucker-new/parser/python/PyPlucker/TextParser.py
--- plucker-original/parser/python/PyPlucker/TextParser.py 2004-02-27
23:51:08.000000000 +0100
+++ plucker-new/parser/python/PyPlucker/TextParser.py 2004-03-11 19:46:50.000000000
+0100
@@ -31,6 +31,8 @@
## Now PyPlucker things should generally be importable
##
+NBSP = u'\u00a0' # non-breaking space
+
import string
import re
try:
@@ -336,6 +338,33 @@
_entitycharref = re.compile('^(.*)&([#a-zA-Z][-.a-zA-Z0-9]*);(.*)$')
_html_char_ref_pattern = re.compile('^&#([0-9]+);$')
+# this needs to be rewritten
+def text_alternative (uchar):
+ "get text alternative to unicode character uchar"
+ val = ord(uchar)
+ if val == 8211:
+ return "-"
+ elif val == 8212:
+ return "--"
+ elif val == 8216:
+ return "`"
+ elif val == 8217:
+ return "'"
+ elif val == 8220:
+ return "\""
+ elif val == 8230:
+ return "..."
+ elif val == 8221:
+ return "\""
+ elif val == 8226:
+ return "o"
+ elif val == 8482:
+ return "(tm)"
+ else:
+ return "?"
+ return "&#%d;" % val
+
+
# These junk "alt" attribute values are not worth showing.
junk_alt_attributes = ("img", "[img]", "spacer", "")
@@ -374,8 +403,6 @@
return text
-
-
class AttributeStack:
"""A data structure to maintain information about the current
text attributes.
@@ -525,12 +552,11 @@
return self._tags[self._stack[-1]]
-
-
class TextDocBuilder:
"""Encapsulate the knowledge of when to change styles, add paragraphs, etc."""
def __init__ (self, url, config, **keyword_args):
+ message(2,"initializing textdocbuilder")
self._doc = PluckerDocs.PluckerTextDocument (url)
self._config = config
self._attributes = AttributeStack ()
@@ -582,19 +608,12 @@
# see if we can supply a default charset
url = self._doc.get_url()
if self._config:
- userspec = self._config.get_int('default_charset', 0)
+ userspec = self._config.get_int('output_charset_mibenum', 0)
else:
userspec = None
locale_default = charset_name_to_mibenum(DEFAULT_LOCALE_CHARSET_ENCODING)
- # the userspec will take precedence
- if userspec:
+ if userspec is not None:
self._doc.set_charset(userspec)
- # OK, so we have no idea. Use the HTTP default of ISO-8859-1 (4) for
- # http: URLs, and the environment default (if any) for others
- elif (string.lower(url[:5]) == 'http:' or string.lower(url[:6]) ==
'https:'):
- self._doc.set_charset(4)
- elif locale_default:
- self._doc.set_charset(locale_default)
def add_name (self, name):
"""Give name to the current paragraph"""
@@ -875,7 +894,28 @@
def add_text (self, text):
- """Add some text, maybe even many lines."""
+ """Add some text, maybe even many lines.
+ Text can be either a string or a unicode string.
+ """
+
+ def add_unicode_text(paragraph, text):
+ if type(text)==type(""): # non-unicode string, shortcut
+ message(4, "Adding 8-bit text")
+ paragraph.add_text(text)
+ elif type(text)==type(u""):
+ message(4, "Adding Unicode text")
+ for c in text:
+ if ord(c)<128:
+ paragraph.add_text(str(c))
+ else:
+ try:
+ outc = c.encode(self._config.get_string("output_charset"))
+ paragraph.add_text(outc)
+ except UnicodeError:
+ paragraph.add_unicode_char(ord(c), text_alternative(c))
+ else:
+ raise "Unexpected text type"
+
lines = string.split (text, "\n")
for i in range (len (lines)):
line = lines[i]
@@ -891,7 +931,7 @@
if rest_size < 0:
rest_size = 0
(first, rest) = self._find_text_split (line, rest_size)
- self._paragraph.add_text (first)
+ add_unicode_text(self._paragraph, first)
self._approximate_size = self._approximate_size + len (first)
self._is_new_paragraph = 0
self._is_new_line = 0
@@ -901,7 +941,7 @@
break
if line:
- self._paragraph.add_text (line)
+ add_unicode_text(self._paragraph, line)
self._approximate_size = self._approximate_size + len (line)
self._is_new_paragraph = 0
self._is_new_line = 0
@@ -963,12 +1003,17 @@
def __init__ (self, url, text, headers, config, attribs):
text = _clean_newlines (text)
+ textcharset = config.get_string("input_charset")
# This we use to build the document
self._doc = TextDocBuilder (url, config)
if headers.has_key("charset"):
- self._doc.set_charset (headers["charset"])
+ textcharset = headers["charset"]
elif attribs.has_key("charset"):
- self._doc.set_charset (attribs["charset"])
+ textcharset = attribs["charset"]
+ if not textcharset: # we have no idea, so we use locale
+ textcharset = DEFAULT_LOCALE_CHARSET_ENCODING
+ text = unicode(text, textcharset)
+ message(4, "PlainTextParser: converting into unicode from "+textcharset)
self._url = url
self._text = text
# In these two lists we store tuples of (url, attributes) for encountered
anchors
@@ -1060,9 +1105,11 @@
# javascript:document.write("<div>") turns it back on, because
# it only recognizes the div, not the javascript.
self._visible = 1
- self._charset = headers.has_key('charset') and
charset_name_to_mibenum(headers['charset'])
- if self._charset:
- self._doc.set_charset(headers['charset'])
+ # charset (python name of it) of current document - first: default
+ self.html_charset = config.get_string("input_charset")
+ # second: from headers
+ if headers.has_key('charset'):
+ self.html_charset = headers['charset']
# Since some users are really stupid and use HTML wrong, we need a
# stack of these values
self._visibility_stack = []
@@ -1153,8 +1200,8 @@
# we can only check the charset specified in the attribs after parsing
# the document for <META> tags. Seems kind of backward, but that's the
# HTML spec.
- if not self._charset and self._attribs.has_key('charset'):
- self._set_charset(self._attribs['charset'])
+ #if not self._charset and self._attribs.has_key('charset'):
+ # self._set_charset(self._attribs['charset'])
self._doc.close ()
def get_plucker_doc (self):
@@ -1300,7 +1347,8 @@
_add_vspace() to do that explicitly if you want to."""
if self._visible:
if self.atable is not None and self.in_cell:
- self.atable.add_cell_text (text)
+ if type(text)==type(""):
+ self.atable.add_cell_text (text)
else:
self._doc.add_text (text)
self._element_beginning = 0
@@ -1392,9 +1440,8 @@
self._visible = 1
def _set_charset (self, charset):
- if charset_name_to_mibenum(charset):
- self._charset = charset
- self._doc.set_charset(charset)
+ message(4, "Setting html charset to "+charset)
+ self.html_charset = charset
################################################################################
######## HTML specifics
@@ -1430,9 +1477,10 @@
def do_meta (self, data):
- # if the charset is not already assigned (from the HTTP headers, presumably)
- # and it's available here, then use it
- if not self._charset and string.lower(data[0][0]) == 'http-equiv' and
string.lower(data[0][1]) == 'content-type':
+ # if the charset is specified here, use it
+ # this is against html specs (headers have precedence), but
+ # conforms to common usage and is easier to program :-)
+ if string.lower(data[0][0]) == 'http-equiv' and string.lower(data[0][1]) ==
'content-type':
from PyPlucker.Retriever import parse_http_header_value
ctype, parameters = parse_http_header_value(data[1][1])
for parameter in parameters:
@@ -1446,10 +1494,7 @@
except ValueError:
self.unknown_entityref(name)
return
- if not 0 <= n <= 255:
- self.unknown_charref(name)
- return
- self.handle_data(chr(n))
+ self.handle_data(unichr(n))
def handle_special (self, name):
@@ -1478,7 +1523,8 @@
data = string.translate (data, _CLEANUP_TRANSTABLE)
data = string.replace (data, "\t", " ")
-
+ if type(data)==type(""):
+ data = unicode(data, self.html_charset or 'iso8859_1')
#stripped_data = string.strip(data)
if data:
# not just blank or empty text (e.g. from comments), so we
@@ -1522,8 +1568,8 @@
style_str = struct.pack (">BB", 0, 0x78)
self.atable.add_cell_text(style_str)
self.last_table_strike = new_strike
-
- self._add_text (data)
+ self._add_text(data)
+ message(4, "handling data "+`data`)
def start_body (self, attributes):
@@ -1886,7 +1932,8 @@
def do_p (self, attributes):
if self._needs_newpara ():
if self._indent_paragraphs:
- self._add_text('\xa0\xa0\xa0\xa0\xa0\xa0')
+ #self._add_text('\xa0\xa0\xa0\xa0\xa0\xa0')
+ self._add_text(6*NBSP)
else:
self._add_vspace (2)
@@ -2049,7 +2096,7 @@
text = ((0x2022, "o"), " ")
indent = 7
elif self._ul_list_depth == 2:
- text = chr(0xbb) + " "
+ text = unichr(0xbb) + " "
indent = 6
elif self._ul_list_depth == 3:
text = "+ "
@@ -2063,15 +2110,15 @@
self._doc.set_style ("") # make sure we render the 'bullet' marker in normal
style
if self.atable is not None and self.in_cell:
- self._add_text('\xa0\xa0' * table_margin)
+ self._add_text((2*NBSP) * table_margin)
style_str = struct.pack (">BBBBB", 0, 0x53, 0, 0, 0) # black
self.atable.add_cell_text(style_str)
- if type(text) == type(""):
+ if type(text) == type("") or type(text) == type(u""):
self._add_text (text)
elif type(text) == type(()):
for element in text:
- if type(element) == type(""):
+ if type(element) == type("") or type(element) == type(u""):
self._add_text(element)
elif type(element) == type(()) and len(element) == 2:
self._add_unicode_char(element[0], element[1])
@@ -2367,31 +2414,6 @@
if not self._unhandled_tags.has_key (tag):
self._unknown["</%s>"%tag] = 1
- def unknown_charref (self, ref):
- if self._visible:
- val = int(ref)
- if val == 8211:
- self._add_unicode_char (val, "-")
- elif val == 8212:
- self._add_unicode_char (val, "--")
- elif val == 8216:
- self._add_unicode_char (val, "`")
- elif val == 8217:
- self._add_unicode_char (val, "´")
- elif val == 8220:
- self._add_unicode_char (val, "\"")
- elif val == 8230:
- self._add_unicode_char (val, "...")
- elif val == 8221:
- self._add_unicode_char (val, "\"")
- elif val == 8226:
- # what's this? Unbreakable space?
- self._add_unicode_char (val, " ")
- elif val == 8482:
- self._add_unicode_char (val, "(tm)")
- else:
- self._unknown["charref-%s" % ref] = 1
- self._add_unicode_char (val, "&#%d;" % val)
def unknown_entityref (self, ref):
if self._visible:
@@ -2399,14 +2421,11 @@
s = htmlentitydefs.entitydefs[ref]
if len(s) == 1:
val = ord(s)
- if (val >= 0xa0 and val < 0x100) or (val >= 0x00 and val < 0xFF):
- self.handle_data (s)
- else:
- self._add_unicode_char(val, "&#%d;" % val)
+ self.handle_data(unichr(val))
else:
m = _html_char_ref_pattern.match(s)
if m:
- self.unknown_charref(m.group(1))
+ self.handle_data(unichr(int(m.group(1))))
else:
self._unknown["entityref-%s"%ref] = 1
self.handle_data('?')
diff -ur plucker-original/viewer/config.h.in plucker-new/viewer/config.h.in
--- plucker-original/viewer/config.h.in 2004-03-10 18:01:26.000000000 +0100
+++ plucker-new/viewer/config.h.in 2004-03-10 18:51:34.000000000 +0100
@@ -116,3 +116,6 @@
/* Define if supporting word lookup */
#undef SUPPORT_WORD_LOOKUP
+
+/* Define if using unicode mode support */
+#undef UNICODE_MODE
diff -ur plucker-original/viewer/configure.in plucker-new/viewer/configure.in
--- plucker-original/viewer/configure.in 2004-02-28 16:28:21.000000000 +0100
+++ plucker-new/viewer/configure.in 2004-03-08 17:09:20.000000000 +0100
@@ -31,6 +31,7 @@
DEFAULT_SKINS=no
DEFAULT_ARMLET=no
DEFAULT_IMODE=no
+DEFAULT_UNICODE=no
DEFAULT_CATEGORY=""
DEFAULT_WAIT_ICON=bubble
DEFAULT_LANG="en de cs it fr ja fo da zh_CN pl ru es tr th ca no"
@@ -418,6 +419,17 @@
AC_DEFINE(HAVE_IMODE,, [ Define if using i-mode support])
fi
+AC_MSG_CHECKING(--enable-unicode argument)
+AC_ARG_ENABLE(unicode, [ --enable-unicode to enable unicode grayfont
support],
+ UNICODE=yes, UNICODE=$DEFAULT_UNICODE)
+AC_MSG_RESULT($UNICODE)
+
+if test "$UNICODE" != "no"; then
+ AC_DEFINE(UNICODE_MODE,, [ Define if using unicode mode support])
+fi
+
+
+
AC_ARG_DISABLE(scroll_to_bottom, [ --disable-scroll-to-bottom
always scroll even pages instead of stopping when
the end of the page is reached (will add some extra
@@ -784,6 +796,11 @@
else
echo " I-mode Support: disabled"
fi
+if test "$UNICODE" != "no" ; then
+ echo " Unicode Support: enabled"
+else
+ echo " Unicode Support: disabled"
+fi
if test "$AXXPAC" != "no" ; then
echo " AxxPac Support: enabled"
else
diff -ur plucker-original/viewer/const.h plucker-new/viewer/const.h
--- plucker-original/viewer/const.h 2004-02-28 16:28:21.000000000 +0100
+++ plucker-new/viewer/const.h 2004-03-08 17:09:20.000000000 +0100
@@ -101,3 +101,5 @@
/* 3B 22 is a single character in JIS and Kuten */
#define testDoubleByteJISKuten 0x3B22
+/* 04 00 is a single character in UTF-8 */
+#define testDoubleByteUTF8 0x0400
diff -ur plucker-original/viewer/grayfont.c plucker-new/viewer/grayfont.c
--- plucker-original/viewer/grayfont.c 2004-03-05 15:48:27.000000000 +0100
+++ plucker-new/viewer/grayfont.c 2004-03-09 20:01:28.000000000 +0100
@@ -26,6 +26,7 @@
#include "prefsdata.h"
#include "palmbitmap.h"
#include "font.h"
+#include "debug.h"
#define NO_GRAY_FONT_SUBSTITUTION
#include "grayfont.h"
@@ -141,7 +142,6 @@
-
/***********************************************************************
*
* Private variables
@@ -167,6 +167,11 @@
0x632c, 0x52aa, 0x4228, 0x3186, 0x2104, 0x1082, 0x0000
};
+Boolean UsingGrayFont()
+{
+ return currentFontPtr != NULL;
+}
+
/* Set a map for colorizing a bitmap */
@@ -519,6 +524,9 @@
uses8BitChars = ( charEncoding <= charEncodingPalmLatin );
else
uses8BitChars = true;
+#ifdef UNICODE_MODE
+ uses8BitChars = false;
+#endif
err = FtrGet( sysFtrCreator, sysFtrNumWinVersion, &version );
havePalmHiRes = ( HIGH_DENSITY_FEATURE_SET_VERSION <= version );
resource.string[ RESOURCE_NAME_IDLETTER ] = RESOURCE_NAME_ID;
@@ -810,7 +818,7 @@
inOffset = 0;
while ( inOffset < length ) {
WChar ch;
- inOffset += TxtGlueGetNextChar( chars, inOffset, &ch );
+ inOffset += MyTxtGlueGetNextChar( chars, inOffset, &ch );
if ( length < inOffset )
break;
width += GetGlyph( ch )->advance;
@@ -910,6 +918,7 @@
WinDrawOperation oldOperation = winPaint;
Boolean doKern;
+
if ( currentFontPtr == NULL ) {
if ( invert )
WinDrawInvertedChars( chars, length, x, y );
@@ -945,7 +954,7 @@
bitmapTopLeftX = 0;
bitmapTopLeftY = 0;
- TxtGlueGetNextChar( chars, 0, &ch );
+ MyTxtGlueGetNextChar( chars, 0, &ch );
firstKern = GetGlyph( ch )->leftKerning;
switch ( resource.string[ RESOURCE_NAME_ORIENTATION ] )
@@ -1039,7 +1048,7 @@
GrayFontGlyphInfo* glyph;
UInt16 resourceIndex;
- inOffset += TxtGlueGetNextChar( chars, inOffset, &ch );
+ inOffset += MyTxtGlueGetNextChar( chars, inOffset, &ch );
if ( length < inOffset )
break;
glyph = GetGlyph( ch );
@@ -1174,7 +1183,7 @@
WinDrawChar( ch, x, y );
return;
}
- length = TxtGlueSetNextChar( line, 0, ch );
+ length = MyTxtGlueSetNextChar( line, 0, ch );
GrayWinDrawChars( line, length, x, y );
}
diff -ur plucker-original/viewer/grayfont.h plucker-new/viewer/grayfont.h
--- plucker-original/viewer/grayfont.h 2004-02-10 03:10:44.000000000 +0100
+++ plucker-new/viewer/grayfont.h 2004-03-09 20:00:23.000000000 +0100
@@ -29,6 +29,7 @@
#include "config.h"
#include "viewer.h"
#include "hires.h"
+#include "unicode.h"
#define GRAY_FONT_LEFT 'L'
#define GRAY_FONT_RIGHT 'R'
@@ -45,6 +46,8 @@
/* Stop them and clear memory */
void GrayFntStop( void ) GRAYFONT_SECTION;
+Boolean UsingGrayFont() GRAYFONT_SECTION;
+
Err GrayFntDefineFont ( FontID font, void* fontP ) GRAYFONT_SECTION;
FontID GrayFntGetFont( void ) GRAYFONT_SECTION;
diff -ur plucker-original/viewer/Makefile.in plucker-new/viewer/Makefile.in
--- plucker-original/viewer/Makefile.in 2004-02-28 22:09:01.000000000 +0100
+++ plucker-new/viewer/Makefile.in 2004-03-08 17:09:20.000000000 +0100
@@ -93,7 +93,7 @@
detailsform.c searchform.c categoryform.c fontform.c \
bookmark.c session.c document.c image.c history.c \
search8.c search.c prefsdata.c anchor.c \
- paragraph.c uncompress.c keyboard.c keyboardform.c \
+ paragraph.c unicode.c uncompress.c keyboard.c keyboardform.c \
list.c link.c renamedocform.c hardcopyform.c font.c \
table.c fullscreenform.c @OS_EXTRA_SRC@
diff -ur plucker-original/viewer/os.c plucker-new/viewer/os.c
--- plucker-original/viewer/os.c 2004-01-04 13:02:09.000000000 +0100
+++ plucker-new/viewer/os.c 2004-03-08 17:09:20.000000000 +0100
@@ -38,6 +38,7 @@
#include "image.h"
#include "axxpacimp.h"
#include "skins.h"
+#include "unicode.h"
#include "os.h"
@@ -161,7 +162,7 @@
MemSet( s, MAX_CHARACTER_LENGTH, 0 );
s[ 0 ] = word >> 8;
s[ 1 ] = word & 0xFF;
- return 1 < TxtGlueGetNextChar( s, 0, NULL );
+ return 1 < MyTxtGlueGetNextChar( s, 0, NULL );
}
@@ -371,16 +372,22 @@
if ( charEncoding != charEncodingPalmLatin )
return 0;
+
entries = sizeof(Latin1Mapping)/sizeof(CharMapping);
for ( i = 0 ; i < entries; i++ ) {
if ( Latin1Mapping [ i ].unicodeValue == 0 )
return 0;
- else if ( charValue < Latin1Mapping [ i ].unicodeValue )
+/* else if ( charValue < Latin1Mapping [ i ].unicodeValue )
return 0;
+*/
else if ( Latin1Mapping [ i ].unicodeValue == charValue )
return Latin1Mapping[ i ].palmCharValue;
}
+
+ if (charValue <= 255)
+ return charValue;
+
return 0;
}
@@ -432,6 +439,7 @@
#endif
if ( IsDoubleByteSingleChar( testDoubleByteBig5GB2312EUCJPKR ) ||
IsDoubleByteSingleChar( testDoubleByteShiftJIS ) ||
+ IsDoubleByteSingleChar( testDoubleByteUTF8 ) ||
IsDoubleByteSingleChar( testDoubleByteJISKuten ) ) {
uses8BitChars = false;
}
diff -ur plucker-original/viewer/paragraph.c plucker-new/viewer/paragraph.c
--- plucker-original/viewer/paragraph.c 2004-02-20 17:19:19.000000000 +0100
+++ plucker-new/viewer/paragraph.c 2004-03-16 21:10:07.000000000 +0100
@@ -340,7 +340,7 @@
static Int16 littleSpace; /* Extra pixels in each */
/* A one-character pushback for character tokens */
-static Char pushedChar = 0;
+static WChar pushedChar = 0;
/* Used to see if the current font is the fixed with font */
static Boolean fixedWidthFont = false;
@@ -396,8 +396,8 @@
tapped position */
while ( offset < len ) {
WChar ch;
-
- offset += TxtGlueGetNextChar( chars, offset, &ch );
+ UseLegacyEncoding(!UsingGrayFont());
+ offset += MyTxtGlueGetNextChar( chars, offset, &ch );
charWidth = TxtGlueCharWidth( ch );
if ( CharIsSpace( ch ) ) {
x += charWidth;
@@ -517,7 +517,10 @@
selectedWordBounds[ i ].extent.y )
bottomY = selectedWordBounds[ i ].topLeft.y +
selectedWordBounds[ i ].extent.y;
- stringSize += TxtGlueSetNextChar( selectedWord, stringSize, ch );
+ /* we are using legacy encoding here since no lookup plugin can handle utf-8
strings */
+ UseLegacyEncoding(true);
+ stringSize += MyTxtGlueSetNextChar( selectedWord, stringSize, ch );
+
}
selectedWord[ stringSize ] = '\0';
if ( bounds != NULL ) {
@@ -1118,6 +1121,8 @@
}
else {
if ( tContext->writeMode == WRITEMODE_COPY_CHAR || ! goodTable ) {
+ UseLegacyEncoding(!UsingGrayFont());
+
DrawText( name, length, tContext );
*width = FntCharsWidth( name, length );
}
@@ -1555,7 +1560,7 @@
UInt8* functionArgs;
UInt32 charValue;
UInt8 charsToSkip;
- UInt16 palmChar;
+ WChar palmChar;
#ifdef HAVE_IMODE
DmOpenRef plkrImodeDB;
#endif
@@ -1595,9 +1600,12 @@
}
}
#endif
+ if (UsingGrayFont())
+ palmChar = charValue;
+ else
+ palmChar = FindPalmCharForUnicodeChar( charValue );
- palmChar = FindPalmCharForUnicodeChar( charValue );
- if ( 0 < palmChar && PutNextToken( palmChar ) ) {
+ if ( PutNextToken( palmChar ) ) {
pContext->position += charsToSkip;
}
return UNICODE;
@@ -1694,15 +1702,15 @@
Int16 offset;
if ( pushedChar != 0 ) {
- *nextToken = ( UInt8 )pushedChar;
+ *nextToken = ( WChar )pushedChar;
pushedChar = 0;
return TOKEN_CHARACTER;
}
if ( pContext->last <= pContext->position )
return TOKEN_PARAGRAPH_END;
-
- pContext->position += TxtGlueGetNextChar( pContext->position, 0,
+ UseLegacyEncoding(1);
+ pContext->position += MyTxtGlueGetNextChar( pContext->position, 0,
&nextChar );
if ( nextChar != '\0' ) {
@@ -1726,11 +1734,11 @@
{
if ( pushedChar != 0 )
return false;
-
+/*
if ( 256 <= nextToken )
return false;
-
- pushedChar = (Char) nextToken;
+*/
+ pushedChar = (WChar) nextToken;
return true;
}
@@ -2227,8 +2235,9 @@
Char* prevPosition;
prevPosition = pContext->position;
- nextTokenType = GetNextToken( pContext, &nextChar );
-
+
+ nextTokenType = GetNextToken( pContext, &nextChar );
+
if ( nextTokenType == TOKEN_PARAGRAPH_END ) {
break;
}
@@ -2248,6 +2257,7 @@
}
continue;
}
+
addMarginToCurrent = false;
if ( skipLeadingSpace && CharIsSpace( nextChar ) && ! fixedWidthFont ) {
@@ -2287,7 +2297,10 @@
tContext->cursorX += FntCharsWidth( chars, len );
len = 0;
}
- len += TxtGlueSetNextChar( chars, len, nextChar );
+ UseLegacyEncoding(!UsingGrayFont());
+ len += MyTxtGlueSetNextChar( chars, len, nextChar );
+
}
if ( pContext->type == ALIGNMENT_JUSTIFY && nextChar == ' ' ) {
@@ -2318,6 +2331,8 @@
if ( 0 < len ) {
DrawText( chars, len, tContext );
tContext->cursorX += FntCharsWidth( chars, len );
}
if ( invertPattern && tContext->writeMode == WRITEMODE_DRAW_CHAR )
@@ -2610,8 +2625,10 @@
yPos += currentHeight / 2;
else if ( GetCurrentStyle() == SUPSTYLE )
yPos += currentHeight / 2 - GetPrevFontHeight();
-
+ UseLegacyEncoding(0);
RotDrawChars(chars, len, tContext->cursorX, (Coord)yPos);
}
@@ -2623,6 +2640,7 @@
const TextContext* tContext
)
{
+ UseLegacyEncoding(0);
RotDrawInvertedChars( chars, len, tContext->cursorX,
tContext->cursorY - FntCharHeight() );
}
diff -ur plucker-original/viewer/paragraph.h plucker-new/viewer/paragraph.h
--- plucker-original/viewer/paragraph.h 2004-02-01 12:26:33.000000000 +0100
+++ plucker-new/viewer/paragraph.h 2004-03-08 17:09:20.000000000 +0100
@@ -26,6 +26,7 @@
#include "viewer.h"
#include "document.h"
#include "util.h"
+#include "unicode.h"
/*
A paragraph as it appears in the input data stream. The height of the
diff -ur plucker-original/viewer/rotate.c plucker-new/viewer/rotate.c
--- plucker-original/viewer/rotate.c 2004-01-04 01:21:36.000000000 +0100
+++ plucker-new/viewer/rotate.c 2004-03-08 17:09:20.000000000 +0100
@@ -472,7 +472,7 @@
while ( 0 < length ) {
Boolean missing;
- charWidth = TxtGlueGetNextChar( string, 0, &ch );
+ charWidth = MyTxtGlueGetNextChar( string, 0, &ch );
string += charWidth;
length -= charWidth;
diff -ur plucker-original/viewer/rotate.h plucker-new/viewer/rotate.h
--- plucker-original/viewer/rotate.h 2003-08-11 04:31:57.000000000 +0200
+++ plucker-new/viewer/rotate.h 2004-03-08 17:09:20.000000000 +0100
@@ -30,6 +30,7 @@
#include "jogdial.h"
#endif
#include "grayfont.h"
+#include "unicode.h"
#ifdef HAVE_ROTATE