Re: some comments on the Plucker format spec and PalmOS viewer from a document-viewer expert

Bill Janssen Fri, 11 Oct 2002 19:50:07 -0700

> The main problem I had with translating the spec onto the Zaurus was that a
> number of "sizes" were in pixels. No problem for images but it was a bit of
> pain for things like indents when I zoomed in/out. The other main places
> that I recall that pixel sizes are used is in paragraph spacing (which I
> don't really use on the Zaurus yet) and in horizontal rule size.


Yes, this has bothered me too.  But I don't think it's going to change
soon.  One way to proceed is to think of one pixel as 1/50 of an inch
(guessing as to the DPI of the original Palm).
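
To make that concrete, here is a minimal sketch of the conversion in
Python.  The function name and the zoom parameter are made up for
illustration, and the 50 DPI figure is only the guess mentioned above,
not something the spec states.

```python
# Assumption from the discussion above: one Plucker "pixel" is about 1/50 inch.
PLUCKER_DPI = 50


def scale_pixels(plucker_px, device_dpi, zoom=1.0):
    """Convert a pixel size stored in a Plucker document to device pixels.

    plucker_px  -- the size as written in the document (indent, rule, spacing)
    device_dpi  -- the resolution of the display being drawn on
    zoom        -- viewer zoom factor (hypothetical parameter)
    """
    return round(plucker_px * device_dpi * zoom / PLUCKER_DPI)
```

With that, a 10-pixel indent becomes 20 device pixels on a 100 DPI
screen, and 40 when zoomed to 2x.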

> For example, on
> the Z I use bold, italic etc flags to indicate the style of the font and an
> offset (which can be negative) into a size hierarchy to specify the size of
> the font (eg, one size bigger than base font). This would be too Z centric
> but something even more general would be better.

If you look at the GTK viewer code, you'll see something similar.  H6
is bold but the same size as regular text, H5 is 1.2 x regular, H4 is
1.4 times, etc. to H1, which is twice the point size of regular.
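
Roughly, in Python (a sketch of the idea, not the actual GTK viewer
code; the function names are made up, and the H3 and H2 factors are
filled in by continuing the 0.2 steps implied by the "etc."):

```python
def heading_scale(level):
    """Return the size multiplier for heading level 1..6 (H1..H6).

    H6 -> 1.0, H5 -> 1.2, H4 -> 1.4, ... H1 -> 2.0.
    The steps between H4 and H1 are assumed to be even.
    """
    if not 1 <= level <= 6:
        raise ValueError("heading level must be 1..6")
    # Work in tenths to avoid accumulating floating-point error.
    return (10 + 2 * (6 - level)) / 10


def heading_point_size(base_points, level):
    """Scale the regular text size for a heading, rounded to whole points."""
    return round(base_points * heading_scale(level))
```

Since H6 comes out the same size as regular text, the viewer has to
distinguish it by weight (bold) rather than by size.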

> A current problem I have is figuring out how to get the URLs of the external
> links out. There is a slight mismatch in terminology in the version of the
> document I have where the descriptive text refers to URLs but the spec for
> the paragraph types refers to links when referring to the record types - or
> maybe these are different and that is why I haven't yet managed to get it
> working 8^).

It's tricky -- it took me a few weeks to figure it out.  I'd be happy
to re-write the PluckerDB document with better wording if we can
figure it out.  Here's how it works: As URLs are encountered in the
text, and as extra records are generated by the distiller (when a page
has to be broken into multiple parts), each is assigned the next in an
ascending series of record-ID numbers.  These numbers are assigned
WHETHER OR NOT
an actual record with that ID is present in the document.  Some
record-ID numbers are phantom; that is, they are assigned, but that
record (not the page) is later found to be non-existent or
unnecessary.  Phantom records have no URL.  Some internal records
which are actually in the document, but are derived or metadata, also
have no URL.

When it's time to write the URL data record, we treat each
non-existent URL as a zero-length string, and each real URL, whether
or not the page it's for was included in the document, as a string.
We concatenate all the strings, using NUL characters between them.
Thus there may be a run of several NUL characters at the beginning of
the first URL data record, or anywhere in the record where a phantom
record-ID occurred.  When processing the URL data record, you
basically count NUL characters to identify the URL (which may be
zero-length) for a particular record-ID.
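
The NUL counting can be sketched like this in Python.  It assumes you
have already stripped the record header and undone any compression, so
that record_body is just the concatenated strings; the function name
and the latin-1 decoding are my assumptions for the example, not
anything the spec promises.

```python
def nth_url(record_body, n):
    """Return the n-th (0-based) string in one URL data record body.

    Skips n NUL characters, then reads up to the next NUL (or the end
    of the data).  A phantom entry comes back as the empty string.
    """
    start = 0
    for _ in range(n):
        nul = record_body.find(b"\x00", start)
        if nul < 0:
            raise IndexError("record holds fewer strings than requested")
        start = nul + 1
    end = record_body.find(b"\x00", start)
    if end < 0:
        end = len(record_body)
    return record_body[start:end].decode("latin-1")
```

For a body of b"\x00\x00http://a/\x00\x00http://b/\x00", entries 0, 1
and 3 are empty (phantoms), entry 2 is "http://a/" and entry 4 is
"http://b/".  How a record-ID maps to n depends on which IDs the
record covers.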

Because we only put a small number of URLs in each URL data record
(1-200), we have a second type of record, called the 'URL handling
data record', which is just an index into the set of URL data records.
So to find the URL for a particular record-id, you first look at the
URL handling data record, and figure out from that which URL data
record the URL is in.  You then look at that data record, counting NUL
characters to step past each URL, till you come to the right one --
which may be zero-length.
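
Here is a rough Python sketch of that two-step lookup.  The layout of
the index is an assumption made for the example: a sorted list of
(last record-ID covered, key of the URL data record) pairs, with
numbering starting at first_id.  Check the real record layout before
relying on it.  The data record bodies are assumed to be already
decoded into plain bytes.

```python
from bisect import bisect_left


def find_url(record_id, index, data_records, first_id=1):
    """Look up the URL for record_id; returns "" for a phantom entry.

    index        -- hypothetical: sorted [(last_id_covered, record_key), ...]
    data_records -- hypothetical: {record_key: NUL-delimited bytes}
    """
    # Step 1: use the index to pick the data record covering record_id.
    last_ids = [last for last, _ in index]
    i = bisect_left(last_ids, record_id)
    if record_id < first_id or i == len(index):
        raise KeyError(record_id)
    first_in_record = first_id if i == 0 else index[i - 1][0] + 1

    # Step 2: count NUL-delimited strings inside that data record.
    strings = data_records[index[i][1]].split(b"\x00")
    offset = record_id - first_in_record
    if offset >= len(strings):
        return ""
    return strings[offset].decode("latin-1")
```

For example, with index = [(3, "A"), (5, "B")] and data record "B"
holding b"\x00http://z/\x00", record-ID 4 is a phantom (empty string)
and record-ID 5 is "http://z/".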

Hope this clears it up.

> the unicode characters (ironic since unicode handling is
> already builtin - but all the other document decoders are pure 8-bit so
> the unicode stuff comes too late in the chain).

It was fairly easy to implement in the GTK viewer, since we have to
process function codes anyway.

Did you use the libunpluck library?  It's very vanilla plain C, but it
does implement the owner-id decoding.

> won't I lose the "Indent first line of every
> paragraph" effect from the distiller - or is that accomplished in a different
> way? (eg, If the distiller set the indent before the first character of the
> paragraph and then immediately set it back after the first character it
> would work on my reader - but I want to match what the distiller currently
> does).

The distiller only indents paragraph beginnings when the configuration
parameter "indent_paragraphs" is True.  Otherwise it puts extra
spacing between them.  Or you could look for the indentation string
(which is "\x0a\x0a\x0a\x0a\x0a\x0a" at the beginning of a paragraph)
and do the indent in the viewer.

What we need are stylesheets.

Bill
_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
