Re: FOP at ApacheCon Europe 2005?

2005-03-07 Thread J.Pietschmann
Jeremias Maerki wrote:
I was also thinking about something like
hidden treasures in the XML Graphics project but I guess there's not
so much meat on that bone to fill one hour.
Well, there should be enough for an hour, at least in theory.
I couldn't convince (yet) my boss that I have an important mission
in Stuttgart in July. If I could, I'd probably talk about:
- Handling fonts in Java, why the AWT font and text rendering
 subsystem is lame, and what FOP, Batik and perhaps others would
 expect from an API.
- How to implement flowing text, line breaking and hyphenation
 efficiently; why the Java BreakIterator and other parts of the Java
 Unicode support sux0rs; what's behind TR14; Unicode normalization of
 text before looking it up in a dictionary, and efficient implementation
 of said dictionary for looking up all substrings in a word (using a
 trie, a PATRICIA tree or whatever)
- Talk about the question why the algorithms aren't simply copied
 from Gecko (the Mozilla layout engine)
Now that the deadline has been extended, I'll attempt it again.
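
The dictionary lookup mentioned in the second talk topic (finding every stored pattern that occurs as a substring of a word) can be sketched with a plain trie. This is only an illustration under assumptions: the class name PatternTrie and its methods are hypothetical and are not FOP's hyphenation code, and the sketch ignores Unicode normalization and the numeric weights real hyphenation patterns carry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not FOP code: a character trie holding patterns.
final class PatternTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored pattern ends at this node
    }

    private final Node root = new Node();

    /** Stores one pattern, creating trie nodes as needed. */
    void add(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.children.computeIfAbsent(pattern.charAt(i), k -> new Node());
        }
        node.terminal = true;
    }

    /** Returns every stored pattern occurring in word, formatted as "start:pattern". */
    List<String> matches(String word) {
        List<String> found = new ArrayList<>();
        for (int start = 0; start < word.length(); start++) {
            Node node = root;
            // Walk the trie once per start offset; stop as soon as no pattern continues.
            for (int end = start; end < word.length(); end++) {
                node = node.children.get(word.charAt(end));
                if (node == null) {
                    break;
                }
                if (node.terminal) {
                    found.add(start + ":" + word.substring(start, end + 1));
                }
            }
        }
        return found;
    }
}
```

Each start offset costs at most the length of the longest stored pattern, which is why a trie (or a compressed variant such as a PATRICIA tree) suits this kind of lookup better than probing a hash table with every substring.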

Re: FOP 0.15 Double Byte Support

2005-02-11 Thread J.Pietschmann
m r dantuluri wrote:
I am using FOP 0.15 Version. The PDF files rendered by FOP give junk 
characters for double-byte languages like Korean, Japanese etc.
FOP 0.15 doesn't support non-latin scripts properly. Your
only chance is to upgrade to FOP 0.19 or later, preferably
the latest release, 0.20.5. This also means that you have
to rewrite parts of your FO generator because of incompatible
changes in the spec (drafts) implemented by FOP.

Re: Dual Column Layout

2005-02-02 Thread J.Pietschmann
Puppala, Kumar (LNG-DAY) wrote:
I am having difficulty understanding how the dual column layout is
implemented in FOP.
Scenario 1:
I set the property column-count=2 on my fo:region-body object. As such the
text appears in dual column format. If I have the page totally filled out,
then everything seems to be fine. But if my document does not contain enough
text (usually the last page in my document), the text does not seem to be
evenly distributed in both the columns. It first tries to fill the left
column and then tries to fill the right column if there is an additional
As mandated by the spec.
If this page is the last page and we don't have any additional text,
the renderer should try to distribute it evenly on both columns. That is the
behavior I have seen on other viewers.
This can be forced by adding an empty block with span=all at the
end of the flow.

Scenario 2:
If I keep switching between dual and single columns on the same page ( using
the span=all property on an fo:block within a page), the distribution
between the columns seems to be happening but does not happen accurately. I
see more text on the left hand column than on the right hand column. Doing
this would leave additional blank space on the right hand column before we
switch to single column layout.
This is due to a simple algorithm for balancing. Getting column
balancing even somewhat right is quite complicated.

Re: cvs commit: xml-fop build.xml

2005-01-13 Thread J.Pietschmann
Jeremias Maerki wrote:
I see. I've added JAXP and Xerces to the classpath.
Isn't it somewhat strange that org.apache.xerces.parsers.SAXParser
is explicitly referenced? I'd think everyone uses JAXP meanwhile.
Do you access Xerces specific functionality?

Re: Horizontal Line

2004-12-27 Thread J.Pietschmann
Puppala, Kumar (LNG-DAY) wrote:
Hello All,
 Does anyone know what FO tags I need to use to generate a horizontal line
given the width, color and justification for this line?
Try fo:leader with some appropriate attributes.

Re: [OT] Printing the XSL WD

2004-12-24 Thread J.Pietschmann
Jeremias Maerki wrote:
How do you guys print out the new XSL WD? I don't manage to print this
document either in IE or in Firefox without having some of the content
being cropped.
My printer came with a WebPrint utility which plugs into
IEx and is advertised as "Get your IEx printouts without any
content clipped!" Duh!
I vaguely remember a Mozilla/Firefox plugin which scales web
content to better fit printed pages too.

Re: Retrieve-marker and removal of leading and trailing spaces

2004-12-08 Thread J.Pietschmann
Simon Pepping wrote:
The spaces before `and' and after `blue:' are removed. This is probably
due to the fact that the space removal mechanism does not recognize
that fo:retrieve-marker elements may generate text.
Whitespace/linefeed handling should run after rebinding the
retrieved marker content in order to get it right. I personally
still think it should be integrated into break position
computation, with something like a whitespace state held in
the layout context.

Re: Good news: Jeremias has been elected as an ASF member!

2004-12-02 Thread J.Pietschmann
Bertrand Delacretaz wrote:
I have the great pleasure to announce that Jeremias Maerki has been 
elected as an ASF member at the last member's meeting during ApacheCon.

Re: Knuth linebreaking questions

2004-12-01 Thread J.Pietschmann
Finn Bock wrote:
3) What is the reasoning for doing hyphenation only after threshold=1 
fails. Naive common sense tells me that if the user specifies hyphenation 
we should do hyphenation before finding line breaks.
The purpose of professional typography and layout is to
assist the reader: provide an easy reading with minimal
distractions. Typographic concepts reflect this. Justified
text makes it easier to identify paragraphs. Unfortunately,
long words may cause word spaces to be stretched into large
white blobs which disrupt reading. Hyphenation is essential
to cut down on space allocated for text justification,
especially for languages which can form arbitrarily long
compound words. Hyphenation has of course its own drawback:
words are mostly identified by the letters at the beginning
and the end, and hyphenation disrupts this. Several lines
ending in hyphenated words may also cause the reader to pick
up the wrong continuation line (that's the reason for having
the hyphenation-ladder-count property). This tradeoff between
using hyphenation in order to avoid visual artefacts and
having lots of hyphenated words disrupting the flow has to be

Re: cvs commit: xml-fop/src/java/org/apache/fop/traits

2004-11-25 Thread J.Pietschmann
gmazza  2004/11/24 13:07:31
  2.) Appended EN_ to enumeration constants to make them better SR'able 
throughout app.
Yuk. Having a large number of identifiers in the same scope with
an identical prefix isn't very good for autocompletion both in
Emacs and Eclipse. I also don't quite get the point about the
better SR'ability.

Re: HEAD compile problem with JDK 1.3

2004-11-18 Thread J.Pietschmann
Glen Mazza wrote:
No--I think it is that Leader, by being a subclass of
FObj, implements the Constants interface, which has a
(class?) called LeaderPattern.
An inner interface.
 (I'm unsure why it
doesn't work in 1.3--that is strange.)
Me too. Java 1.3 didn't recognize the identifier LeaderPattern,
perhaps there was a change between 1.3 and 1.4 regarding
access to nested interfaces by implementing classes.
Anyway, what's the point of having the constants both in
the Constants interface and nested interfaces? I'm confused.
Well, now that the FOP website is generated by Forrest,
the various html-* targets could be removed, couldn't they?

HEAD compile problem with JDK 1.3

2004-11-16 Thread J.Pietschmann
I tried to compile Fop HEAD with JDK 1.3 (1.3.1_08) and got
loads of errors like
[javac] ...\fop\src\java\org\apache\fop\fo\flow\
  Constant expression required.
[javac] case LeaderPattern.SPACE:
[javac] ^
Surprisingly, the very same workarea compiles fine after a clean
and using Java 1.4.2_03.
There is no file in the source tree, I vaguely
remember it was one of the generated property class files. Therefore
there shouldn't be a LeaderPattern class there either.
I suspect the compiler just ignores the scope and gets the SPACE
constant inherited from Constants. Is this a bug or a feature new
in Java 1.4? Or is this just me?
BTW the buildfile could use some de-cruftification too (remove
the gensrc/.../properties stuff and a few now meaningless substitutions)
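
The construct under discussion can be reduced to a small sketch: a constants interface with a nested interface, reached by simple name from a class that inherits it. The names Constants, LeaderPattern, SPACE, FObj and Leader come from the thread; the constant values, the RULE constant and the describe() method are assumptions made up for illustration. The sketch compiles with current compilers; it makes no claim about why JDK 1.3 rejected the original code.

```java
// Hypothetical reduction of the situation described above, not the FOP sources.
final class NestedConstantSketch {
    interface Constants {
        int SPACE = 1;              // flat constant (value is an assumption)

        interface LeaderPattern {   // nested ("inner") interface
            int SPACE = 1;          // value is an assumption
            int RULE = 2;           // hypothetical second constant
        }
    }

    /** Stand-in for the FO base class that implements the constants interface. */
    static class FObj implements Constants { }

    static class Leader extends FObj {
        String describe(int pattern) {
            switch (pattern) {
                // LeaderPattern is an inherited member type, so the simple name resolves here.
                case LeaderPattern.SPACE:
                    return "space";
                case LeaderPattern.RULE:
                    return "rule";
                default:
                    return "other";
            }
        }
    }
}
```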

Re: commenting the Knuth code/centering issue

2004-11-06 Thread J.Pietschmann
Glen Mazza wrote:
[BTW, I'm considering getting that Digital Typography
book by Knuth you had mentioned earlier.  Do you
recommend it?  (I was thinking that given all the time
I spend on FOP I should start looking a little more at
the scientific aspects of this work.)]
Yes, a must read if you are into computer assisted

Re: Exception hierarchy.

2004-10-27 Thread J.Pietschmann
Finn Bock wrote:
ValidateException is the right choice of exception when the FO file 
doesn't follow the content model.
Nitpick: s/FO file/FO processor input document/

Re: page-number-citation problem

2004-10-27 Thread J.Pietschmann
Randy Ouellette wrote:
We are having an issue with using the page-number-citation for outputting
the page-number for those pages that are inside a page-sequence when the
number is restarted (initial-number=1). We are trying to output a
page-number in a TOC but cannot get a value.
Please provide:
- FOP release information
- Exact problem description (expected result vs. actual result)
- A reasonably small test case

Re: Handling XML parse errors when using identity transform.

2004-10-21 Thread J.Pietschmann
Finn Bock wrote:
and there is no way AFAICT to set a different error handler on the 
XMLReader that the xalans transformer creates and uses to parse the 
input file.
Well, if you really need control over the parser, you
have to create one by yourself rather than relying on
StreamSource to do it for you. You can cast the
TransformerFactory instance into a SAXTransformerFactory
in order to get a filter which you can pass to the parser
instance as content handler. Look into Cocoon's XSLTransformer
component for a comprehensive example, and I'm sure the Xalan
docs have even easier to grok sample code.
Alternatively, you can
- parse into a DOM and use a DOMSource, if you don't mind
 the potential memory overhead.
- derive a custom class from SAXSource which sets up a
 properly customized parser instance, if you don't mind
 the programming overhead.
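
A minimal sketch of the "create the parser yourself" route, using only standard JAXP and SAX classes: the XMLReader is built explicitly, given its own ErrorHandler, and handed to an identity Transformer through a SAXSource. The class IdentityWithErrorHandler, the CollectingHandler and the copy() method are hypothetical names for this illustration; how a particular transformer implementation treats the reader it is handed may vary, so take this as a starting point rather than the definitive recipe.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical sketch: identity transform with a caller-controlled parser.
final class IdentityWithErrorHandler {

    /** Records parse problems so the caller decides how to report them. */
    static final class CollectingHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }

        public void error(SAXParseException e) {
            messages.add("error: " + e.getMessage());
        }

        public void fatalError(SAXParseException e) throws SAXParseException {
            messages.add("fatal: " + e.getMessage());
            throw e; // stop parsing on well-formedness errors
        }
    }

    /** Copies xml through an identity transform, parsing with our own XMLReader. */
    static String copy(String xml, ErrorHandler handler) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        reader.setErrorHandler(handler); // the point of creating the parser ourselves

        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With malformed input the transform call fails with an exception, and the handler has already seen the parse error by then.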

Re: rarr; in DnI documentation

2004-10-19 Thread J.Pietschmann
Clay Leeds wrote:
I found the rarr; in a bunch of places but the numeric 
entity took a while...
Bookmark this:
Ctrl-F and type in the entity name, the decimal or the hex
For completeness bookmark

Re: [GUMP@brutus]: Project xml-fop (in module xml-fop) failed

2004-10-11 Thread J.Pietschmann
Sam Ruby wrote:
[javac] ... warning: ... has been deprecated
[javac] import;
Jeremias, is there something we can do about this?

Re: Meta info [was: Printing from multiple trays with FOP generated output]

2004-10-09 Thread J.Pietschmann
Simon Pepping wrote:
We might also implement rx:meta-info instead of forcing users to
produce fox:meta-info or rx:meta-info depending on their intended FO
Fiddling with other people's namespaces is considered impolite, at
least. They might not like this.
Processor specific extensions are just that: processor *specific*
There has been an EXSLFO initiative (search on sourceforge) in
order to get some extensions standardized, similar to EXSLT.
AFAIK nothing has been coming out of this, yet.

Re: Change parent of FOText from FObj to FONode?

2004-10-09 Thread J.Pietschmann
Glen Mazza wrote:
There *might* be
more subtle issues
Just do the change locally, run the test suite (well...),
see if anything important breaks. If not, check in.

Re: [VOTE] Luca Furini for Committer

2004-09-18 Thread J.Pietschmann
Simon Pepping wrote:
I propose that we make Luca Furini a member of the FOP team.
+1 from me.

Re: Handling of text-align=justify when nesting blocks

2004-09-01 Thread J.Pietschmann
fo:block text-align=justify
In the example, line 2 is neither the last nor the only line of a block, 
and it's also not a line ending in U+000A, so it should be justified.
It is the last line in the block. Trailing whitespace until the
closing tag is normalized away (or should be, FOP shows bugs here).
Why do you think it is otherwise?

Re: validateChildNode prevents extensions.

2004-08-29 Thread J.Pietschmann
Finn Bock wrote:
An extension mechanism where I can put an unmodified fop.jar and 
myextension.jar on the CLASSPATH and have it work is a defining issue to 
That's how it should work. The code built into the FOP core
should only validate elements from the fo namespace and
attributes from no namespace, and call validation for elements
and attributes from other namespaces in order to give them a
chance to validate themselves.

Re: validateChildNode prevents extensions.

2004-08-29 Thread J.Pietschmann
Glen Mazza wrote:
Provided the extension namespace isn't already
hardcoded into FOP (like the fox: one).
There shouldn't be extensions hardcoded into the FOP core,
at least in the long term.
Errr, elements can't validate themselves, because
the validity of an element is defined only by the
The extension writer decides the content model, and if
an extension element is supposed to be child of a fo:block
only, the corresponding Java object has to get its parent
and verify it's actually a fo:block.
 The recommendation declares, via the content
models, which children are valid for each parent, not
True for elements from the FO namespace *only*.
 This logic is naturally (and much more
cleanly) stored with the parent in the OO world,
allowing Finn's to have different child
nodes from FOP's
There is no Finn's in the proper model of
doing extensions. An extension writer should only write
the extension. The FOP core must
1. Provide a discovery mechanism for the extension. The
 service file used for this purpose in the maintenance
 branch can be easily extended just by dropping the extension
 jar into the classpath.
2. A configuration mechanism for the extension both for default
 and user supplied values. We don't have this currently.
3. A hook for the extension element factory. Works nicely.
4. A hook for validating the extended content model.
5. Hooks for doing layout and rendering.
Especially the API for the last will take some iterations,
but this doesn't mean
Furthermore, such a child-level validation would
require the kid to be instantiated first.  vCN() stops
instantiation of the kid from ever occurring if it
would be invalid to begin with.
From the viewpoint of a FO element, any elements (and
attributes) from other namespaces are valid and will be
instantiated. Then the foreign children get a chance to
validate themselves.
Granted, visible foreign content should be exclusively used
through instream-foreign-object, but this breaks down for
extensions like Karen's extension elements shown before or
after page breaks.

Re: validateChildNode prevents extensions.

2004-08-29 Thread J.Pietschmann
Glen Mazza wrote:
There shouldn't be extensions hardcoded
I thought of that (i.e., just make us a nice reference
implementation of the XSL standard), but PDF bookmarks
are just too popular
I didn't mean the bookmark extension should be discarded,
I meant the code should be pulled out of the core into a
separate, loadable extension, using the same mechanisms as
other extensions.
The content model *of that extension element*,
Wrong, the extension writer also decides in which FO
elements his extension elements can appear. *You*
certainly can't do this now.
And what about its relative ordering within the
fo:block?  Or its cardinality?  These are also defined
in the content model.
The Java object corresponding to the extension element
should have access to the part of the FO tree which
had been already constructed, including preceding
siblings. Constraints involving elements from the FO
namespace like "must precede any fo:* elements" are
a bit difficult, but there are ways around this (registered
callbacks, for example).
The proper model is the Rec,
This means you want to disallow *all* extensions except
as child of instream-foreign-object. This is a somewhat
strange contradiction to your stance with respect to
the bookmark extension.
You have a new FO, you're going to need to code for
them--including ordering and cardinality--in those
parents that accept them,
This does *not* necessarily mean that *you* should arrange
that the extension writer has to replace core FO classes.
In fact do either:
1. Declare FOP won't support extensions except in
 instream-foreign-object, ever, or
2. Provide hooks so that extension writers can get their
 extensions running with FOP, with or without extensive
 validation of the extended content model, but at least
 *without* having to rewrite and replace core FO classes.
The crude middle way to allow extensions but make it extra
hard for developers to get them working, *and* make it
nearly impossible for independently developed extensions
to cooperate (as Finn already explained to you several
times), is, well, crude, hard, and unnecessary.
Returning to the old method is not really an option. 
That's what was causing CCE and NPE's throughout the
system, whenever the FO was invalidly ordered.
*sigh* I should have time to do it myself. I don't see
why content model checking and drop-in extensions have
to be mutually exclusive.
If the current node is an fo:list-item and the
incoming node represents an fo:layout-master-set,
raise the exception immediately before you even get to
instantiate the fo:layout-master-set.
This does not mean you have to summarily reject a
finnbock:change-bar on the same grounds.
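
The division of labour argued for in this thread can be sketched as follows: the core rejects a misplaced child from the FO namespace before it is instantiated, while a child from any other namespace is accepted and given a chance to validate itself. Everything here is a hypothetical illustration, not FOP's validateChildNode(): the SelfValidating hook, the method names and the one-entry content model table are assumptions. Only the FO namespace URI and the fo:list-item content model (list-item-label, list-item-body) are taken from the XSL Recommendation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of core validation that leaves room for drop-in extensions.
final class ExtensionValidationSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Hypothetical hook: an extension element checks its own placement. */
    interface SelfValidating {
        void validateAgainstParent(String parentNamespace, String parentName);
    }

    /** Illustrative excerpt of the content model: children accepted by fo:list-item. */
    static final Set<String> LIST_ITEM_CHILDREN =
            new HashSet<>(Arrays.asList("list-item-label", "list-item-body"));

    /**
     * Called by the parent (here always an fo:* element) before a child is created.
     * The extension argument is the hook for a foreign child, or null if it has none.
     */
    static void validateChild(String parentName, String childNamespace,
                              String childName, SelfValidating extension) {
        if (FO_NS.equals(childNamespace)) {
            // The core only judges elements from the FO namespace.
            if ("list-item".equals(parentName) && !LIST_ITEM_CHILDREN.contains(childName)) {
                throw new IllegalArgumentException(
                        "fo:" + childName + " is not allowed in fo:" + parentName);
            }
            return;
        }
        // Foreign namespace: valid from the core's point of view; let it check itself.
        if (extension != null) {
            extension.validateAgainstParent(FO_NS, parentName);
        }
    }
}
```

In this arrangement an extension jar never has to replace a core FO class: it only supplies its own elements and, if it wants stricter checking, its own hook.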

Re: [GUMP@brutus]: xml-fop/xml-fop failed

2004-08-23 Thread J.Pietschmann
Sam Ruby wrote:
 -ERROR- Bad Dependency. Project: avalon-framework unknown to *this* workspace
It seems there's something up in the pipeline with GUMP
Can somebody with more time at hand take care of this?

Re: DO NOT REPLY [Bug 25828] - [PATCH] should use java.endorsed.dirs

2004-08-10 Thread J.Pietschmann
There have not been complaints from users. Apparently FOP works fine with the
default XML parser and XSLT processor in java 1.4.
Uh, well, this is just plain wrong. There have been *lots* of complaints
about a slew of Xalan problems and a few Xerces bugs in the various
1.4 JDKs released over time, and they all got the stock response
"upgrade your JDK or put the most recent Xalan/Xerces jar into
lib/endorsed", which seemed to work pretty well.
There ought to be a general way to set JVM options for
though, not only for setting java.endorsed.dirs.

Re: updated Batik libraries

2004-08-04 Thread J.Pietschmann
Glen Mazza wrote:
Just updated the two libraries  source code on
maintenance and HEAD.  (Only took 45 minutes...not

Re: [GUMP@brutus]: xml-fop-maintenance/xml-fop-maintenance failed

2004-08-03 Thread J.Pietschmann
Clay Leeds wrote:
The thing that caught my eye is that it indicates xml-fop-maintenance.  
I thought the maintenance branch (fop-0_20_2-maintain) was frozen (I  
haven't noticed anyone updating the code--I doubt it, but could it be  
my manual update of the design/layout.html page?

It's probably a Batik API change -- again --, probably because of
the recent jumbo Batik commit. Look here:
apache/fop/svg/ anonymous  
org.apache.fop.svg.SVGElement$1 is not abstract and does not 
override  abstract method deselectAll() in 
[javac] public float getFontSize(){
GUMP was designed exactly for the purpose of detecting these
problems. Which doesn't give much guidance on how to handle
this specific instance. We could just ignore it, and/or remove
fop-maintenance from the nightly GUMP.
I wonder why HEAD isn't affected?

Re: [GUMP@brutus]: xml-fop-maintenance/xml-fop-maintenance failed

2004-08-03 Thread J.Pietschmann
J.Pietschmann wrote:
I wonder why HEAD isn't affected?
Darn, HEAD got it too :-/

Re: Switch from AddLMVisitor to FObj.addLayoutManager()

2004-08-02 Thread J.Pietschmann
Victor Mote wrote:
I don't understand. More interested in working footnotes or multi-column
layout than what? Is removing AddLMVisitor an advancement in getting
footnotes or multi-column layout working better? Are you reminding us of
your neutrality on modularity? Or are you saying that this kind of question
is irrelevant? Please let me remind you that I was responding to a direct
I know. I just loathe heated debates around topics which are, from
the viewpoint of an end user, a side show. Modularity is nice to
have, but if there aren't any modules which actually do the real
work, users hardly care. I wish everybody would expend the
energy on more pressing issues.

Re: Switch from AddLMVisitor to FObj.addLayoutManager()

2004-08-01 Thread J.Pietschmann
Victor Mote wrote:
I mention it only to point out the *real* issue in case any
real FOP stakeholders are interested.
Well, the real stakeholders (aka users) are probably more
interested in working footnotes, or multi-column layout.

Re: fox validation

2004-07-23 Thread J.Pietschmann
Simon Pepping wrote:
The code in Root shows that fox:bookmarks is the only allowed fox
child of fo:root. It is not clear that that is true. The web page
extensions.html does not even mention fox:bookmarks. The example file
examples/fo/basic/ clearly embeds fox:outline elements in
fox:bookmarks. The docbook stylesheets authors place fox:outline
elements directly in fo:root. FOP-0.20.5 has no problem with this
arrangement. Even if it is true, it creates compatibility problems.
This was changed in the redesign, outlines for bookmarks must now
be put into a fox:bookmark. Yes, this is incompatible but cleans up
pathological cases like
   fo:layout-master-set ... /
Some bookmarks in the above case won't be rendered, and it's quite
difficult to reliably check for this condition. If there can only
be a single fox:bookmark, error checking is much easier. Some would
also claim it enforces better writing style.

Re: retreat...

2004-07-12 Thread J.Pietschmann
Glen Mazza wrote:
On my two earlier API proposals [1], I'm going to take
a step back on the first one about combining the
apps.Driver class into apps.Fop.  Joerg's thoughts
that the API wrapper/class and application
wrapper/class should be distinct is weighing on my
mind; in the future we may find it beneficial that we
have them separate.  Also, Driver has a long history
with our application that disturbing may not be

The second issue I'm still unsure on, although I moved
FopPrintServlet from using XSLTInputHandler to JAXP.
That's ok, now that JAXP is more or less ubiquitous. The
XSLTInputHandler predates JAXP by quite a bunch of months.

Re: [PROPOSAL] API Changes

2004-07-11 Thread J.Pietschmann
Glen Mazza wrote:
That should be enough for us in 1.0, no?  Those more
elaborate API goals appear best discussed post-1.0,
presumably once more vital parts of the system have
been addressed.  
A stable API is as important as other major features.
If we do a major release, post-release API changes should
be small and rare.

Re: [VOTE] PMC chair for XML Graphics

2004-07-09 Thread J.Pietschmann
Jeremias Maerki wrote:
[ ]  I vote for Peter B. West as PMC chair.
[X]  I vote for Jeremias Maerki as PMC chair.

Re: Problems with URL encoding in FOP docs

2004-06-30 Thread J.Pietschmann
Peter B. West wrote:
In there occurs 
the following link:

The question mark and ampersand are encoded as expected.  When I hover 
on this link in Mozilla, I get:
as expected.

When I follow the link, I get the _encoded_ values in the location 
window, and 
Well, the server is right: the URL is sent verbatim, with the special
chars encoded, which makes the server look for an object named
rather than for the object
with the parameters
 the website tells me that URL with the _unencoded_ values is 
not available.
An artefact of the error message generator, I would think.
When I manually change the URL in the location window to 
contain the _encoded_ values, it works.
Weird, but probably works as designed.
How do I fix this?
Something is wrong with the XSLT processor's serializer. The usual
- Check JDK version, upgrade if necessary
- Install latest Xalan into lib/endorsed, if necessary
- Submit bug report, if the problem still persists.
It might be prudent to check whether the source doesn't already contain
the wrong URL.

Re: Offline

2004-06-17 Thread J.Pietschmann
Peter B. West wrote:
I will be offline for the next week.  I'm marrying Jenni tomorrow, and
honeymooning in the frozen south of the South Island of New Zealand for
a week.
Congrats from me too, have a nice week.

Re: [3rd post] Memory growth in version 0.20.5

2004-06-09 Thread J.Pietschmann
Mark C. Allman wrote:
Is there a way to manage FOP's memory usage?  I'm not talking about 
increasing the JVM memory and stack space, I mean the amount of memory 
FOP allocates as a function of job size.  What we're experiencing is an 
almost linear growth in memory demand for the reports we produce.

Check *all* points mentioned in
Tables in particular cause a linearly increasing memory consumption due
to a sort of a memory leak. If you are adventurous, there is an
unreleased fix for this in the repository.

Re: Justification and line breaking

2004-05-20 Thread J.Pietschmann
Peter B. West wrote:
Do you know of a web-accessible version of the paper, or summary of the 
Try the TeX book, available as TeX-source from your nearest
CTAN server. The description is, umm, somewhat obscure, you
should get the commented TeX source (the .web files) as well.

Re: User configuration for hyphenation

2004-05-11 Thread J.Pietschmann
Glen Mazza wrote:
I believe the same logger reference would just be used with each thread 
Yes, and this is exactly the problem. It is conceivable that
different loggers might be used for different processors which
run in concurrent threads. A static logger reference prohibits

Re: User configuration for hyphenation

2004-05-09 Thread J.Pietschmann
Simon Pepping wrote:
hyphenation is done deep down in the process, where any reference to
the start-up objects is lost. How can I configure it?
The idea is to use an object which represents some global context
and is reachable from nearly anywhere in the FO tree and LM tree.
I'd think the user agent should qualify.
Static objects are bad because of the usual MT issues (yeah,
even for logging).

Re: User configuration for hyphenation

2004-05-09 Thread J.Pietschmann
Glen Mazza wrote:
Should FOP itself multithread, or would it be better to let whatever 
would call FOP do the multithreading?
I don't understand this. Even if the main processing methods
of the FO processing object are synchronized, which is probably
what you understand by "FOP isn't MT safe itself", the user
can create multiple processors and use them concurrently.
Mutable static data, like a static logger reference, interferes
severely with using FOP in an MT environment, because this means
one thread rendering globally rather than one thread per FO
processor rendering, perhaps using multiple processors, each one
in a separate thread. That's going to raise complaints, check
the posts complaining about the global options in the maintenance
I'd say we may use static data only in a few cases:
- Immutable data, like name mappings and fallback options
- Global font and perhaps image caches
- Object pools (although they are said to decrease performance
 for modern JREs)
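
The point about static logger references can be sketched like this: each processor carries its own context object (a stand-in for the user agent), so two processors running in concurrent threads can log to different destinations. All names here (Log, UserAgent, Processor, runTwo) are hypothetical and chosen for the illustration; FOP's actual logging and user agent classes look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: per-processor logging instead of a static logger reference.
final class PerProcessorLogging {
    /** Minimal logging interface, an assumption for this sketch. */
    interface Log {
        void info(String message);
    }

    /** Stand-in for a user agent: context reachable from wherever a processor works. */
    static final class UserAgent {
        final Log log;

        UserAgent(Log log) {
            this.log = log;
        }
    }

    static final class Processor {
        private final UserAgent agent; // instance state, deliberately not static

        Processor(UserAgent agent) {
            this.agent = agent;
        }

        void render(String document) {
            agent.log.info("rendered " + document);
        }
    }

    /** Runs two processors concurrently, each with its own log sink. */
    static List<List<String>> runTwo() throws InterruptedException {
        List<String> sinkA = Collections.synchronizedList(new ArrayList<String>());
        List<String> sinkB = Collections.synchronizedList(new ArrayList<String>());
        Processor a = new Processor(new UserAgent(sinkA::add));
        Processor b = new Processor(new UserAgent(sinkB::add));

        Thread threadA = new Thread(() -> a.render("a.fo"));
        Thread threadB = new Thread(() -> b.render("b.fo"));
        threadA.start();
        threadB.start();
        threadA.join();
        threadB.join();

        List<List<String>> sinks = new ArrayList<>();
        sinks.add(sinkA);
        sinks.add(sinkB);
        return sinks;
    }
}
```

With a single static logger field, both renders would have ended up in whichever sink was assigned last.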

Re: User configuration for hyphenation

2004-05-09 Thread J.Pietschmann
Glen Mazza wrote:
BTW, what other things besides hyphenation needs to go into 
user configuration/fo user agent, say for 1.0?
Various strategy parameters, once they are implemented, like
line breaking strategy; furthermore callbacks for redrawing
pages in a GUI renderer, font management, base URL for images

Re: Justification

2004-04-23 Thread J.Pietschmann
Simon Pepping wrote:
Summarizing, you mean that

1. the layout system should calculate the justification and add
   corresponding word and space areas to the area tree;
Eh, not quite. The problem is that the actual justification can
only be done after page number citations have been resolved.
Furthermore, as you noted, for certain output formats justification
can be left to the viewer in some circumstances (remember reference
aligned leaders - I don't think there is any format which can deal
with this in justified text).
I'd like to have the following:
- The layout does whitespace processing, computes line and word
  breaks and creates a corresponding area tree.
- The renderers call a layout routine doing the justification before
  rendering the line.
In case the output format can deal with the needed justifications at
hand itself, the renderers may emit the appropriate commands to the
output without calling the layout routine for justification, as a form
of optimization.
2. the area hierarchy should be revised to make it as light-weight as
   possible, to minimize resource consumption.
Oh, certainly, with priority on the Area objects which occur
most often and lock up the largest amount of memory.
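
The kind of routine a renderer could call just before painting a line, once all word widths (including resolved page number citations) are known, might look like the sketch below. It is an assumption-laden illustration: the class and method names are hypothetical, the extra space is spread evenly over the word gaps only, and letter spacing, leaders and minimum/maximum space constraints are ignored.

```java
// Hypothetical sketch of a late justification step, not FOP's implementation.
final class JustifySketch {
    /**
     * Returns the x offset of each word so that the last word ends at lineWidth.
     * wordWidths are the final widths, spaceWidth is the normal word space.
     */
    static double[] justify(double[] wordWidths, double spaceWidth, double lineWidth) {
        int gaps = wordWidths.length - 1;

        // Natural width: all words plus one normal space per gap.
        double natural = spaceWidth * Math.max(gaps, 0);
        for (double width : wordWidths) {
            natural += width;
        }

        // Spread the remaining space evenly; a single word cannot be justified.
        double extraPerGap = gaps > 0 ? (lineWidth - natural) / gaps : 0.0;

        double[] offsets = new double[wordWidths.length];
        double x = 0.0;
        for (int i = 0; i < wordWidths.length; i++) {
            offsets[i] = x;
            x += wordWidths[i] + spaceWidth + extraPerGap;
        }
        return offsets;
    }
}
```

For words of width 10, 20 and 30, a space of 5 and a line width of 100, the natural width is 70, so each of the two gaps grows by 15 and the words start at 0, 30 and 70.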

Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-20 Thread J.Pietschmann
Glen Mazza wrote:
If you replace whitespace handling with replacing every occurrence of 
the letter 'A' with 'B', a similar idea, you can see what I'm getting 
at--the fo:block should be able to clean itself up (if there is such a 
property defined for the fo mandating that cleanup) prior to presenting 
itself to layout, so layout needn't be concerned with the whitespace 
handling.  The earlier this is done, preferably in flow.Block, the less 
work (and fewer instantiations) for FOText object and TextLayoutManager.
Handling space normalization before the text gets into layout might save
some work if the layout uses backtracking. Nevertheless, Text can come
from inline FOs as well, and the normalization process is sensitive to
properties on inlines, e.g. in
 <fo:block>A <fo:wrapper text-decoration="underline"> B</fo:wrapper>
*both* spaces between A and B remain (the current implementation is
non-conformant in this respect).

IIRC flow.Block is parsed into multiple FOText items, each of which get 
fed into the TextLayoutManager.  I'm not certain that line breaks are 
actually being created during layout; rather, during parsing, I suspect 
the BPD is just incremented and the next line is rendered.
The inline layout managers create line break possibilities if they think
a line is full, rather similar to the maintenance branch code. The break
possibility bubbles up to the nearest block layout manager, which stores
it, updates the BPD and goes on getting further break possibilities from
the child LMs.

Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-18 Thread J.Pietschmann
Glen Mazza wrote:
proper TR14 line breaking needs
a precious character LB property and a whitespace status
Darn, should be previous.

I'm not sure what you're referring to here--the TR at, doesn't appear to mention 
a whitespace status or LB property per se.
In order to determine whether a line break can be inserted between two
non-space characters with optional space chars in between (which would
either be left at the end of the line or discarded), the algorithm
needs the LB properties of both the characters as well as whether
any space was encountered. The line breaking property (those AL
etc. stuff) is defined in one of the Unicode data files for each
Hmmm...not that big a deal to me, but I would be inclined to keep the 
whitespace removal out of the LayoutManagers, because it is fo:block 
I don't quite understand this argument. Handling space-before is also
fo:block specific. Where should this logic be put, then? Note that
whitespace handling includes removing spaces around line breaks which
are introduced during the layout process.
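
Why the break decision needs both line breaking classes plus a "space seen" flag can be shown with a deliberately tiny pair table. This is a sketch under heavy assumptions: it knows only three classes (AL for ordinary characters, OP for "(", CL for ")"), its table is an illustrative subset rather than the TR14 pair table, and all names are hypothetical. It distinguishes two outcomes: a prohibited break (no break even across spaces) and an indirect break (a break only if at least one space intervened).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified pair-table line breaking sketch.
final class PairBreakSketch {
    enum LbClass { AL, OP, CL }

    enum Action { INDIRECT, PROHIBITED }

    /** Toy classification: real code would look the class up in the Unicode data. */
    static LbClass classify(char c) {
        if (c == '(') {
            return LbClass.OP;
        }
        if (c == ')') {
            return LbClass.CL;
        }
        return LbClass.AL;
    }

    /** Toy pair table: never break after an opener or before a closer. */
    static Action action(LbClass before, LbClass after) {
        if (before == LbClass.OP || after == LbClass.CL) {
            return Action.PROHIBITED;
        }
        return Action.INDIRECT;
    }

    /** Returns the offsets of non-space characters before which a break is allowed. */
    static List<Integer> breakOffsets(String text) {
        List<Integer> result = new ArrayList<>();
        LbClass previous = null;
        boolean spaceSeen = false; // the whitespace status carried between characters
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ') {
                spaceSeen = true;
                continue;
            }
            LbClass current = classify(c);
            if (previous != null
                    && action(previous, current) == Action.INDIRECT
                    && spaceSeen) {
                result.add(i);
            }
            previous = current;
            spaceSeen = false;
        }
        return result;
    }
}
```

For the input "ab (cd) ef" the sketch reports breaks before offsets 3 and 8 only; "a )b" and "( a" yield none, because the space alone does not override a prohibited pair.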

Re: DO NOT REPLY [Bug 28130] - Errors in the calculation of adjustment factors

2004-04-17 Thread J.Pietschmann
Chris Bowditch wrote:
Just want to add that I realise changing TextLM.addAreas isn't the only 
other change required to get justification working. The Renderers will 
need changing too, but I'm against the renderers computing their own 
splits, just give them each word as its own area if justification is on.
This caused some problems in the maintenance branch code, although
the mistakes made there can be avoided.
The biggest problem is that lots of WordArea objects are created
which hang around some time and which also inherit a *lot* of
unnecessary (for them) fields from Area. I think some refactoring
of the Area hierarchy could be in order. The current state in the
maintenance branch is roughly like this:
  Box (not many attributes)
   + Space
   |   + DisplaySpace
   |   + InlineSpace (well, should be here, but actually isn't)
   + Area (position, border, padding, children, height, width etc.)
       + BlockArea (content height/width etc.)
       |   + LineArea
       |   + etc.
       + InlineArea
           + WordArea (ugh, maybe this was TextArea instead)
           |   + etc.
           + some non-word inline areas
Many inline areas can't have border, padding, background and
perhaps some other traits, and all the space is wasted in objects
which are instantiated *very often*. This added up to significant
resource problems.
I'm still not quite sure what's the best approach to fix this.
In C++, it certainly would be multiple inheritance. In Java,
we could try using interfaces and some delegation:
 interface Area
 interface BlockArea extends Area
 interface InlineArea extends Area
 interface BorderPaddingBackgroundArea extends Area
 interface NonBorderPaddingBackgroundArea extends Area
 interface Space extends Area
 class AbstractArea implements Area
 class AbstractBlockArea extends AbstractArea implements BlockArea
 class AbstractInlineArea extends AbstractArea implements InlineArea
 class LineArea extends AbstractBlockArea
   implements NonBorderPaddingBackgroundArea
 class WordArea extends AbstractInlineArea
   implements NonBorderPaddingBackgroundArea
 class PageNumberReferenceArea extends WordArea
 ... and so on ...
(Well, because AbstractBlockArea is supposedly abstract, what class
represents ordinary block areas? We need a good name here :-) Note
that the real block area class may have traits which are not
applicable to for example the line area class or the table row area
The code for accessing the border, padding and background traits will
be duplicated in all classes implementing the BPBA interface, but given
that the traits are combined in a single class, this shouldn't be much
of a problem, should it?
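A minimal sketch of the interface-plus-delegation idea, assuming the three traits are bundled in one small class. Only the names Area, AbstractArea, BorderPaddingBackgroundArea and WordArea come from the list above; BorderPaddingBackground, NormalBlockArea and all fields and method bodies are made up for illustration and are not FOP code.

```java
/** Minimal area contract (hypothetical). */
interface Area {
    int getWidth();
    int getHeight();
}

/** The border, padding and background traits combined in a single class. */
final class BorderPaddingBackground {
    final int borderWidth;
    final int padding;
    final String backgroundColor;

    BorderPaddingBackground(int borderWidth, int padding, String backgroundColor) {
        this.borderWidth = borderWidth;
        this.padding = padding;
        this.backgroundColor = backgroundColor;
    }
}

/** Implemented only by areas which can actually have these traits. */
interface BorderPaddingBackgroundArea extends Area {
    BorderPaddingBackground getBorderPaddingBackground();
}

abstract class AbstractArea implements Area {
    private final int width;
    private final int height;

    AbstractArea(int width, int height) {
        this.width = width;
        this.height = height;
    }

    public int getWidth() { return width; }
    public int getHeight() { return height; }
}

/** An ordinary block area: the only duplicated code is this one accessor. */
class NormalBlockArea extends AbstractArea implements BorderPaddingBackgroundArea {
    private final BorderPaddingBackground bpb;

    NormalBlockArea(int width, int height, BorderPaddingBackground bpb) {
        super(width, height);
        this.bpb = bpb;
    }

    public BorderPaddingBackground getBorderPaddingBackground() { return bpb; }
}

/** Instantiated very often, so it carries no border/padding/background field at all. */
class WordArea extends AbstractArea {
    private final String text;

    WordArea(int width, int height, String text) {
        super(width, height);
        this.text = text;
    }

    String getText() { return text; }
}
```

Each class implementing the trait interface pays for one reference and one accessor, while word areas pay nothing.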
Some inline areas may not have children, this could lead to another set
of interfaces.
A potential second problem is the space non-areas. In the maintenance
branch code, display (block) space and inline space just have a height
(bpd) or a width (ipd), respectively. I'm not sure whether this is
sufficient, but perhaps it is.

Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-17 Thread J.Pietschmann
Glen Mazza wrote:
A further optimization might be to do all this before
the Block is even parsed into FOText and Inline
objects, as many spaces-only objects would end up not
even needing to be created.
This will not account for spaces to be removed around line
My preferred solution is to combine space processing with
break calculation. This needs some sort of status passed
through to the layout managers, perhaps as part of the
layout context. But then, proper TR14 line breaking needs
a precious character LB property and a whitespace status
too, so this can be combined. The processing would be
roughly as follows:
 *for* word *in* text (separated by whitespace)
   normalize the whitespace (optimize normalization away
     for some whitespace status).
   calculate TR14 breaks at the beginning of the word
   *for* TR14 break possibilities *in* word
     *if* line full
       check hyphenations
       return previous break possibility
   *end for*
 *end for*
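A heavily simplified sketch of the outer loop, assuming that the only break possibilities are the whitespace runs between words and that widths are counted in characters. It does no TR14 classification and no hyphenation; the class and method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GreedyLineBreaker {

    /**
     * Splits text into lines of at most maxChars characters.
     * Whitespace runs are normalized to a single space, and the space
     * at a line break is discarded rather than kept at the line end.
     * A single word longer than maxChars still gets a line of its own.
     */
    static List<String> breakLines(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty input
            }
            // +1 accounts for the normalized space before the word
            boolean lineFull = line.length() > 0
                    && line.length() + 1 + word.length() > maxChars;
            if (lineFull) {
                // fall back to the previous break possibility
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

A real implementation would pass the whitespace status and the previous character's LB property along between calls instead of working on one complete string.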

Away from keyboard

2004-03-30 Thread J.Pietschmann
Hi all,
I'm offline for the next two weeks.
Have fun!


Re: baseline-shift property

2004-03-16 Thread J.Pietschmann
Søren Christiansen wrote:
Therefore I want 
to add this to the PDF renderer, but after studying the source 
and the development articles for 2 days I still can't figure out how to do 
Don't bother. It is already implemented in the new code. If you want to
work at HEAD, you are welcome.
I came across a subclass, class BaselineShiftMaker, in the API doc but 
it's not distributed with the snapshot of the source!?
It's code generated during the build.


Re: Implicit grants (FOP hyphenation)

2004-03-05 Thread J.Pietschmann
Jeremias Maerki wrote:
But that's not the reason I write this. I've done the relicensing on the
XML FOP project and was again confronted with our hyphenation files. Two
of them now have the ALv2 header because for these two files all legal
problems have been dealt with
I don't think it is necessary to put *all* files under APL. If we can
assume the content had been granted, and there is no information to
the contrary and no incompatible license already in the file, we can
just leave it as it is, or perhaps add something to the effect
 ... has been contributed to FOP and is assumed to be licensed
 for all purposes FOP can be used. Contact the authors stated above
 for further details.
(unless license@ says otherwise, of course)
Tracking down the original committer from CVS might help too, but
in general I wouldn't lose much further sleep on the whole issue.
Of course, newly contributed files should be put under APL which
means all issues have to be resolved before the file is committed to

Re: fop.xconf

2004-03-04 Thread J.Pietschmann
Peter B. West wrote:
What's the intention for fop.xconf? 
It's been there for ages with the intent to provide for
configurable, easily changed defaults. Unfortunately
1. There's not much more than PDF filters (in the maintenance
 branch), and if all filters are deleted, the code uses a
 flate filter anyway (which means you have to provide a
 nop filter in order to have a look at the uncompressed
 PDF code).
2. The fop.xconf, userconfig and command line options
 are not merged, although they should be.

Re: fop-dev used to spread virus

2004-03-04 Thread J.Pietschmann
Andreas L. Delmelle wrote:
a) The Apache list server has no virus scanner?
The Apache list server has a virus scanner. It just happened
that there were apparently at least three and more than 7
new variants of Bagle, MyDoom and NetSky released yesterday
within a few hours.
Also, the worms seem to be specifically designed to also get to
subscriber-only lists. This may be a side effect of matching
gathered sender and receiver addresses. Think of the worm
finding saved mails or cached HTML pages of a web archive and
deducing if there's a To: [EMAIL PROTECTED] and a From
[EMAIL PROTECTED], then foo-list is more likely to open
suspicious attachments if the sender were foo, which unfortunately
also gets mail through to lists.
All lists with some volume I have subscribed to have been
targeted by the worms. This is a novelty. While there had been
worms forwarded to lists by clueless people in the past, this
seems to be the first time a worm managed to get to the
subscription barrier on its own.

Re: fop minimum requirements

2004-03-03 Thread J.Pietschmann
Clay Leeds wrote:
p.s. On second thought, maybe that'll be something I'll figure out 
myself (although it would be better if the legwork were already done! :-D)
I don't have intentions to install a 1.2 on my smallish
and almost full HD. There were, however, *zero* complaints
about problems running 0.20.4 or 0.20.5. There were some
people stuck with an oldish IBM 1.1.8 (or even 1.0.x) JDK
on some exotic platforms, therefore if there were problems
for typical use cases, I expect at least one to report it.
Users of the precompiled binaries won't necessarily notice
any problems though.

Re: fop minimum requirements

2004-03-03 Thread J.Pietschmann
Clay Leeds wrote:
...but what you're saying, is that there should not be problems for 
*binary* versions--only if users want to build from src themselves under 
Well, no problems reported doesn't mean no problems. There
may be well hidden problems in rarely used functionality, and
people stumbling over them just stopped using FOP.
It's certainly better to check.

Re: [VOTE] Clay Leeds for Committer

2004-03-01 Thread J.Pietschmann
Glen Mazza wrote:

Wiki Migration and other issues

2004-02-27 Thread J.Pietschmann
Hi all,
now that the ASF has its new Wiki farm up and running,
they pester everyone with moving from UseModWiki
to the MoinMoinWiki:
Should we wait for the Apache XML reorganization to
complete or should we rush ahead and create our own
Wiki already?
The other issue: The hyphenation files with problematic
licenses are apparently still in the HEAD CVS ready for
checkout. I can't remember any status change here. What
should we do with them?

Re: [VOTE] Remove Visitor Patterns from

2004-02-27 Thread J.Pietschmann
Glen Mazza wrote:
That hasn't changed; for 1.0 that work is still done
in the renderers (as opposed to 0.20.x which had
extensive rendering business logic in the layout
objects, the 1.0 InlineArea.renders() just made a
one-line command to a renderInlineArea() method in
the renderer objects).
The problem is that text justification and leader expansion
are layout tasks, not renderer tasks. And text justification
can't be done during the normal layout pass because of
page number references, it has to be deferred until the
references are resolved.
There is a reason why the maintenance code is as it is.

Re: cvs commit: xml-fop/src/hyph cs.xml da.xml de.xml de_DR.xml el.xml en_GB.xml en_US.xml fr.xml nl.xml no.xml sk.xml tr.xml

2004-02-27 Thread J.Pietschmann
Simon Pepping wrote:
 Removed: src/hyph cs.xml da.xml de.xml de_DR.xml el.xml en_GB.xml
   en_US.xml fr.xml nl.xml no.xml sk.xml tr.xml
 Removed legally problematic files as done for the maintenance branch.

What are those legal problems? The Dutch file nl.xml is based on the
hyphenation patterns created by the Dutch TeX user group, and are
freely distributed with TeX software. Why cannot FOP distribute them?
The XML file says
  Original (TeX) author: Piet Tutelaers
  Transformed to XML by: Reinout Verkerk ([EMAIL PROTECTED])
  Character encoding corrections by: Carlos Villegas
without further license or copyright indications, which makes
them copyright holders and gives them exclusive rights.
The original isn't easy to track down but is probably here
and says:
% COPYRIGHT (C) 1996: Piet Tutelaers
% COPYING: This file can be distributed freely if its contents and name
%  is UNCHANGED and as long as you don't ask money for it.
This is not compatible with the APL.

I'm sure none of the mentioned guys will oppose putting the stuff
under APL 2.0 but we should at least ask. Well, Jeremias asked last
year, without any result so far, I think.

Re: [VOTE] Remove Visitor Patterns from

2004-02-25 Thread J.Pietschmann
Glen Mazza wrote:
But the InlineAreas aren't coupled to the
Renderers anymore--they don't make a reference anymore
to Render objects in their code.  
Question: in the maintenance branch, leader expansion and
text justification is done just before a line is rendered
in the LineArea.render() method. The reason for this is
that it's only the renderer which knows when page number
references are resolved.
How will this fit in now?

BTW I don't think it's good style to ignore a veto and
commit a change even before the discussion is resolved.

Re: [VOTE] Remove Visitor Patterns from

2004-02-24 Thread J.Pietschmann
Glen Mazza wrote:
1.)  Remove the serveVisitor()

2.)  (This I'm less sure on)  After reverting, I'd like to remove the 
render functions within the InlineArea objects in favor of direct 
function calls within AbstractRenderer:
Remember one of the three basic OO principles: use
virtual methods instead of switch according to a
class marker.
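A small sketch of what the bounce-back looks like, assuming a hypothetical two-method renderer interface. None of these names are FOP classes; they only illustrate virtual dispatch on the area plus a callback into the renderer.

```java
/** What a pluggable renderer has to provide (hypothetical). */
interface ItemRenderer {
    String renderText(TextItem item);
    String renderLeader(LeaderItem item);
}

/** Each inline item knows which renderer method handles it. */
abstract class InlineItem {
    abstract String render(ItemRenderer renderer);
}

final class TextItem extends InlineItem {
    final String text;

    TextItem(String text) { this.text = text; }

    @Override
    String render(ItemRenderer renderer) {
        return renderer.renderText(this); // the one-line bounce-back
    }
}

final class LeaderItem extends InlineItem {
    final int dots;

    LeaderItem(int dots) { this.dots = dots; }

    @Override
    String render(ItemRenderer renderer) {
        return renderer.renderLeader(this);
    }
}

/** One possible renderer; another one could be plugged in without touching the items. */
final class PlainTextRenderer implements ItemRenderer {
    public String renderText(TextItem item) {
        return item.text;
    }

    public String renderLeader(LeaderItem item) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < item.dots; i++) {
            sb.append('.');
        }
        return sb.toString();
    }
}
```

The caller just loops over the items and calls render(renderer); there is no switch or instanceof chain to keep in sync when a new item class is added.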
and remove the bounce-back between 
Renderers and Area objects, further simplifying the coding.
But this is what keeps the renderers pluggable. If these
methods are removed, every renderer must follow the same

Re: [PATCH] Support for percentages and table-units

2004-02-24 Thread J.Pietschmann
Finn Bock wrote:

That is not correct. Temporarily storing the area dimension in the FO 
tree just long enough for the getNextBreakPoss() to return does *not* in 
any way prevent reusing the FO tree or the LM tree for an other 
rendering run.
It prevents overlapping/concurrent runs. Whether these are useful
is quite another matter.
There are also more good reasons for having an LM tree than just code 
reuse. The lineLM and a separate place for the layout logic just to name 
Given that there is a LM class for each FO class, and a LM
object for each FO which basically duplicates most of the FO
data, I don't think the three additional LM classes count
all that much. And I'm not sure why it's an advantage to
separate layout logic from the FO tree while the FOs are
still used to store transient data used in the layout
Code reuse is an issue, but it can also be solved through
The real benefit of separating LM and FO would be pluggable
layout engines, but I have the feeling this would also collide
with using the FOs as storage for some layout process data.
I don't know what splicing means,
 fo:block id=a border-start-width=1mm+10%
fo:block id=b border-start-width=10mm
(I have to make up a function because a quick check of section
5.10 indicates there is *no* function which returns the unresolved
value, contrary to what I seemed to remember)
The b block border-start-width is parsed into a tree
+-- 10mm
+-- my:get
and after the arguments of the function have been parsed,
the calling node is replaced with the parsed tree of the
a block property:
+-- 10mm
+-- plus
    +-- 1mm
    +-- 10%
This can be folded into
+-- 11mm
+-- 10%
if someone feels like implementing constant folding. The 10%
will be resolved when the b block LM looks for break positions,
as usual.
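The splicing and folding steps could be sketched like this. Everything here is hypothetical: Expr, Mm, Percent, Ref (standing in for the made-up my:get function) and Plus are illustration classes, lengths are kept in mm as doubles, and percentages are resolved against a single base length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** A node of a parsed property expression (hypothetical, not FOP's classes). */
interface Expr {
    /** Resolves to millimetres, given the base length percentages refer to. */
    double resolve(double baseMm);

    /** Replaces references by the parsed expression of the referenced property. */
    default Expr splice(Map<String, Expr> specifiedById) { return this; }

    /** Adds the summands of this expression to the list; a leaf adds itself. */
    default void flattenInto(List<Expr> terms) { terms.add(this); }
}

final class Mm implements Expr {
    final double value;
    Mm(double value) { this.value = value; }
    public double resolve(double baseMm) { return value; }
}

final class Percent implements Expr {
    final double value;
    Percent(double value) { this.value = value; }
    public double resolve(double baseMm) { return baseMm * value / 100.0; }
}

/** A reference to the specified (unresolved) value of another block's property. */
final class Ref implements Expr {
    final String id;
    Ref(String id) { this.id = id; }
    public double resolve(double baseMm) {
        throw new IllegalStateException("reference to '" + id + "' was not spliced");
    }
    @Override public Expr splice(Map<String, Expr> specifiedById) {
        Expr target = specifiedById.get(id);
        if (target == null) throw new IllegalArgumentException("unknown id: " + id);
        return target;
    }
}

final class Plus implements Expr {
    final List<Expr> terms;
    Plus(List<Expr> terms) { this.terms = terms; }
    public double resolve(double baseMm) {
        double sum = 0;
        for (Expr term : terms) sum += term.resolve(baseMm);
        return sum;
    }
    @Override public Expr splice(Map<String, Expr> specifiedById) {
        List<Expr> spliced = new ArrayList<>();
        for (Expr term : terms) spliced.add(term.splice(specifiedById));
        return new Plus(spliced);
    }
    @Override public void flattenInto(List<Expr> out) {
        for (Expr term : terms) term.flattenInto(out);
    }

    /** Constant folding: adds up all absolute lengths, keeps the other terms. */
    static Expr fold(Expr e) {
        List<Expr> flat = new ArrayList<>();
        e.flattenInto(flat);
        double mm = 0;
        List<Expr> folded = new ArrayList<>();
        for (Expr term : flat) {
            if (term instanceof Mm) mm += ((Mm) term).value;
            else folded.add(term);
        }
        if (folded.isEmpty()) return new Mm(mm);
        folded.add(0, new Mm(mm));
        return new Plus(folded);
    }
}
```

With a = 1mm + 10% and b = 10mm + ref(a), splicing and folding b leaves the two terms 11mm and 10%, and the percentage stays unresolved until a base length is supplied to resolve().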

but the issue that I don't understand 
your solution to is when a child fo makes a reference to an computed 
value that is an expression (like 10% of IPD of 'a') in a parent fo.
Wild pseudocode
  getNextBreak(LayoutContext lc) {
    BorderAndPadding bp = propMgr.get(lc); // get and resolve
    LayoutContext childLC = new LayoutContext(...)
    childLC.setBorderAndPadding(bp) // or pass it elsewhere
Setting *all* potentially inheritable properties (inheritable
via 'inherit', not necessarily automatically inheritable) may
be a bit clumsy, but there could be some refactoring to bundle
it otherwise, or even that
  new LayoutContext(propMgr,lc)
creates all resolved properties in the new layout context.
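A toy version of passing resolved values down through the layout context, reduced to one property. SketchLayoutContext is an invented class, lengths are plain millipoint integers, a percentage is taken relative to the parent's IPD purely for illustration, and the child's IPD is simply the parent's IPD minus the border.

```java
/** Hypothetical stand-in for a layout context carrying resolved values. */
final class SketchLayoutContext {
    private final int ipd;          // available inline-progression-dimension, millipoints
    private final int borderStart;  // resolved border-start-width, millipoints

    SketchLayoutContext(int ipd, int borderStart) {
        this.ipd = ipd;
        this.borderStart = borderStart;
    }

    int getIpd() { return ipd; }
    int getBorderStart() { return borderStart; }

    /**
     * Creates the context for a child. The child's property is resolved
     * here, against the values of this (the parent) context, so "inherit"
     * can grab the parent's absolute value without re-evaluating anything.
     */
    SketchLayoutContext childContext(String childBorderStart) {
        int resolved;
        if (childBorderStart.equals("inherit")) {
            resolved = borderStart;
        } else if (childBorderStart.endsWith("%")) {
            String number = childBorderStart.substring(0, childBorderStart.length() - 1);
            resolved = ipd * Integer.parseInt(number) / 100;
        } else {
            resolved = Integer.parseInt(childBorderStart); // already in millipoints
        }
        return new SketchLayoutContext(ipd - resolved, resolved);
    }
}
```

If the parent's base value changes because of a break, a new child context is created from the new parent context and the percentage is resolved again at that point.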

I'd be +0 to volunteer wink somebody to implement that.
Me too :-)

Should we delay my proposed patch until somebody has come up with an 
implementation that passes the LayoutContext to all Length.getValue(lc) 
I don't see much value in delaying your patch, but let's keep
an eye (or bugzilla entry) on this issue.

Re: [PATCH] Support for percentages and table-units

2004-02-23 Thread J.Pietschmann
Finn Bock wrote:
I don't understand how you propose to solve any of this, but I hope it 
would be OK to commit the straightforward solution I propose.
Whatever works. I just want to note that given the almost one-
to-one correspondence between FOs and LMs both in classes and
instances (with the exceptions of page, column and line LM),
the only advantages of having LMs are
- code reuse by inheritance
- no layout related data in the FO, for better sharing/reuse
Keeping area dimensions in the FO kills the latter.
For storing reference measurements for resolving in the layout
context, you have only to keep track of inheritable properties,
which are basically font-size, ipd and bpd. References to
specified values (in contrast to computed values) can be handled
by splicing in the parsed property expression for the referenced
property as replacement for the referencing function. This way
the FO tree holds properties (parsed property expressions), while
the layout context and the area tree hold the refined traits.

Re: [PATCH] Support for percentages and table-units

2004-02-20 Thread J.Pietschmann
Finn Bock wrote:
If it is evaluated already where would the evaluated value be stored?
The layout context for the child LM could be an appropriate place.

And then the value should be 
reverted to the expression when the base value changes due to breaks.
No problem, this is known at the place where a new Layout context is
created for getting BP from the child LM.

Re: Java thory and proctice: Garbase collection and performance

2004-02-20 Thread J.Pietschmann
John Austin wrote:
Isn't allocation the only unseen part of construction ? Everything
else is visible in the code and surely a few assignments are never
expensive. Any other expensive operations will stand out in
measurements of code execution.
That's correct. However, the article seemed to shout "Don't
worry about creating as many objects as you want", which I
wouldn't support if taken literally. You are right that
proper tools should uncover any additional overhead though.
Moore's law is another optimization we sell in advance
all the time.
Twenty years ago, I had to work on an 8008-driven computer
with 4k RAM and 12k ROM. That's enough to run a program
which nicely prints formatted and justified text (25 lines
of 80 characters each). We've come a long way since then.

Re: [PATCH] Support for percentages and table-units

2004-02-19 Thread J.Pietschmann
Finn Bock wrote:
If an expression references another expression in a parent fo, the parent 
fo expression must be evaluated against the LayoutContext that was in 
effect for the parent fo and *not* against the child fo LayoutContext.

fo:block id=a border-start-width=10%
   fo:block id=b border-start-width=inherit
It must be the LayoutContext for 'a' that is used when we evaluate the 
10% even when we call:
with the layout context for 'b'.
Well, I used to believe the 10% has been evaluated already, and the
inherited property can grab the absolute value immediately.

Re: Java thory and proctice: Garbase collection and performance

2004-02-19 Thread J.Pietschmann
John Austin wrote:
I noticed this article on Developer Works:

Java theory and practice: Garbage collection and performance
Something to read on Thursday.
Nice read, however, they don't talk about constructors. There
are still arguments for reusing objects and for trying to
replace objects with a bunch of primitive values.
(BTW a nice try selling yet-to-be-written optimizations
regarding inlining...)

Re: [PATCH] Support for percentages and table-units

2004-02-17 Thread J.Pietschmann
Finn Bock wrote:
 Perhaps, but I doubt it. If they were changed to always get a reference to
 the parent layout context when they are created, and if they had a
 reference to the FObj, and if they were made available to the property
 subsystem, then they could properly be used for it.
The layout context has the actual IPD MinOptMax. There is no
inherent reason it should have a link to a parent context or the
property subsystem, it's only necessary to have a method to
resolve a property expression given a set of MinOptMax for
the various traits which can be used as references for
percentages. Like
 I still think it is easier to use either the FOs or the LMs .



Re: [PATCH] Support for percentages and table-units

2004-02-16 Thread J.Pietschmann
Finn Bock wrote:
Somehow, in our current design, the information must be stored in an 
object that exists:
IIRC that's what the layout context was meant for.


Re: PMC representation

2004-02-12 Thread J.Pietschmann
Chris Bowditch wrote:
Jeremias to remain as one of our PMC representatives:


+1 for Jeremias
Me too

Re: Questions about minimum requirements

2004-02-12 Thread J.Pietschmann
Clay Leeds wrote:
Also, on the subject of minimum JRE, if it is decided to set the minimum
requirements for FOP-1.0 to JRE 1.4.x, might it still be possible to run
FOP under a pre-1.4 JRE with certain frameworks installed (I don't know
what those would be)? If so, would there be documentation for such
The current code relies on extensions of JRE core classes. I don't
think this could be easily retrofitted to a pre 1.4 JRE, unless
you *like* fiddling with the bootclasspath.

Re: FOP components

2004-02-09 Thread J.Pietschmann
Clay Leeds wrote:
As for 1.0 (forgive my playing the devil's 
advocate here), why stop at 1.4? Assuming Java 1.5 will be released by 
the time FOP 1.0 will be released, why not base it on the latest and 
greatest? Would that offer any benefits? What problems might that lead to?
Well, 1.4 has been out for nearly two years now (counting the usable
betas), and will have been stable for nearly two years or more when
1.0 is released. This means it will be available for even pretty
obscure environments. The only users stuck with 1.3 and earlier
will be the ones where upgrading to 1.4 implies severe side effects,
mainly upgrading essential software services, the whole OS, or
even hardware upgrades.
From what I heard, development efforts are meanwhile firmly based
on 1.4, everything based on 1.3 is strictly maintenance, with
a gradual migration to 1.4.

Re: JUnit test failure

2004-02-09 Thread J.Pietschmann
Jeremias Maerki wrote:

I got another one. Probably a Xerces version problem. No good idea for
both problems, I'm afraid.
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or
change an object in a way which is incorrect with regard to namespaces.
at org.apache.xerces.dom.CoreDocumentImpl.checkDOMNSErr(Unknown Source)
I guess you add an attribute with a QName prefix which isn't declared.
In DOM, the xmlns:prefix are indeed attributes and must be added
to the scope before createAttributeNS is called. I've been bitten
by this too, although I remember another wording of the error.
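A small sketch using the standard JAXP/DOM API, assuming a made-up namespace URI and prefix. As far as I can tell from the DOM Level 2 rules, the case that raises NAMESPACE_ERR is a qualified name with a prefix but a null namespace URI; the sketch also adds the xmlns:prefix declaration as an ordinary attribute first, as suggested above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomNamespaceSketch {
    /** The namespace which xmlns:* declaration attributes themselves live in. */
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    /** Hypothetical namespace, only for this example. */
    static final String EXAMPLE_NS = "urn:example:ext";

    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Declares the prefix first, then adds the prefixed attribute with its URI. */
    static Element buildRoot(Document doc) {
        Element root = doc.createElementNS(EXAMPLE_NS, "ex:root");
        root.setAttributeNS(XMLNS_NS, "xmlns:ex", EXAMPLE_NS);
        root.setAttributeNS(EXAMPLE_NS, "ex:flag", "true");
        doc.appendChild(root);
        return root;
    }

    /** Returns true if a prefixed name without a namespace URI is rejected. */
    static boolean prefixWithoutNamespaceFails(Document doc) {
        try {
            doc.createAttributeNS(null, "ex:flag");
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.NAMESPACE_ERR;
        }
    }
}
```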
On 07.02.2004 23:56:40 J.Pietschmann wrote:
I get a nice JUnit failure:
The JUnit FAQ explains this nicely.


Re: FOP components

2004-02-08 Thread J.Pietschmann
Peter B. West wrote:
I don't know Avalon, so I don't know what other facilities from there we 
are using or considering using.
Avalon is a great way to decompose a large system into components.
The real advantage is that there can be different implementations
managing the components.
For example, we could build our own simple managers, more or less
hardwiring what FOP components are instantiated and how they are
composed into a working FO processor. That's good enough for
testing and a simple CLI. Others could use ECM or Phoenix or
whatever the Avalon guys throw out for efficient configuration
and lifecycle management of the whole stuff in a server environment,
with object pooling and caching (images and fonts) and so on.
I should probably move a bit faster in order to provide some
working sample code so everybody can see what this could look
like for FOP.
If we are involved in such considerations, we need to decide how we 
propose to support our 1.3 user base.  The most recent discussions 
showed that a number of users face steep costs to upgrade to 1.4.
As for the 1.4 discussion: The jakarta commons list held it at
some length a few weeks ago. It's choosing between Scylla and
Charybdis: Using 1.4 gives a lot of functionality, thereby giving
the project leverage to move faster rather than worrying about
reimplementation of such functions. OTOH, it may lock out users
on platforms which lag behind. There was also the consideration
that many enterprises have servers based on 1.3 deployed, and
upgrading a working service is usually frowned upon, even if a
convenient path is available.
Given that FOP 1.0 won't be released until at least late this year,
if not later, we could tell our 1.3 users to use 0.20.5 and
declare 1.4 the minimum for 1.0.

Re: FOP components

2004-02-08 Thread J.Pietschmann
Andreas L. Delmelle wrote:
Apache projects... Perhaps someone at Jakarta already has an idea for a
common Preferences library? (AFAICT not)
It's common enough that everybody invents their own solution.
Avalon provides for XML configuration files, there are classes
mapping XML structures conveniently to objects, which can be
passed to Avalon components. That's one of the more amenable
features of Avalon.
There's also jakarta commons configuration, which uses property
files (IIRC, may well be wrong).
Other approaches include using commons digester or betwixt for
reading XML, using a handcrafted XML reader as 0.20.5 does, or
using JNDI like J2EE.
No shortage of ideas at all :-)


JUnit test failure

2004-02-07 Thread J.Pietschmann
Hi all,
I get a nice JUnit failure:
Testcase: testFO2PDFWithDOM took 0.23 sec
Caused an ERROR
loader constraints violated when linking org/w3c/dom/Node class
java.lang.LinkageError: loader constraints violated when linking
 org/w3c/dom/Node class
 at org.apache.xalan.transformer.TransformerIdentityImpl.
 at org.apache.xalan.transformer.TransformerIdentityImpl.
 at org.apache.fop.BasicDriverTestCase.
 at org.apache.fop.BasicDriverTestCase.
This seems to have something to do with mixing JARs from the JDK and
fop/lib. Does anybody have an idea how this can be avoided?

FOP components

2004-02-07 Thread J.Pietschmann
while fiddling with the New API it dawned on me there are actually
several packages in FOP:
- the basic engine, embeddable
- a CLI wrapper
- a servlet wrapper
- an Ant task
- the font utilities
- the hyphenation data utilities (basically the Ant task)
- a variety of extensions.
Perhaps the AWT viewer or even all of the renderers can be
added here too.
The problem is managing the dependencies:
- commons-cli for the CLI wrapper (new API), also probably
 nice for the font utilities
- Ant for the FOP Ant task and the hyphenation data task
- servlet.jar for the servlet
- avalon and logging for the base library.
There ought to be a less messy approach. It could be an idea
to move the various packages to different base directories,
making FOP essentially a multi-subproject project similar
to jakarta commons. This way each subpackage has its own
buildfile and lib directory, and dependencies become more
clear. In order to manage the cross package dependencies and
dependencies outside of FOP, Maven seems to be the tool of
choice. Unfortunately, I'm not enough of a Maven wizard to
assess this completely. Is there somebody out there with more
time at hand to look at these issues?
1. I'd like to get rid of the servlet.jar in our CVS.
2. If we standardize on JDK 1.4 as base (as it currently
 is), we could drop the Xerces, Xalan and xml-api jars as
 well. Our Jars seem to be somewhat outdated anyway.

Re: Nasty layout bug: maint vs. HEAD

2004-01-31 Thread J.Pietschmann
Andreas L. Delmelle wrote:
In 0.20.5 this works very fine... In HEAD strangely the document is layed
  laid :-)
out such that the first TOC page ends up after the last detail-block for
which it contains the link...
I don't understand the problem. Could you trim it down to two detail blocks,
and post the FO (assuming the trimmed down FO still has the problem)?

Re: AW: RTF: white-space-treatment and linefeed-treatment

2004-01-27 Thread J.Pietschmann
Peter Herweg wrote:
Maybe difficulties is the wrong word. Just a thing i have to care for. If i
do the processing of FOs in endBlock, i have to suppress the processing
within nested blocks. Or the nested blocks will be processed twice.
I think you can flush the queue each time a nested block
starts and each time a block ends. The start of a new block
forces a new line, so you can finish the current line,
including whitespace processing.

Re: Unnesting properties and makers.

2004-01-26 Thread J.Pietschmann
Glen Mazza wrote:
Well, instanceof is slower I believe, but better
Instanceof is exactly as fast as a simple function call
after warm-up.

Re: RTF: white-space-treatment and linefeed-treatment

2004-01-26 Thread J.Pietschmann
Peter Herweg wrote:
(2) I defer the processing of all inline-generating, text-containing FOs,
and process them in RtfHandler.endBlock.
I'd say start with this option, although I'm starting to believe we
could and should move whitespace processing to before the invocation
of the structure renderer's character event call. You still have to
delay some output because space before/after a line break must be
stripped for many settings.
What are the difficulties for nested blocks?


Re: Unnesting properties and makers.

2004-01-26 Thread J.Pietschmann
Finn Bock wrote:
Instanceof is exactly as fast as a simple function call
after warm-up.
That is not what I remembered,
I'm surprised. I made some measurements with a JDK 1.3.0,
with ~50 warm-up cycles to give HotSpot something to
optimize, and vaguely remembered instanceof was slightly
faster (~1%) than a foo(){return true;}. It may have
something to do with the test setup. I wouldn't rule out
I tested in a class without inheritance :-)

Re: Unnesting properties and makers.

2004-01-25 Thread J.Pietschmann
Andreas L. Delmelle wrote:
Does anybody know what space means for line-height???
Know? I guess not. But judging from the spec...
Ah well, I overlooked this
XSL adds the following value with the following meaning:
  Specifies the minimum, optimum, and maximum values, the conditionality and
precedence of the 'line-height' that is used in determining the
Perhaps this is just a way of saying that 'line-height' can be 'shorthanded'

line-height=min opt max cond prec??
Uh no, it's more ugly: line-height is actually meant to be
a compound property, like space-before. I.e. it is possible
to write
 fo:block line-height.optimum=12.5pt line-height.maximum=13pt
The precedence and conditionality govern the combination of the
half-leading with space-before and space-after at the beginning
and the end of the block, I think.
I see why they thought this is necessary, but this kind of spec
makes it unnecessarily hard to follow.

Re: Getting rid of JIMI

2004-01-25 Thread J.Pietschmann
Jeremias Maerki wrote:
I will probably have some time next month to write a proposal on how our
two projects can move closer together to make the code sharing happen.
Stuff that comes to mind immediately:
- fonts, metrics, character and word width
- various configuration stuff
- character normalization and line breaking (for SVG flow text)
- command line wrapper
- common area rendering
- embedded images, of course
- API concerns, as discussed: hooks for custom resolvers for fonts,
 images, URLs in general

Re: PageViewport question

2004-01-25 Thread J.Pietschmann
Tibor Vyletel wrote:
I would like to ask, what's the reason why PageViewport class is not
descended from Area class.
Mainly because it's not an area. It makes a difference for example
for rendering into AWT windows and such.

Re: Unnesting properties and makers.

2004-01-24 Thread J.Pietschmann
Peter B. West wrote:
 With my naive understanding of parsing as a two-stage process (lexemes
 - higher level constructs) I have been curious about earlier comments
 of yours about multi-stage parsing.  Can ANTLR do this sort of thing?
I'm not quite sure whether you mean by parsing as a two-stage
process the same as I do. In language specs, the formal description
is usually divided into a grammar level representing a Chomsky
level 2 context free grammar and a lexical level, described by simple
regular expressions (Chomsky level 3 IIRC). This is done both for
keeping the spec readable and for efficient implementation: a CFG
parser needs a stack, and while the Common Identifier can be
described in a CFG, it's more efficient to use a regular expression
and implement the recognizer as a DFA, which doesn't shuffle
characters to and from the stack top like mad.
ANTLR provides for defining both the grammar and the lexical
level in one file, and it will generate appropriate Java
classes for the grammar parser as well as the token recognizer.
It's not as efficient as the famous lex+yacc utilities, but
this is partly due to Java using Unicode, which would make the
lookup tables much much larger if generated the same way lex
does. Oh well: while yacc is a LALR(1), ANTLR can be configured
as LR(n), with a few LL(n) stuff mixed in. Not that this matters
much in practice, except for the number of concepts one has to
understand while writing a parser. And don't ask me right now
what the acronyms mean in detail, it's been 15 years since I
really had to know this.
 Given the amount of hacking I had to do to parse everything that could
 legally be thrown at me, I am very surprised that these are the only
 issues in HEAD parsing.
Well, one of the problems with the FO spec is that section 5.9
defines a grammar for property expressions, but this doesn't
give the whole picture for all XML attribute values in FO files.
There are also (mostly) whitespace separated lists for shorthands,
and the comma separated font family name list, where
a) whitespace is allowed around the commas and
b) quotes around the names may be omitted basically as long
 as there are no commas or whitespace in the name.
The latter means there may be unquoted sequences of characters
which have to be interpreted as a single token but are not NCNames.
It also means that in the font shorthand there may be whitespace
which is not a list element delimiter. I think this is valid:
 font=bold 12pt 'Times Roman' , serif
and it should be parsed as
 font-family='Times Roman' , serif
then the font family can be split. This is easy for humans but can
be quite tricky to get right for computers, given that the shorthand
list has a bunch of optional elements. Specifically
 font=bold small-caps italic 12pt/14pt 'Times Roman' , A+B,serif
should be valid too. At least, the font family is the last entry.
Note that suddenly a slash appears as delimiter between font size
and line height...
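A rough sketch of how such a shorthand could be split, relying on the observation that the font family is the last entry. Everything here is hypothetical (class and field names included) and it simplifies heavily: it assumes the font size is written as a length or percentage starting with a digit or a dot, that weights are keywords rather than numbers like 700, and that there is no whitespace around the slash.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration, not part of FOP. */
final class FontShorthandSketch {
    final List<String> leading = new ArrayList<>();   // style, variant, weight keywords
    String fontSize;                                   // e.g. "12pt"
    String lineHeight;                                 // null when no "/line-height" is given
    final List<String> families = new ArrayList<>();  // family names with quotes removed

    static FontShorthandSketch parse(String value) {
        FontShorthandSketch r = new FontShorthandSketch();
        String rest = value.trim();
        // Consume whitespace-separated tokens until the font size has been seen.
        while (!rest.isEmpty() && r.fontSize == null) {
            int end = 0;
            while (end < rest.length() && !Character.isWhitespace(rest.charAt(end))) {
                end++;
            }
            String token = rest.substring(0, end);
            rest = rest.substring(end).trim();
            char c = token.charAt(0);
            if (Character.isDigit(c) || c == '.') {
                // Assumption: this is the size, optionally followed by "/line-height".
                int slash = token.indexOf('/');
                r.fontSize = slash < 0 ? token : token.substring(0, slash);
                r.lineHeight = slash < 0 ? null : token.substring(slash + 1);
            } else {
                r.leading.add(token);
            }
        }
        if (r.fontSize == null || rest.isEmpty()) {
            throw new IllegalArgumentException("font size and family are required: " + value);
        }
        // Whatever remains is the family list: split at commas outside quotes.
        StringBuilder cur = new StringBuilder();
        char quote = 0;
        for (int i = 0; i < rest.length(); i++) {
            char ch = rest.charAt(i);
            if (quote != 0) {
                if (ch == quote) {
                    quote = 0;
                } else {
                    cur.append(ch);
                }
            } else if (ch == '\'' || ch == '"') {
                quote = ch;
            } else if (ch == ',') {
                r.families.add(cur.toString().trim());
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        r.families.add(cur.toString().trim());
        return r;
    }
}
```

Whitespace inside the family part is never treated as a list delimiter here, only commas are, which is the distinction the shorthand forces on the parser.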
Another set of problems is token typing, the implicit type conversion
and the very implicit type specification for the properties. While
often harmless, it shows itself for the format property: the
spec says the expected type is a string, which means it should be
written as format='01'. Of course, people tend to write
format=01. While the parsed number could be cast back into a
string, unfortunately the leading zero is lost. The errata
amended 5.9 specifically for this use case, so that in case of an
error the original string representation of the property value
expression should be used to recover. Which tempts me to use
Another famous case is hyphenation-char=-, which is by no
means a valid property expression. Additionally the restriction
to a string of length 1 (a char) isn't spelled out explicitly
All in all I have the feeling the spec tried to provide a
property specification system which would be powerful but still
easy to manage by hand, and they ended up with a system
containing as many or more unintended consequences as the C
preprocessor. Which, as everybody knows, led to weirdness like
macro argument prescanning and 0xE-0x1 being a syntax error.
Well, the C preprocessor had at least a simple first
The maintenance branch tried to unify all cases into a single
framework, which quite predictably resulted in complex and
somewhat messy code. It's also less efficient than it could be:
format=01 is (or would be) indeed parsed as an expression, while
an optimized parser can take advantage of the lack of any string
operations and look for quoted strings and function calls only,
returning the trimmed XML attribute value otherwise.
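The shortcut for string-typed properties could look roughly like the sketch below. The class and method names are made up, the function-call test is a crude regular expression rather than the real grammar, and the full expression parser is only represented by a callback supplied by the caller.

```java
import java.util.function.Function;

/** Hypothetical fast path for string-typed properties such as format. */
final class StringPropertySketch {
    // Crude assumption: a function call looks like name(...) and spans the whole value.
    private static final String CALL = "[A-Za-z_][A-Za-z0-9_.-]*\\s*\\(.*\\)";

    private StringPropertySketch() { }

    static String parse(String attributeValue, Function<String, String> fullParser) {
        String v = attributeValue.trim();
        if (v.length() >= 2) {
            char first = v.charAt(0);
            char last = v.charAt(v.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                // Properly quoted literal: strip the quotes.
                return v.substring(1, v.length() - 1);
            }
        }
        if (v.matches(CALL)) {
            // Only function calls need the real expression parser.
            return fullParser.apply(v);
        }
        // Everything else is taken verbatim, so format=01 keeps its leading zero.
        return v;
    }
}
```

With this shape both format=01 and hyphenation-char=- come back unchanged, without ever being treated as a number or an operator.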
Finally, bless the Mozilla and MySpell folks for the spell
checker... :-)

Re: Unnesting properties and makers.

2004-01-24 Thread J.Pietschmann
Finn Bock wrote:
...--I believe, we do (frequently?)
have more than one datatype per property, correct?
I remember two cases, but I can only find one at the moment: In 
Formally, there are a few more, for example initial-page-number. The
code treats them as Java String. This prevents, for example, writing
prop = this.propertyList.get(PR_BASELINE_SHIFT);
Some other properties which can have an enum or something numeric
as value:
 writing-mode (the auto enum)
 content-height and -width (auto and scale-to-fit)
 height, width and related stuff (auto, none)
 leader-pattern-width (use-font-metrics)
 page-height (auto, indefinite)
 table border precedences (force), 7.26.1
 letter-spacing (normal)
 word-spacing (normal)
 line-height (normal)
Does anybody know what space means for line-height???
I'm also missing the formal definition of name for markers
(7.23.1 ff).
The text-align has a string as the second type beside enum tokens.
The text-shadow may be an enum (none), or a list of color values
with an optional triple of numerical values.
I should have added the latter as well as the text-decoration list
to the list of exceptions in the other post a few minutes ago.
Not to mention that nearly all properties may have the value inherit,
which is both defined as a keyword in the grammar and quite often
explicitly enumerated in the property description. And the clip
property (7.20.1) is yet another challenge to parse.

Re: missing Japanese character

2004-01-22 Thread J.Pietschmann
Siarhei Baidun wrote:
If you have more exact suggestion, share please.
Probably .../org/apache/fop/renderer/pdf/fonts/,

One of them is we are planning
to make porting on new FOP (from main branch)
Don't hold your breath here.


Re: Unnesting properties and makers.

2004-01-22 Thread J.Pietschmann
Finn Bock wrote:
I have not yet removed the properties.xsl file from CVS. I guess it 
should be removed since it isn't used anymore.
I think you could leave the file there for now. It should be
sufficient to deactivate the related task in the buildfile
(for example by putting it in an XML comment).
Does anyone know why we wrap the datatypes instances in a property 

Actually we should strive to use a proper parse tree for
property expressions:
1. Create a few classes for the symbols in the property
  expression grammar (section 5.9 of the spec). I think we need
 as terminals
 - AbsoluteNumeric
 - RelativeNumeric
 - Color (the #N thingy)
 - String (aka Literal)
 - NCName (everything else, basically, including enum tokens and
 and for the nonterminals
 - PropertyFunction
 - Some classes for the operators
2. Write a proper parser (maybe using ANTLR, at least for bootstrap)
 which produces a proper parse tree.
3. Add methods to the objects for resolving relative numeric values
 (percentages, em) and for evaluation.
4. Perhaps add constant folding to the parser.
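As a very rough sketch of steps 1, 3 and 4: a tiny node hierarchy with evaluation against a base length and constant folding for sums. All names are hypothetical, only two terminals and one operator are shown, and values are plain doubles in points rather than whatever units the real code would use.

```java
/** Hypothetical parse tree nodes for property expressions; not FOP classes. */
final class PropertyExprSketch {
    private PropertyExprSketch() { }

    interface Expr {
        /** Resolves relative values against basePt and returns the result in points. */
        double evaluate(double basePt);

        /** Returns an equivalent, possibly simpler node (constant folding). */
        default Expr fold() {
            return this;
        }
    }

    static final class AbsoluteNumeric implements Expr {
        final double points;

        AbsoluteNumeric(double points) {
            this.points = points;
        }

        public double evaluate(double basePt) {
            return points;
        }
    }

    static final class RelativeNumeric implements Expr {
        final double percent;

        RelativeNumeric(double percent) {
            this.percent = percent;
        }

        public double evaluate(double basePt) {
            return basePt * percent / 100.0;
        }
    }

    static final class Sum implements Expr {
        final Expr left;
        final Expr right;

        Sum(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }

        public double evaluate(double basePt) {
            return left.evaluate(basePt) + right.evaluate(basePt);
        }

        public Expr fold() {
            Expr l = left.fold();
            Expr r = right.fold();
            if (l instanceof AbsoluteNumeric && r instanceof AbsoluteNumeric) {
                // Both sides are known at parse time: collapse into one constant.
                return new AbsoluteNumeric(((AbsoluteNumeric) l).points
                        + ((AbsoluteNumeric) r).points);
            }
            return new Sum(l, r);
        }
    }
}
```

Folding leaves anything involving a relative value alone, since percentages and em can only be resolved once the base is known during layout.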

Re: missing Japanese character

2004-01-21 Thread J.Pietschmann
Siarhei Baidun wrote:
What I'd like to know is a hint (or patch) what class(es) was(where)
modified in FOP 0.20.5 to fix this problem.
Do you use the same metrics file in both cases? If so, it's
probably one of the mapping problems. The code should be either
in one of the files in the font subdirectory or in
You can try a CVS diff for a start.
Is there a specific reason why you can't simply upgrade? Especially
the 0.20.4rc had a few nasty deficiencies.

Re: cvs commit: xml-fop/src/documentation/content/xdocs team.xml

2004-01-20 Thread J.Pietschmann
  removed former contributor
  section in favor of going back to giving credit within source files.
Uh, oh. That's not supposed to be a change anybody can make
on a whim.

Re: Properties question ( again? )

2004-01-19 Thread J.Pietschmann
Andreas L. Delmelle wrote:
line 157:CommonBackground bProps = propMgr.getBackgroundProps();
line 193:this.BackgroundColor =
I thought propertyList had been retired in HEAD?

How should I see this? Is one of the two superfluous? Do they complement
each other? Shouldn't the latter be rewritten as :
this.BackgroundColor = bProps.backColor
I'd think so.


Re: Comments on new property maker implementation

2004-01-19 Thread J.Pietschmann
Finn Bock wrote:
I would guess that doing ~6 string compares to navigate the binary tree 
(with 148 color keywords) is slower than one string hash, ~1.2 int 
compares and one string compare. But I haven't measured it, so you might 
well be right. Many keyword sets for other properties are much 
smaller and could perhaps benefit from a more suitable collection type.
I meant setup effort, although a binary tree will most likely do
additional memory management. You are right about the lookup. Just
out of curiosity, where do you get the 1.2 int comparisons? A perfect
hash should not have collisions.
It might also be interesting how a trie or ternary tree (as used for
hyphenation patterns) would compare to hash maps for keywords (in
terms of setup costs, lookup costs and memory). I have a study of
various Java implementations on my todo list but didn't quite get
around to doing it.
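For anybody who wants to run such a comparison, a minimal ternary search tree for keyword lookup could look like this. It is a sketch only: the class name is made up, codes are assumed to be non-negative ints with -1 meaning "not a keyword", and it is not derived from the hyphenation code.

```java
/** Hypothetical ternary search tree mapping keywords to int codes. */
final class KeywordTstSketch {
    private static final class Node {
        final char c;
        Node lo;
        Node eq;
        Node hi;
        int code = -1; // -1 means no keyword ends at this node

        Node(char c) {
            this.c = c;
        }
    }

    private Node root;

    void put(String key, int code) {
        if (key.isEmpty() || code < 0) {
            throw new IllegalArgumentException("need a non-empty key and a non-negative code");
        }
        root = put(root, key, 0, code);
    }

    private Node put(Node n, String key, int i, int code) {
        char c = key.charAt(i);
        if (n == null) {
            n = new Node(c);
        }
        if (c < n.c) {
            n.lo = put(n.lo, key, i, code);
        } else if (c > n.c) {
            n.hi = put(n.hi, key, i, code);
        } else if (i < key.length() - 1) {
            n.eq = put(n.eq, key, i + 1, code);
        } else {
            n.code = code;
        }
        return n;
    }

    /** Returns the code for key, or -1 if key is not a stored keyword. */
    int get(String key) {
        if (key.isEmpty()) {
            return -1;
        }
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = key.charAt(i);
            if (c < n.c) {
                n = n.lo;
            } else if (c > n.c) {
                n = n.hi;
            } else if (i == key.length() - 1) {
                return n.code;
            } else {
                n = n.eq;
                i++;
            }
        }
        return -1;
    }
}
```

Lookup does char comparisons only and never computes a hash, but every node is a separate small object, so the memory side of the comparison is where this is likely to lose.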

Re: Servlet Examples in HEAD v.s. 0.20.5

2004-01-18 Thread J.Pietschmann
John Austin wrote:
   (is Content-length: required for any reason other than placating
   Acrobat and that rich hermit who lives outside Redmond WA ?) 
Not really a FOP topic but anyway.
Setting content-length is considered good style, because it allows
browsers to give feedback to the users on how far the download has
proceeded. This is especially useful for larger files on slow connections.
Of course, there is a tradeoff for dynamically generated content:
there won't be any feedback at all until the content is ready, and
if this takes longer than the download itself (now that everybody
has broadband :-) ), the user is still dissatisfied. Well, the
IEx architecture bug saves us from pondering the philosophical
2) Cache Templates objects for faster Transformations when XSLT
   files are to be re-used. The 'Java and XSLT' O'Reilly book
   has some interesting suggestions in this area.
The problem is to detect style sheet reuse without context information.
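If the servlet can get hold of some stable key for a style sheet, the caching part itself is small. The sketch below uses the standard JAXP Templates interface; the class name, the string key and the Supplier parameter are assumptions, and it deliberately does not solve the detection problem (nor does it notice when a style sheet changes on disk).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

/** Hypothetical cache of compiled style sheets, keyed by a caller-supplied string. */
final class TemplatesCacheSketch {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();

    /**
     * Returns the cached Templates for key, compiling the style sheet on first use.
     * The key is an assumption, for example the absolute URI of the style sheet.
     */
    Templates get(String key, Supplier<Source> stylesheet) {
        Templates templates = cache.get(key);
        if (templates == null) {
            try {
                // TransformerFactory is not guaranteed to be thread safe.
                synchronized (factory) {
                    templates = factory.newTemplates(stylesheet.get());
                }
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("cannot compile style sheet " + key, e);
            }
            Templates previous = cache.putIfAbsent(key, templates);
            if (previous != null) {
                templates = previous; // another thread was faster
            }
        }
        return templates;
    }
}
```

Each request would then call newTransformer() on the returned Templates, since Transformer instances themselves should not be shared between threads.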

3) Using URL's for the fo= and xml=,xsl= parameters so we can use
   network resources as well as local files.
Doh, revert to +0. I'd like to do this, unfortunately, this is not
without drawbacks:
- People have to learn what a URI is. This seems to be much harder
 than expected, especially for file:-URLs.
- People will still insist on keeping xml=foo.xml. This is still a
 URL (actually: a relative URL reference, which has to be resolved).
 We have to think hard about what the base URL is in this case.
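Mechanically, the resolution step is what java.net.URI already provides; the open question is only which base to pass in. A small sketch, with a made-up class name and the base URI left entirely to the caller:

```java
import java.net.URI;

/** Hypothetical helper: turns a servlet parameter into an absolute URI. */
final class ParamUriSketch {
    private ParamUriSketch() { }

    /**
     * Resolves param against base. Absolute URIs pass through unchanged,
     * relative references such as "foo.xml" are resolved against base.
     * Throws IllegalArgumentException if param is not a syntactically valid
     * URI reference (for example a Windows path with backslashes).
     */
    static URI resolve(URI base, String param) {
        // Note: if the base path does not end in '/', its last segment is
        // replaced rather than extended, as usual for URI resolution.
        return base.resolve(param);
    }
}
```

Whether the base should be a configured document directory, the servlet context, or the request URL is exactly the part that needs the hard thinking.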

Re: Comments on new property maker implementation

2004-01-18 Thread J.Pietschmann
Glen Mazza wrote:
One thing that *does* stick out, however, is the 100
or so addKeyword() calls for genericColor
I'd like us to have a static array of these
values--i.e., something done compile-time, that
genericColor can just reference, so we don't have to
do this keyword initialization.  
Look up perfect hash code and the associated generators
on the internet, like gperf, a C++ implementation used by
gcc and a variety of other compilers to provide a data
structure for mapping strings to something else in an
efficient way. Mind you, this would also benefit mapping
FO and property names to their associated classes or
code numbers.

Re: [PATCH] abandoning code-generated Property.Maker

2004-01-18 Thread J.Pietschmann
Glen Mazza wrote:
/  )
I am not an XSLT guru--offhand, does anyone know of a
simple way to get the interfaces to appear
It ought to be
 <xsl:apply-templates select="...">
   <xsl:sort select="name"/>
 </xsl:apply-templates>
Substitute in the xsl:sort's select whatever is the sort key.


Re: Comments on new property maker implementation

2004-01-18 Thread J.Pietschmann
Finn Bock wrote:
You should perhaps also be aware that the values in a static array gets 
assigned to the array one element at a time. So
That's an unpleasant surprise. I was always under the impression
statically initialized data was stored along with the string
constants, like in C. This means a generated perfect hash table
wouldn't have much of an advantage over, let's say, a simple
binary tree loaded with the values in proper order so that the
tree becomes automatically balanced (without rotations like
rb-trees do).
It would make sense, however, to properly set the initial size
values for the various hashmaps currently used.
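What I mean by initial size, as a sketch: pick the capacity so that all keywords fit below the default load factor of 0.75 and the map never has to rehash during setup. The class and method names are invented, and mapping each keyword to its array index is just a stand-in for the real enum codes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example of presizing a keyword map. */
final class KeywordMapSketch {
    private KeywordMapSketch() { }

    static Map<String, Integer> build(String[] keywords) {
        // With the default load factor 0.75, a capacity of n / 0.75 + 1
        // keeps n entries below the resize threshold.
        int capacity = (int) (keywords.length / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(capacity);
        for (int i = 0; i < keywords.length; i++) {
            map.put(keywords[i], i);
        }
        return map;
    }
}
```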

Re: Justification in HEAD

2004-01-15 Thread J.Pietschmann
Chris Bowditch wrote:
I lean somewhat to the first strategy, because memory is usually more
of a problem than bare performance. 
This appears to be a contradiction, did you mean the last strategy?
Well, I meant the second (free memory as early as possible).


Re: Justification in HEAD

2004-01-14 Thread J.Pietschmann
Chris Bowditch wrote:
An interesting idea... but hasn't this already been done in more detail 
in TextLM.getNextBreakPoss? So why should the renderer have to do it 
again (although in less complexity)? I would rather have layout do this, 
otherwise this logic would have to live in every renderer.

There is a tradeoff between avoiding recomputing the word width and
carrying it around for probably some significant time.

I don't understand this bit fully. Are you saying it's inefficient to 
carry around data items such as dAdjust, TSAdjust, etc? I would 
definitely say it is better than re-computing them in the renderer.
As you noticed, the TextLM.getNextBreakPoss examines the content
character by character for calculating the break possibilities,
thereby also keeping track of accumulated text width for the line.
The break possibilities, or something else, could be marked whether
they delimit adjustable space, refer to the x-offset or something
equivalent and the text snippet and all passed to the renderer,
thereby avoiding reparsing the line for adjustable space and
recalculating offsets/widths. The point is that this data could
as well be discarded after the line break is finalized. The most
convenient way to keep it around until rendering without copying
is as a list of small objects. Because rendering doesn't start until
a full page is laid out, this would lock up a significant amount of
memory. Whether it matters, compared to other problems, is yet to be
Summarizing: we have a bunch of strategies:
- Keep data associated with adjustable spaces from layout for rendering,
 thereby avoiding recalculation
- Discard the data as early as possible, thereby reducing peak memory
- Use something in between: copy into a compact representation, or
 discard part of the data.
I lean somewhat to the first strategy, because memory is usually more
of a problem than bare performance. I also have the gut feeling this
approach makes integration of leader expansion easier.
Of course, if someone implements several possibilities and runs benchmarks,
or even makes it a user choice, I won't object.
