Re: White space handling Wiki page

2005-11-07 Thread Manuel Mall
On Tue, 8 Nov 2005 04:40 am, Simon Pepping wrote:
> I have taken my time, but here is my reaction to the Wiki page on
> white space handling. In addition, I have written my own view on the
> XSL-FO spec's handling of white space in a Wiki page.

Simon,

your efforts are very much appreciated - at least by me. Your Wiki page 
presents white space handling from a different angle (paraphrased: 
Editors can modify the XML (adding spaces and linefeeds) and white 
space handling is mainly for dealing with those modifications). I think 
that is a very good perspective to take. 

>
> Step 2. Refinement: white-space-collapse
> ========================================
>
> Issue 1. The spec intentionally addresses only XML white space,
> because only such white space is manipulated by editors to obtain
> pretty printing.

Point taken, although I have no experience with non-western editors. Do 
they all use 0x20 for 'pretty printing'?

>
> Issue 2. The spec intentionally addresses only the collapse of white
> space around linefeed characters, because only such white space is
> manipulated by editors to obtain pretty printing. Even if linefeed
> characters indicate real line breaks and are preserved, it is
> possible that the editor has introduced sequences of XML white space
> characters for pretty printing.
>

OK

> Issue 3. White-space-collapse is formulated in terms of space
> characters which do not generate an area. That is similar to the
> space resolution rules, where space specifiers get a zero width.
> Since there is no merging of white space glyph areas into a single
> area, there is no contradiction with the condition for glyph merging
> in section 4.7.2. The space glyph area that does generate an area
> determines the traits of that area.
>
Yes - but my point was if someone writes:

and if &#x1234; and &#x4321; are mergeable according to the rules of the 
script then we are not allowed to do so because they don't have 
matching traits. But if someone writes:

these would be removed / collapsed / deleted under the white space 
rules.

Here is a more extreme example:

Under white space collapse the whole fo:character with the border 
disappears. If you write:
 
at least the border is retained, and whether the space survives depends 
on whether the sequence is at the beginning or end of a line.

Anyway, it is a bit academic as the spec is quite clear: if the Unicode 
value is U+0020, be it in a fo:character (during refinement) or a 
glyph area (during line building), it is subject to white space 
handling independent of any other properties / traits defined on it.
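
To make that reading concrete, here is a minimal Java sketch
(hypothetical and simplified relative to the spec's full rules; not
FOP's actual refinement code): collapsing keys purely on the character
values, never on traits defined on an enclosing fo:character.

// Hypothetical, simplified sketch: a run of XML white space collapses
// to a single U+0020, regardless of borders or other traits on the
// characters involved.
public final class WhiteSpaceCollapser {

    private static boolean isXmlWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Collapses each run of XML white space to a single space. */
    public static String collapse(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        boolean inRun = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isXmlWhiteSpace(c)) {
                if (!inRun) {
                    sb.append(' '); // only the first one survives
                }
                inRun = true;
            } else {
                sb.append(c);
                inRun = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a \n   b")); // prints "a b"
    }
}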

> Step 3. Line building: white-space-treatment and
> suppress-at-line-break
> ======================================================================
>
> I agree that the references to the refinement stage are probably
> editorial mistakes.
>
> Issue 1. As for white-space-collapse, the glyph areas are deleted,
> and glyph merging is not applicable.
>
I agree with that interpretation - just not sure it really captures well 
what a user may expect - see examples above.

> Issue 2. Here is a difference between FO 1.0 and 1.1. In 1.0 the flow
> objects were deleted at the refinement stage. Therefore they cannot
> contribute to line breaking. In 1.1 the glyph areas are deleted at
> the line building stage. Therefore they could contribute to line
> breaking. I do not think that this is intended, and they should not
> contribute to line breaking. This is in line with my opinion that the
> values preserve and ignore should not really be in the same property
> as suppression around linebreaks, and should be taken care of in the
> refinement stage.
>
Again I agree fully with you, and the current implementation shows that 
issue. We deal with white-space-treatment twice: once during refinement 
and once again during line building. Andreas commented on that as well. 
But I think that is how it has to be for the time being.

> Example 2
> =========
>
> The space in "." is suppressed because it is at
> the start of the block. 
Interesting - I agree that this is the intention but you don't find that 
sentence in the spec. In 1.1 this is covered by the "deleting spaces at 
the beginning of a line" under white-space-treatment / line building. 
Again the discussion is probably academic - we all agree what the 
expected outcome is. Whether or not we can derive that outcome from the 
spec is an interesting discussion, but it won't change what we will do.

> And "" does not generate 
> an empty line.  starts a new line, but that is not
> equivalent to a linefeed. When at the start of the nested fo:block
> there is no content in the line yet, it starts the same line. A
> similar thing happens in the case of "
",
> which was discussed in an email thread.
I assume you mean the discussion under linefeed-treatment="preserve". I 
am still confused about that because


 
will generate one linefeed, or should this also create none?

>
> Example 3
> =========
>
> Jörg asked the same in this email thread

Re: text-decoration problem?

2005-11-07 Thread Manuel Mall
Although it may be stating the obvious - our integrated tests don't 
catch regressions with respect to the renderers yet. Because our layout 
engine test suite is getting more comprehensive, this may instill a 
false sense of comfort that everything is still fine when one makes a 
change.

Need to look into integrating Jeremias' bitmap verifications into the 
test suite.
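
A minimal sketch of what such a bitmap verification could look like
(hypothetical; this is not Jeremias' actual code): render the FO to a
bitmap and compare it pixel by pixel against a stored reference image.

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public final class BitmapRegressionCheck {

    /** Returns true if both files decode to identically sized images
     *  with identical pixels. */
    public static boolean sameBitmap(File expected, File actual)
            throws IOException {
        BufferedImage a = ImageIO.read(expected);
        BufferedImage b = ImageIO.read(actual);
        if (a == null || b == null) {
            throw new IOException("unsupported image format");
        }
        if (a.getWidth() != b.getWidth()
                || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false; // first differing pixel is enough
                }
            }
        }
        return true;
    }
}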

Manuel

On Tue, 8 Nov 2005 09:05 am, Manuel Mall wrote:
> On Tue, 8 Nov 2005 08:25 am, Sven wrote:
> > Good evening,
> > I will try to make it as short as I can: Using the latest trunk
> > (revision 331647) I am getting a non-interpretable error from my
> > Acrobat Reader (version 7.0.5), when compiling the attached fo
> > fragment. The error box tells me something about "too many
> > arguments available". I have pinpointed the error to the use of the
> > text-decoration attribute with a value other than "none". Funny
> > thing is that this worked some revisions before, but I am sorry I am
> > not able to tell you when things broke down.
> >
> > Thanks for your good work
> >
> > Sven
>
> Sven,
>
> thanks for your problem report. You are correct that this is a
> regression caused by some changes made recently. The problem should
> be fixed now (revision 331655).
>
> Regards
>
> Manuel
>
> > <?xml version="1.0" encoding="UTF-8"?>
> > <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
> > <fo:layout-master-set>
> > <fo:simple-page-master page-width="21cm"
> > page-height="29.7cm" master-name="default">
> > <fo:region-body/>
> > <fo:region-before/>
> > </fo:simple-page-master>
> > </fo:layout-master-set>
> > <fo:page-sequence master-reference="default">
> > <fo:static-content flow-name="xsl-region-before">
> > <fo:block
> > text-align="end" font-size="10pt" font-family="Helvetica">
> > Einleitung
> > </fo:block>
> > </fo:static-content>
> > <fo:flow flow-name="xsl-region-body">
> > <fo:block
> > font-family="Times">
> > Obwohl die agentenorientierte Softwareentwicklung
> > im vergangenen Jahrzehnt immer größeren Zuspruch gefunden hat,
> > findet sie bisher nur überwiegend im universitären Umfeld
> > Anwendung. Damit sie
> > <fo:inline text-decoration="underline">
> > Bestandteil
> > </fo:inline>
> > der Entwicklung von Unternehmensanwendungen werden
> > kann, müssten die Defizite bestehender Agentenplattformen beseitigt
> > werden. Diese Defizite bestehen in der schlechten
> > Administrierbarkeit und der häufig nur unzureichend unterstützten
> > Interoperabilität mit bestehenden Unternehmensanwendungen
> > [CoGrBKR02].
> > </fo:block>
> > </fo:flow>
> > </fo:page-sequence>
> > </fo:root>
> >
> >


Re: White space handling Wiki page

2005-11-07 Thread Simon Pepping
I have taken my time, but here is my reaction to the Wiki page on
white space handling. In addition, I have written my own view on the
XSL-FO spec's handling of white space in a Wiki page.

Step 2. Refinement: white-space-collapse
========================================

Issue 1. The spec intentionally addresses only XML white space,
because only such white space is manipulated by editors to obtain
pretty printing.

Issue 2. The spec intentionally addresses only the collapse of white
space around linefeed characters, because only such white space is
manipulated by editors to obtain pretty printing. Even if linefeed
characters indicate real line breaks and are preserved, it is possible
that the editor has introduced sequences of XML white space characters
for pretty printing.

Issue 3. White-space-collapse is formulated in terms of space
characters which do not generate an area. That is similar to the space
resolution rules, where space specifiers get a zero width. Since there
is no merging of white space glyph areas into a single area, there is
no contradiction with the condition for glyph merging in section
4.7.2. The space glyph area that does generate an area determines the
traits of that area.

Step 3. Line building: white-space-treatment and suppress-at-line-break
=======================================================================

I agree that the references to the refinement stage are probably
editorial mistakes.

Issue 1. As for white-space-collapse, the glyph areas are deleted, and
glyph merging is not applicable.

Issue 2. Here is a difference between FO 1.0 and 1.1. In 1.0 the flow
objects were deleted at the refinement stage. Therefore they cannot
contribute to line breaking. In 1.1 the glyph areas are deleted at the
line building stage. Therefore they could contribute to line
breaking. I do not think that this is intended, and they should not
contribute to line breaking. This is in line with my opinion that the
values preserve and ignore should not really be in the same property
as suppression around linebreaks, and should be taken care of in the
refinement stage.

Example 2
=========

The space in "." is suppressed because it is at
the start of the block. And "" does not generate
an empty line.  starts a new line, but that is not
equivalent to a linefeed. When at the start of the nested fo:block
there is no content in the line yet, it starts the same line. A
similar thing happens in the case of "
",
which was discussed in an email thread.

Example 3
=========

Jörg asked the same in this email thread:
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&by=thread&from=561781,
entitled "Suppression of leading space".


  foo
   bar

.
..foo.
...bar

.
foo.
.bar

and he also believes that two spaces remain.

As to the border of the inline on the next line, I think indeed that a
formatter should avoid it, as it may be considered a bad layout
choice.

Processing Model 2
==================

In steps 2 and 3 you apply the conditions of glyph area merging. I do
not agree with that, as I explained above.

In step 3 eligible characters are all characters with
suppress-at-line-break="true", by default only the space character.
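
A small hypothetical sketch of that step (the names below are invented
for illustration, they are not FOP's classes): glyph areas whose
character has suppress-at-line-break="true" are deleted next to the
break, and by default that is only U+0020.

import java.util.ArrayList;
import java.util.List;

public final class LineBreakSuppression {

    static final class Glyph {
        final char ch;
        final boolean suppressAtLineBreak;
        Glyph(char ch) {
            this.ch = ch;
            // the initial value of the property: true only for U+0020
            this.suppressAtLineBreak = (ch == ' ');
        }
    }

    /** Deletes suppressible glyph areas at both ends of a line. */
    static List<Glyph> suppressAroundBreak(List<Glyph> line) {
        List<Glyph> result = new ArrayList<Glyph>(line);
        while (!result.isEmpty() && result.get(0).suppressAtLineBreak) {
            result.remove(0);
        }
        while (!result.isEmpty()
                && result.get(result.size() - 1).suppressAtLineBreak) {
            result.remove(result.size() - 1);
        }
        return result;
    }
}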

Nowhere in the spec is a conversion of tabs and CRs to spaces
specified.

In example 3, why is the space before 'Green' not deleted? It directly
follows a line break (step 4b).

Regards, Simon

On Tue, Oct 25, 2005 at 04:57:41PM +0800, Manuel Mall wrote:
> Hi,
> 
> I haven't got any technical comments to the issues raised on the Wiki 
> page. Is this 'too hard' or 'too boring' or 'too messy' or what? The 
> problem is not going away. We currently don't do it right in some parts 
> (that is established) but I don't know overall what is right or wrong. 
> Maybe if I ask for comments on an issue-by-issue basis we will get 
> somewhere?
> 

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Is getNextKnuthElements the right interface for inline LMs?

2005-11-07 Thread Manuel Mall
On Mon, 7 Nov 2005 10:39 pm, Luca Furini wrote:
> Manuel Mall wrote:
> > What I observed is that most of these issues cannot be solved by
> > looking at a single character at a time. They need context, very
> > often only one character, sometimes more (e.g. a sequence of white
> > space). More importantly, the context needed is not limited to the
> > fo they occur in. They all span across fos. This is where the
> > current LM structures and especially the getNextKnuthElement
> > interface really get in the way of things. Basically one cannot
> > create the correct Knuth sequences without the context, but the
> > context can come from everywhere (superior fo, subordinate fo, or
> > neighboring fo). So one needs look-ahead and backtracking features
> > across all these boundaries, and it feels extremely messy.
> >
> > It appears conceptually so much simpler to have only a single loop
> > iterating over all the characters in a paragraph, doing all the
> > character/glyph manipulation, word breaking (hyphenation), and line
> > breaking analysis and generation of the Knuth sequences in one
> > place. An example where this is currently done is the white space
> > handling during refinement. One loop at block level based on a
> > recursive char iterator that supports deletion and character
> > replacement does the job. Very simple and easy to understand. I
> > have something similar in mind for inline Knuth sequence
> > generation. Of course the iterator would not only return the
> > character but also relevant formatting information for it, e.g.
> > the font, so the width etc. can be calculated. The iterator may also
> > have to indicate start/end border/padding and conditional
> > border/padding elements.
>
> I think that there are two different "layers" that affect the
> generation of the elements: one is the "text layer" (or maybe
> semantic level), where we have the text and we can easily handle
> whitespace, recognize word boundaries, find hyphenation points,
> regardless of the actual fo (and its depth) where the text lives, and
> the "formatting layer" where we have the resolved values for the
> properties like font, size, borders, etc. These layers speak
> different languages, as one knows words and spaces and the other
> elements and attributes.
>
> At the moment, the getNextKnuthElements() method works at the
> formatting level: each LM knows the relevant properties but has a
> limited view of the text, whence the current difficulties.
>
> Your proposal is to work at the text level (correct me if I'm wrong),
> with the LineLM centralizing the handling of the text for a whole
> block. I wonder if, doing so, we would not find it difficult to know
> the resolved property values applying to each piece of text.
>
> I'm not saying that we don't need changes in the LM interactions;
> I'm just asking myself (and asking you all, of course :-)) if it
> is really possible to have both breaking and element generation *in
> one place*.
>
> What if we had first a centralized control at the text level (the
> LineLM putting together all the text, finding words, normalizing
> spaces, performing hyphenation ...) and then a localized element
> generation (each LM, based on what the LineLM did and using the
> local properties)?
>
> Something somewhat similar (but limited to single words) happens at
> the moment with the getChangedKnuthElements() method, which is called
> only after the LineLM has reconstructed a word, found its breaking
> points and told the inline LMs where the breaks are.
>
> Don't know if what I just wrote makes any sense; as I have never tried
> to do what you suggest or what I just attempted to describe, I really
> look forward to seeing your code in action!
>
Luca,

yes, what you wrote makes sense, and I am not at the coding stage yet. 
So don't hold your breath waiting for new code from me - you may get 
blue in the face. I am still trying to get my head around all the 
possible issues. I think your suggestion has quite a few merits. To 
rephrase it in my words: we do a text processing stage which precedes 
getNextKnuthElements and (among other things) determines all the break 
possibilities. This list is then given to the LMs as part of the 
getNextKnuthElements call, and the LMs can build the Knuth elements 
based on their local knowledge (properties) plus the already calculated 
break possibilities. We may even be able to do that during the 
refinement (white space handling) loop, thereby keeping repeated 
iterations over the text to a minimum.

I like the sound of this as it retains lots of what we have while 
addressing the need to analyse text across fo boundaries.
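
For what it's worth, a hypothetical sketch of the two-stage idea Manuel
describes (all names and signatures below are invented for
illustration; this is not the actual FOP API):

import java.util.List;

// Placeholder for the element type; FOP has its own Knuth classes.
class KnuthElement { }

interface ParagraphTextAnalyzer {
    /** Scans the paragraph text across fo boundaries and returns the
     *  positions where a line break would be legal. */
    List<Integer> findBreakPossibilities(CharSequence paragraphText);
}

interface InlineLayoutManager {
    /** Builds Knuth elements from local knowledge (properties) plus
     *  the break possibilities computed by the preceding text stage. */
    List<KnuthElement> getNextKnuthElements(List<Integer> breakPossibilities);
}

The point of the split is that the analyzer can see the whole paragraph
across fo boundaries, while each LM keeps only its local property
knowledge.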

> Regards
>  Luca

Thanks

Manuel


Re: A few new features

2005-11-07 Thread Jeremias Maerki

On 07.11.2005 15:30:08 Chris Bowditch wrote:

> > - The command-line gets a new option: -out application/pdf myfile.pdf is
> > the generic way to create an output file. If someone created a WordXML
> > output handler and provided the right service resource file he could
> > specify "-out text/xml+msword out.xml". "-out list" lists all MIME types
> > that are available for output.
> 
> Are you saying that the -ps, -pdf, etc options are to be replaced by 
> -out application/pdf, etc? If so, then I don't like that at all. It's 
> much more convenient to just type -ps or -pdf.

I don't intend to remove any of the existing options. It's just an
addition.





Jeremias Maerki



Re: Is getNextKnuthElements the right interface for inline LMs?

2005-11-07 Thread Luca Furini

Manuel Mall wrote:

What I observed is that most of these issues cannot be solved by looking 
at a single character at a time. They need context, very often only one 
character, sometimes more (e.g. a sequence of white space). More 
importantly, the context needed is not limited to the fo they occur in. 
They all span across fos. This is where the current LM structures and 
especially the getNextKnuthElement interface really get in the way of 
things. Basically one cannot create the correct Knuth sequences without 
the context, but the context can come from everywhere (superior fo, 
subordinate fo, or neighboring fo). So one needs look-ahead and 
backtracking features across all these boundaries, and it feels 
extremely messy.


It appears conceptually so much simpler to have only a single loop 
iterating over all the characters in a paragraph, doing all the 
character/glyph manipulation, word breaking (hyphenation), and line 
breaking analysis and generation of the Knuth sequences in one place. An 
example where this is currently done is the white space handling during 
refinement. One loop at block level based on a recursive char iterator 
that supports deletion and character replacement does the job. Very 
simple and easy to understand. I have something similar in mind for 
inline Knuth sequence generation. Of course the iterator would not only 
return the character but also relevant formatting information for it, 
e.g. the font, so the width etc. can be calculated. The iterator may also 
have to indicate start/end border/padding and conditional border/padding 
elements.


I think that there are two different "layers" that affect the generation 
of the elements: one is the "text layer" (or maybe semantic level), where 
we have the text and we can easily handle whitespace, recognize word 
boundaries, find hyphenation points, regardless of the actual fo (and its 
depth) where the text lives, and the "formatting layer" where we have the 
resolved values for the properties like font, size, borders, etc. These 
layers speak different languages, as one knows words and spaces and the 
other elements and attributes.


At the moment, the getNextKnuthElements() method works at the formatting 
level: each LM knows the relevant properties but has a limited view of the 
text, whence the current difficulties.


Your proposal is to work at the text level (correct me if I'm wrong), with 
the LineLM centralizing the handling of the text for a whole block. I 
wonder if, doing so, we would not find it difficult to know the resolved 
property values applying to each piece of text.


I'm not saying that we don't need changes in the LM interactions; I'm 
just asking myself (and asking you all, of course :-)) if it is really 
possible to have both breaking and element generation *in one place*.


What if we had first a centralized control at the text level (the LineLM 
putting together all the text, finding words, normalizing spaces, 
performing hyphenation ...) and then a localized element generation (each 
LM, based on what the LineLM did and using the local properties)?


Something somewhat similar (but limited to single words) happens at the 
moment with the getChangedKnuthElements() method, which is called only 
after the LineLM has reconstructed a word, found its breaking points and 
told the inline LMs where the breaks are.


Don't know if what I just wrote makes any sense; as I have never tried to 
do what you suggest or what I just attempted to describe, I really look 
forward to seeing your code in action!


Regards
Luca


Re: A few new features

2005-11-07 Thread Chris Bowditch

Jeremias Maerki wrote:


Last week, I've had some time to hack on my notebook. Fun stuff only. I've
finished a few things I started earlier and did some other things.
Here's a list of what I've done. Since some of them might be
controversial I want to give you a chance to object, just in case. So
here are the changes (almost) ready for committing on my notebook:


Hi Jeremias,

I have no comment for most of the items you mentioned. Generally they 
all sound good, except for one :)





- The command-line gets a new option: -out application/pdf myfile.pdf is
the generic way to create an output file. If someone created a WordXML
output handler and provided the right service resource file he could
specify "-out text/xml+msword out.xml". "-out list" lists all MIME types
that are available for output.


Are you saying that the -ps, -pdf, etc options are to be replaced by 
-out application/pdf, etc? If so, then I don't like that at all. It's 
much more convenient to just type -ps or -pdf.




Chris




Re: Is getNextKnuthElements the right interface for inline LMs?

2005-11-07 Thread Jeremias Maerki

On 07.11.2005 08:24:14 Manuel Mall wrote:

> Of course that would be quite a change internally although limited to 
> inline LMs and not affecting any block level operations. The way to do 
> this would be a branch in svn. But before I embark on such an endeavour 
> I'd like to seek some feedback on the list. Anyone aware of serious 
> problems with such an approach?

No.

> Has it been tried before and failed for example?

We had to change a few things during the transition to the Knuth
approach. Sometimes, changes are necessary and it makes no sense to
stubbornly stick to what is already there. 

> Those who designed the current getNextKnuth approach may have 
> arguments why changing it for inline LMs is a bad idea?

I have none. You seem to have good arguments for changing the interface.
Still, care should be taken that the LMs stay as uniform as possible so
it's possible to add layout managers for custom elements and that
non-character content is handled well and without too much custom logic
because the changed approach focuses strongly on text.

> Any other views / concerns?

That said, it should be noted that I haven't yet dived into the
Unicode stuff you've been discussing lately. I'm very happy about the
flurry of activity in this area. It looked like a good discussion. I
hope you will excuse me if I don't participate too much there right now.


Jeremias Maerki



A few new features

2005-11-07 Thread Jeremias Maerki
Last week, I've had some time to hack on my notebook. Fun stuff only. I've
finished a few things I started earlier and did some other things.
Here's a list of what I've done. Since some of them might be
controversial I want to give you a chance to object, just in case. So
here are the changes (almost) ready for committing on my notebook:

- Two new constructors for Fop.java: Fop(String) and Fop(String,
FOUserAgent) where String is a MIME type.

- org.apache.fop.apps.MimeConstants with a comprehensive list of MIME
types used in FOP.

- Non-standard, FOP-specific MIME types changed to a uniform pattern:
application/X-fop-awt-preview, application/X-fop-print,
application/X-fop-areatree

- RendererFactory now supports manual registration and dynamic discovery
of Renderers and FOEventHandlers by their MIME types. Instantiation is
done using MIME types everywhere.

- The RENDER_* constants are mapped to MIME types in Fop.java. I'd like
to remove them but left them where they are for the moment. I'd also
like to remove the "implements Constants" from Fop.java. But that's
nothing new. :-)

- RendererFactory is now an instantiable class whose reference is held
by FOUserAgent, just as is done for the XMLHandlers.

- Renderers and FOEventHandlers now each have a *Maker class, a kind of
factory class which is used to register a Renderer/FOEventHandler and
which additionally provides information about the implementation, such
as which MIME types it supports and whether it requires an
OutputStream.

- The command-line gets a new option: -out application/pdf myfile.pdf is
the generic way to create an output file. If someone created a WordXML
output handler and provided the right service resource file he could
specify "-out text/xml+msword out.xml". "-out list" lists all MIME types
that are available for output.

- To make things a little more consistent and error reporting easier,
I've changed FONode so each FONode can return the namespace URI it
belongs to (getNamespaceURI()). Furthermore, it can return the normally
used namespace prefix (getNormalNamespacePrefix(), "fo" for XSL-FO), and
I've added methods to build the fully qualified name from a local name.
For this I've changed getName() to getLocalName() for all descendants of
FONode so it is defined to return only the local name and not (in some
cases) the fully qualified name. The whole thing now feels a lot cleaner.

- I've started extending support for alternative ways of painting
graphics. For example, Barcode4J supports SVG, EPS, bitmaps and Java2D
as output targets. For PostScript, it's better to use EPS directly. In
PDF, Java2D could be used instead of SVG to avoid the slow round-trip
through Batik. RTF could use bitmaps. I've started a Graphics2DAdapter
which renderers provide if they can supply a Graphics2D instance for
painting. XMLHandler can query a Graphics2DAdapter and use that to paint
the graphic. The image is passed to the Graphics2DAdapter as a
Graphics2DImagePainter instance which essentially provides a
paint(Graphics2D, Rectangle) method and a getImageSize() method.
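
For illustration, a simplified sketch of the painter contract described
above (this is only a reading of the description; the real interfaces
live in FOP's renderer packages, and the trivial painter below is
invented):

import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.Rectangle;

interface Graphics2DImagePainter {
    /** Paints the graphic into the given area of the Graphics2D. */
    void paint(Graphics2D g2d, Rectangle area);

    /** Reports the intrinsic size of the graphic. */
    Dimension getImageSize();
}

// A trivial painter: draws a placeholder frame filling the area.
class PlaceholderPainter implements Graphics2DImagePainter {
    public void paint(Graphics2D g2d, Rectangle area) {
        g2d.drawRect(area.x, area.y, area.width - 1, area.height - 1);
    }
    public Dimension getImageSize() {
        return new Dimension(100, 100);
    }
}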

While implementing the last point I've had to realize that while this is
a step forward, this is not enough in the long run. I believe we need to
also refactor the whole image package again to handle a few additional
problems. An example: Our JPEG support is currently restricted to PDF
and PS renderers where the JPEG can be embedded in undecoded form. The
Java2D renderer descendants currently don't support JPEG at all because
they can't get a decoded JPEG image. What I would like to do is
introduce something similar to the concept already used by the Java
Printing System (JPS): The DocFlavor, an object to describe the format
(SVG, JPEG, MathML etc.) and manifestation (byte[], DOM etc.). The
renderers say what formats they support (with desirability indicators)
and the image classes will support providing the images in different
flavors, as needed. In between, special converters can convert from one
format to another, like converting a MathML DOM to a bitmap image. I'm
not going into more detail right now. I'll document everything on the
Wiki. I can only say that I had to realize that I will basically need to
recreate Nicola Ken Barozzi's Morphos idea again. That's going to be
interesting and a lot of fun. :-)
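
A hypothetical sketch of the flavor idea (invented names; JPS has the
real javax.print.DocFlavor, and the FOP design is only outlined above):
a flavor pairs a format with a manifestation, and a renderer lists the
flavors it supports in order of desirability.

final class ImageFlavor {
    final String format;       // e.g. "image/jpeg", "image/svg+xml"
    final Class manifestation; // e.g. byte[].class, org.w3c.dom.Document.class
    ImageFlavor(String format, Class manifestation) {
        this.format = format;
        this.manifestation = manifestation;
    }
}

interface FlavorAwareRenderer {
    /** The flavors this renderer accepts, most desirable first. */
    ImageFlavor[] getSupportedFlavors();
}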

Ok, so if anybody is against any of the above points or needs additional
information, please tell me.

Jeremias Maerki



Test post from Gmane

2005-11-07 Thread Jeremias Maerki
This is a test post with an unsubscribed email address from the web interface of
http://www.gmane.org. If it works we should publish some information about this
on our website so people who hate mailing lists still have a way to post
messages on our lists without having to subscribe.

Cheers,
Jeremias Maerki



Re: Unicode compliant Line Breaking

2005-11-07 Thread Jeremias Maerki
1. +1
2. +1
3.b) +1 for the separable parts, although c) is also ok for now.

+1 to try to find synergies with the code in Batik.

If I were you I'd create a branch and put your stuff in there. It's
easier for everyone to follow and to help (wishful thinking).

On 31.10.2005 08:25:12 Manuel Mall wrote:
> In a previous post Joerg pointed to the Unicode Standard Annex #14 on 
> Line Breaking (http://www.unicode.org/reports/tr14/) and his initial 
> implementation: http://people.apache.org/~pietsch/linebreak.tar.gz.
> 
> I have since had a closer look at both UAX#14 and Joerg's code. Because 
> I liked what I saw, I went about adapting Joerg's code to Unicode 4.1 
> and added fairly extensive JUnit test cases, mainly because it really 
> helps to go through the various different cases mentioned in the spec 
> in some structured fashion.
> 
> The results are now available for public inspection: 
> http://people.apache.org/~manuel/fop/linebreak.tar.gz
> 
> 1. I would like to propose that Unicode conformant line breaking be 
> integrated into FOP trunk because it:
> a) Moves FOP more towards being a universal formatter and not just a 
> formatter for western languages
> b) Moves FOP more towards becoming a high quality typesetting system 
> (something that was really started by integrating Knuth style breaking)
> The reason I think this needs to be voted on is because Unicode line 
> breaking will in subtle ways change the current line breaking behaviour 
> and therefore constitutes a (significant) change in FOP's overall 
> rendering.
> 
> 2. I would also like to propose that the Unicode conformant line 
> breaking be implemented using our own pair-table based implementation 
> and not using Java's line breaker, because:
> a) It gives us full control and allows FOP to follow the Unicode 
> standard (and its updates and errata) closely and therefore keep FOP's 
> Unicode compliance level independent of the Java version.
> b) It allows us to tailor the algorithm to match the needs of XSL-FO and 
> FOP.
> c) It allows us to provide user customisation features (down the track) 
> not available through using the Java APIs.
> 
> Of course there are downsides, like:
> a) Are we falling for the 'not invented here' syndrome?
> b) Duplicating code which is already in the Java base system
> c) Increasing the memory footprint of FOP
> 
> 3. Assuming we get enough +1 for the above proposals the first item to 
> decide after that would be: Where should the code live?
> a) Joerg would like to see it in Jakarta Commons but hasn't got the time 
> to start the project. 
> b) Jeremias suggested XMLGraphics Commons. 
> c) Personally I think it is too early to factor it out. More experience 
> with its design and use cases should be gathered before making it 
> standalone, and at this point in time it really is only 2 core Java 
> classes. I would like to suggest that it initially lives under FOP in 
> something like org.apache.fop.text. Should the need and energy levels 
> (= developer enthusiasm) become available later to make this into a 
> Jakarta Commons or XMLGraphics Commons project, so be it.
> 
> Assuming now that this will be agreed as well the next step would be the 
> more detailed design of the integration. But this is well beyond the 
> scope of this e-mail as there are some tricky issues involved and they 
> probably need to be tackled in conjunction with the white space 
> handling issues. Many of the problems are related to our LayoutManager 
> structures which create barriers when it comes to the need to process 
> character sequences across those boundaries as is the case for both 
> line breaking and white space handling. Add to that the design of the 
> different Knuth sequences required to model the different break cases 
> in conjunction with conditional border/padding and white space removal 
> around line breaking and different types of line justifications and 
> there is some real work ahead.
> 
> Cheers
> 
> Manuel
> 
> Should add my votes:
> 
> 1.) +1
> 2.) +1
> 3.c) +1
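
To illustrate the pair-table approach from proposal 2, here is a toy
sketch (a tiny invented subset of UAX#14, not Joerg's or Manuel's actual
implementation): each character maps to a line breaking class, and a
two-dimensional table says whether a break is allowed between a pair of
adjacent classes.

public final class PairTableSketch {

    // Toy classes: 0 = AL (alphabetic), 1 = SP (space), 2 = BA (break after)
    static final byte DIRECT = 0;     // break allowed
    static final byte PROHIBITED = 1; // no break

    // pairTable[classBefore][classAfter]
    static final byte[][] PAIR_TABLE = {
        // after:   AL          SP          BA
        /* AL */ { PROHIBITED, PROHIBITED, PROHIBITED },
        /* SP */ { DIRECT,     PROHIBITED, DIRECT     },
        /* BA */ { DIRECT,     PROHIBITED, DIRECT     },
    };

    static int lineBreakClass(char c) {
        if (c == ' ') return 1; // SP
        if (c == '-') return 2; // BA
        return 0;               // AL: everything else in this toy
    }

    /** True if a line break is allowed between positions i-1 and i. */
    static boolean breakAllowed(CharSequence text, int i) {
        int before = lineBreakClass(text.charAt(i - 1));
        int after = lineBreakClass(text.charAt(i));
        return PAIR_TABLE[before][after] == DIRECT;
    }
}

For example, breakAllowed("foo bar", 4) is true: the break opportunity
comes after the space, never before it, which is exactly the kind of
behaviour the real pair table encodes.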



Jeremias Maerki