Re: Borders in page regions

2015-11-11 Thread Luca Furini
Seifeddine Dridi wrote:

> Does anybody know why it isn’t allowed to set borders in page regions? The
> XSL spec says that border-width and padding values must be 0, but I don’t
> understand why it enforces this restriction; RenderX, for instance, allows
> borders in page regions.

You can activate the "relaxed validation mode", which allows borders
and padding on regions; you will still get a warning (instead of
a validation error), but they will be rendered fine.

Relaxed validation is activated by using the -r option from the
command line, or by setting
<strict-validation>false</strict-validation> in the configuration
file.

Hope this helps

Luca


Request for developer's powers on JIRA

2015-02-14 Thread Luca Furini
Now that I'm back in harness, thanks to a lot of patient people, I
only need the magical power to work on JIRA issues in order to be a
(somewhat) useful committer.

Glen, as I'm told you are FOP's JIRA administrator, could you please
give the necessary privileges to my lfurini JIRA account? Do you
need any additional info?

Bye
Luca


[jira] [Resolved] (FOP-2348) [PATCH] PDF File Attachment Extension is broken

2015-02-14 Thread Luca Furini (JIRA)

 [ 
https://issues.apache.org/jira/browse/FOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Furini resolved FOP-2348.
--
   Resolution: Fixed
Fix Version/s: trunk

Patch applied in revision r1655099

Revision r1659776 added an automated testcase for this feature, so as
to avoid regressions in the future.

 [PATCH] PDF File Attachment Extension is broken
 ---

 Key: FOP-2348
 URL: https://issues.apache.org/jira/browse/FOP-2348
 Project: Fop
  Issue Type: Bug
  Components: renderer/pdf
Affects Versions: trunk
Reporter: Matthias Reischenbacher
Assignee: Luca Furini
 Fix For: trunk

 Attachments: 2348-testcase.zip, 2348.patch


 PDF file attachments are broken in the latest trunk. I didn't investigate in 
 detail, but I think it's since rev 1537948 or 1522934. When generating a PDF 
 with file attachments, a NullPointerException is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException

2015-02-14 Thread Luca Furini (JIRA)

 [ 
https://issues.apache.org/jira/browse/FOP-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Furini resolved FOP-2441.
--
   Resolution: Duplicate
Fix Version/s: trunk
 Assignee: Luca Furini

As Thanasis Giannimaras commented, this is a duplicate of another bug, now 
resolved.

 pdf:embedded-file extension is broken, gives NullPointerException
 -

 Key: FOP-2441
 URL: https://issues.apache.org/jira/browse/FOP-2441
 Project: Fop
  Issue Type: Bug
  Components: renderer/pdf
Affects Versions: trunk
Reporter: Luca Furini
Assignee: Luca Furini
Priority: Minor
 Fix For: trunk

 Attachments: change.diff, test_attachment.fo




Re: [jira] [Updated] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException

2015-01-24 Thread Luca Furini
Luis Bernardo wrote:

 I am under the impression that committership rights are never revoked but I
 could be wrong. Are you sure that you can log in to your Apache account?
 Maybe a year ago or so Apache forced a change in passwords. Did you change
 your password when that happened?

Yes, I remember changing the password when requested.

When I try to commit I get a 403 Forbidden error message after being
asked for my username and password (by comparison, if I enter a wrong
username / password I keep being asked to enter them correctly).

I think my Apache account still exists, as I can log in to
https://id.apache.org, and I'm in the committers list at
http://people.apache.org/committer-index.html#lfurini (although I'm
not assigned to any svn projects).

On the other hand I have been inactive for several years, so I
wouldn't be surprised or offended if someone / an automatic procedure
revoked my powers ...

So ... does anyone have any ideas? :-)

Bye
Luca


[jira] [Updated] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException

2015-01-23 Thread Luca Furini (JIRA)

 [ 
https://issues.apache.org/jira/browse/FOP-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Furini updated FOP-2441:
-
Attachment: test_attachment.fo

Simple fo file to reproduce the error

 pdf:embedded-file extension is broken, gives NullPointerException
 -

 Key: FOP-2441
 URL: https://issues.apache.org/jira/browse/FOP-2441
 Project: Fop
  Issue Type: Bug
  Components: renderer/pdf
Affects Versions: trunk
Reporter: Luca Furini
Priority: Minor
 Attachments: test_attachment.fo




[jira] [Updated] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException

2015-01-23 Thread Luca Furini (JIRA)

 [ 
https://issues.apache.org/jira/browse/FOP-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Furini updated FOP-2441:
-
Attachment: change.diff

Proposed patch

 pdf:embedded-file extension is broken, gives NullPointerException
 -

 Key: FOP-2441
 URL: https://issues.apache.org/jira/browse/FOP-2441
 Project: Fop
  Issue Type: Bug
  Components: renderer/pdf
Affects Versions: trunk
Reporter: Luca Furini
Priority: Minor
 Attachments: change.diff, test_attachment.fo




[jira] [Created] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException

2015-01-23 Thread Luca Furini (JIRA)
Luca Furini created FOP-2441:


 Summary: pdf:embedded-file extension is broken, gives NullPointerException
 Key: FOP-2441
 URL: https://issues.apache.org/jira/browse/FOP-2441
 Project: Fop
  Issue Type: Bug
  Components: renderer/pdf
Affects Versions: trunk
Reporter: Luca Furini
Priority: Minor


The extension property pdf:embedded-file (to attach files to the pdf) is not 
working, and generates a NullPointerException.

I noticed the problem while trying to write an answer to this StackOverflow 
question: 
http://stackoverflow.com/questions/28110607/unable-to-add-an-attachment-to-a-pdf-while-using-fop
 
(the question is about a different problem, but while testing on fop-trunk I 
noticed the bug I'm reporting here).

Looking at the revision history, I think the implementation of this extension 
has been broken since revision [1522934].

I'm going to attach a simple FO file showing the problem, together with a 
proposed patch.

I was a FOP committer for some time, followed by a looong period of just 
lurking on the mailing list; I tried to commit the changes myself, but I guess 
my long inactivity period has caused the revocation of my commit privileges.





Re: Absolute-positioned block-containers using left and bottom

2008-07-07 Thread Luca Furini
On Fri, Jul 4, 2008 at 6:09 PM, Andreas Delmelle
[EMAIL PROTECTED] wrote:

 Now, I'm wondering... In theory, it should not be too difficult to get at
 this info, since ultimately it is also needed when computing 'top' or 'left'
 if they are specified as a percentage. In that case, the value is obtained
 through AbstractBaseLayoutManager.getBaseLength() - getReferenceAreaIPD()
 or getReferenceAreaBPD().

Yeah, thank you, that was it!
Right under my nose, yet I could not see it ... :-)

Now everything should be ok in revision 674489.

Regards
Luca


Re: Absolute-positioned block-containers using left and bottom

2008-07-04 Thread Luca Furini
On Wed, Jul 2, 2008 at 8:24 PM, Andreas Delmelle
[EMAIL PROTECTED] wrote:

 If you have the area's own dimensions, and the complement properties
 (bottom-right), is that not enough?

 For the renderer:
 - top = (bottom - area-bpd - borders - padding)
 - left = (right - area-bpd - borders - padding)

Bottom is the distance from the *bottom* edge of the nearest ancestor
reference area, not from its top edge (and likewise right is measured
from the right edge, not the left one), so we need to know the
reference bpd and ipd.
So, if I read the spec right, a 10pt bottom position set on the same
absolutely positioned block-container would be translated into
different top-positions according to the reference bpd (= the
reference bottom edge).
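A minimal sketch of that conversion, assuming a left-to-right, top-to-bottom writing mode, that bottom / right are measured from the nearest ancestor reference area's bottom / right edges, and that the object's extents already include its borders and padding. The class and method names are hypothetical, not FOP's. The numbers come from the test file attached to the 2008-06-23 report, assuming the nearest reference area is the region-body (6in x 5in page minus 1in margins, i.e. 288pt x 216pt): the blue and yellow boxes land right next to the red one, as the expected 2x2 layout requires.

```java
/**
 * Sketch: converting bottom/right offsets of an absolutely positioned
 * block-container into top/left offsets. Assumes lr-tb writing mode and
 * object extents that already include borders and padding.
 * Names are illustrative only, not FOP's API.
 */
final class AbsoluteOffsets {
    private AbsoluteOffsets() {}

    /** Distance from the reference area's top edge, given the distance from its bottom edge. */
    static double topFromBottom(double referenceBPD, double bottom, double objectBPD) {
        return referenceBPD - bottom - objectBPD;
    }

    /** Distance from the reference area's left edge, given the distance from its right edge. */
    static double leftFromRight(double referenceIPD, double right, double objectIPD) {
        return referenceIPD - right - objectIPD;
    }

    public static void main(String[] args) {
        // Assumed nearest reference area: region-body of the test file, 288pt x 216pt
        double refIPD = 288, refBPD = 216;
        // Blue box: bottom="99pt", height="30pt" -> 57 + 30, just below the red box
        System.out.println(topFromBottom(refBPD, 99, 30));  // 87.0
        // Yellow box: right="77pt", width="51pt" -> 109 + 51, just right of the red box
        System.out.println(leftFromRight(refIPD, 77, 51));  // 160.0
    }
}
```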

 It seems so simple... Am I missing something?

*At least* one of us is missing something ;-)

 (It's hot here, too! ;-))

We probably need a summer team located in the southern hemisphere!

Regards
   Luca


Absolute-positioned block-containers using left and bottom

2008-07-03 Thread Luca Furini
(it's still me, I just subscribed with the gmail account I use more
frequently, to avoid problems of messages not reaching the list ...)

On Wed, Jul 2, 2008 at 8:24 PM, Andreas Delmelle
[EMAIL PROTECTED] wrote:

 If you have the area's own dimensions, and the complement properties
 (bottom-right), is that not enough?

 For the renderer:
 - top = (bottom - area-bpd - borders - padding)
 - left = (right - area-bpd - borders - padding)

Bottom is the distance from the *bottom* edge of the nearest ancestor
reference area, not from its top edge (and likewise right is measured
from the right edge, not the left one), so we need to know the
reference bpd and ipd.
If I read the spec right, a 10pt bottom position set on the same
absolutely positioned block-container would be translated into
different top-positions according to the reference bpd (= the
reference bottom edge).

 It seems so simple... Am I missing something?

*At least* one of us is missing something ;-)

 (It's hot here, too! ;-))

We probably need a summer team located in the southern hemisphere!

Regards
  Luca


Re: Absolute-positioned block-containers using left and bottom

2008-07-02 Thread Luca Furini
(I'm re-posting this message as I sent it yesterday and still cannot see 
it in the list archives; I hope I'm not duplicating it unnecessarily)


On Mon, Jun 23, 2008 at 5:12 PM, Luca Furini [EMAIL PROTECTED] wrote:


If there is a block-container with both width and height set, its
position can be correctly controlled using top and left (and indeed
there are many testcases checking that) but bottom and right do not
have any visible effect.


I've solved the bug for simple situations, but the solution is not
nearly general enough to be committed.

The point is: right and bottom distances need to be translated into
x- and y-offsets respectively at some point, and in doing this we
must know the ipd and bpd of the nearest ancestor reference area, as,
for example,
  x-offset = reference-ipd - object-ipd - right-distance

My first idea was to set the offsets at the LM level, when creating
areas, so that there would be no changes at all for the renderers, but
I failed to find a way to obtain the nearest ancestor reference area,
as areas have no parent pointer (and I couldn't even think of a nice
way to find the appropriate region reference ...).

So, I'm almost convinced that the bottom and right distances should
be preserved in the area tree, and translated into offsets during
rendering, where it would be possible to keep a nearestReferenceArea
pointer up to date, just like current*PPosition.
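One possible shape for that renderer-side tracking, sketched under assumptions: a stack of open reference areas, pushed when rendering enters one and popped when it leaves, whose top plays the role of the nearestReferenceArea pointer. ReferenceAreaTracker, RefArea and all method names are hypothetical; a real renderer would call them from its existing area traversal.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a renderer helper that keeps the nearest ancestor reference
 * area available while rendering. All names are hypothetical.
 */
class ReferenceAreaTracker {
    /** Inline- and block-progression dimensions of a reference area. */
    static final class RefArea {
        final double ipd;
        final double bpd;
        RefArea(double ipd, double bpd) { this.ipd = ipd; this.bpd = bpd; }
    }

    private final Deque<RefArea> openAreas = new ArrayDeque<>();

    /** To be called when the renderer starts rendering a reference area. */
    void enterReferenceArea(RefArea area) { openAreas.push(area); }

    /** To be called when the renderer has finished that reference area. */
    void leaveReferenceArea() { openAreas.pop(); }

    RefArea nearestReferenceArea() {
        RefArea nearest = openAreas.peek();
        if (nearest == null) throw new IllegalStateException("no reference area is open");
        return nearest;
    }

    /** y-offset from the top of the nearest reference area for an area positioned with 'bottom'. */
    double yOffsetFromBottom(double bottom, double objectBPD) {
        return nearestReferenceArea().bpd - bottom - objectBPD;
    }

    /** x-offset from the left of the nearest reference area for an area positioned with 'right'. */
    double xOffsetFromRight(double right, double objectIPD) {
        return nearestReferenceArea().ipd - right - objectIPD;
    }
}
```

For instance, with a 288pt x 216pt region reference area open and a nested 100pt x 80pt reference area on top of it, an area with a 10pt bottom and a 30pt bpd gets a 40pt y-offset inside the nested area, and 176pt once the nested area has been left.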

Comments, suggestions, warnings would be most welcome, as I fear the
heat of these days is making me insane! :-)

Regards
   Luca


Absolute-positioned block-containers using left and bottom

2008-06-23 Thread Luca Furini
While playing a bit with absolutely positioned block-containers, I think
I stumbled into a little bug.

If there is a block-container with both width and height set, its
position can be correctly controlled using top and left (and indeed
there are many testcases checking that) but bottom and right do not
have any visible effect.

I'm attaching a simple file, whose expected output would show four
colored block-containers placed adjacently in a 2x2 grid (I tried another
formatter, and it behaves as expected).

I did not investigate any deeper, but I noticed that in the area tree
xml we use only two attributes (top-position and left-position), and
they are 0 when the corresponding block-container has @bottom /
@right.

Tomorrow I'll work on this, obviously unless someone gets there first or
convinces me that the output we already get is the right one :-)

Regards
Luca
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
	<fo:layout-master-set>
		<fo:simple-page-master master-name="simple" page-width="6in" page-height="5in" margin="1in">
			<fo:region-body/>
		</fo:simple-page-master>
	</fo:layout-master-set>
	<fo:page-sequence master-reference="simple">
		<fo:flow flow-name="xsl-region-body">
			<fo:block font-size="48pt">position</fo:block>
			<fo:block font-size="48pt" text-align="right"><fo:inline>NOT</fo:inline> ok!</fo:block>
			<fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="red" top="57pt" left="109pt">
				<fo:block/>
			</fo:block-container>
			<fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="yellow" top="57pt" right="77pt">
				<fo:block/>
			</fo:block-container>
			<fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="blue" bottom="99pt" left="109pt">
				<fo:block/>
			</fo:block-container>
			<fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="green" bottom="99pt" right="77pt">
				<fo:block/>
			</fo:block-container>
		</fo:flow>
	</fo:page-sequence>
</fo:root>


expectedOutput.pdf
Description: Adobe PDF document


fopOutput.pdf
Description: Adobe PDF document


Re: Border and padding on page regions

2008-06-20 Thread Luca Furini
On Thu, Jun 19, 2008 at 3:45 PM, Jeremias Maerki [EMAIL PROTECTED] wrote:

 There's both in FOP. block-container has the border on the viewport.
 table-cell has it on the reference area (table-cell doesn't generate a
 viewport). But I fear we might actually be wrong about having the border
 and padding on the viewport area.

Ok, so the region reference is the right place for borders and
padding; a posteriori it seems reasonable: the viewport defines the
window, the reference area starts defining what we see ... (but I
could easily convince myself of the other option too :-) )

 It's interesting that we treat background and borders together in the
 renderers although 4.9.4 http://www.w3.org/TR/xsl11/#rend-border makes a
 distinction where the background is to be applied. But we don't support
 background-attachment so that didn't get noticed that way.

I could split the method
AbstractPathOrientedRenderer.drawBackAndBorders() into two
drawBackground() / drawBorders() methods, as the background trait
stays in the viewport while borders and padding will be in the
reference area.
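A rough sketch of that split, assuming the background fills the region viewport while the borders sit on the edge of the region reference area and, together with the padding, shrink the content rectangle. The class, the method signatures and the uniform border / padding widths are simplifications for illustration; they are not the actual AbstractPathOrientedRenderer API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: background and borders drawn in two separate steps, each against
 * its own rectangle. Names and signatures are hypothetical.
 */
class RegionTraitsSketch {
    static final class Rect {
        final double x, y, width, height;
        Rect(double x, double y, double width, double height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
        /** Shrinks the rectangle by the same amount on every side. */
        Rect inset(double d) {
            return new Rect(x + d, y + d, width - 2 * d, height - 2 * d);
        }
    }

    /** Records the drawing steps, standing in for real drawing calls. */
    final List<String> operations = new ArrayList<>();

    void drawBackground(Rect viewport) { operations.add("background"); }

    void drawBorders(Rect referenceArea, double borderWidth) { operations.add("borders"); }

    /** Draws both traits and returns the content rectangle left for the region's children. */
    Rect renderRegion(Rect viewport, double borderWidth, double padding) {
        drawBackground(viewport);
        // Assumption: the reference area has the same outer rectangle as the viewport
        Rect referenceArea = viewport;
        drawBorders(referenceArea, borderWidth);
        // Borders and padding reduce the content rectangle's ipd / bpd
        return referenceArea.inset(borderWidth + padding);
    }
}
```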

Thanks for the feedback (to Andreas too)

Regards
Luca


Border and padding on page regions

2008-06-19 Thread Luca Furini
Some time ago (well, almost 2 years!) we spoke about the possibility
of allowing users to define borders and padding for the page regions [1].

This week I finally found some time to do it, so I have it working on
my local copy ... but then I was struck by a dilemma: should the additional
traits for borders and padding be set on the region viewport
(class RegionViewport) or on the region reference area
(RegionReference)?

This sentence in the spec (section 4.2.2, Common Traits) made me decide to
put them in the RegionReference, as the padding results in a reduction
of the content rectangle bpd / ipd:
"Only a reference-area may have a block-progression-direction which is
different from that of its parent."
But section 6.4.14 (fo:region-body), where it says that padding and borders
should be 0, also says that it's the region viewport that has margins
(so it would have padding and borders too, if allowed).
Moreover, the code already present in
AbstractPathOrientedRenderer.handleRegionTraits() (not to mention
Murphy's laws!) seems to suggest that these traits should belong to the
region viewport.

So, a couple of questions:
- do we still think that supporting borders and padding on regions
when relaxed validation is on would be something good (or, at least,
not bad)?
- is RegionViewport the right place for the additional traits?

Regards
 Luca


[1] 
http://www.nabble.com/Re%3A-svn-commit%3A-r225580xmlgraphics-fop-trunk-test-layoutengine-testcases-page-master4.xml-to511937.html#a511937


Re: Border and padding on page regions

2008-06-19 Thread Luca Furini
On Thu, Jun 19, 2008 at 1:26 PM, Luca Furini [EMAIL PROTECTED] wrote:

 Only a reference-area may have a block-progression-direction which is
 different from that of its parent.

Oops, I realize only now that it says direction and not dimension :-)

Ok, so I think this definitely means that the traits should be in the
region viewport ...

Sorry for the noise!

Regards
Luca


Re: svn commit: r668177 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/layoutmgr/ test/layoutengine/standard-testcases/

2008-06-16 Thread Luca Furini
On Mon, Jun 16, 2008 at 4:52 PM,  [EMAIL PROTECTED] wrote:

 Fixing the PageBreakingAlgorithm, replacing calls to getLineWidth() with 
 getLineWidth(int)
 so  as to take into account each page's real height.
 This fixes the positioning of footnotes when the page bpd is not the same for 
 all pages.

This was a little nasty bug I stumbled upon a few days ago, and it
took me some time to track back where the problem was ...

The PageBreakingAlgorithm, in particular the computeDifference()
method, had some calls to getLineWidth() without parameters and others
with an int parameter indicating the page.
When the page bpd changes from page to page, the two methods returned
different values, with the effect that the algorithm first believed it
could place a whole footnote in the page, and then found out that this
led to an overflow.
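A hypothetical illustration of the inconsistency: in the page breaking algorithm the "line width" is really the page bpd, and mixing a page-independent getLineWidth() with the page-aware getLineWidth(int) makes two checks on the same content disagree when pages differ in height. Only the two method names come from the message above; the class, the per-page heights and the fit checks are made up for the example.

```java
import java.util.List;

/**
 * Sketch of the bug pattern: a fit check that ignores the page versus one
 * that asks for the real height of the page being filled. Hypothetical code.
 */
class PageHeightSketch {
    private final List<Integer> pageBPDs; // available height of each page, first page first

    PageHeightSketch(List<Integer> pageBPDs) { this.pageBPDs = pageBPDs; }

    /** Page-independent variant: always answers with the first page's height. */
    int getLineWidth() { return pageBPDs.get(0); }

    /** Page-aware variant: the last entry is reused for any further page. */
    int getLineWidth(int page) { return pageBPDs.get(Math.min(page, pageBPDs.size() - 1)); }

    /** Buggy pattern: does not look at which page the content is on. */
    boolean fitsIgnoringPage(int contentHeight, int footnoteHeight) {
        return contentHeight + footnoteHeight <= getLineWidth();
    }

    /** Consistent pattern: always uses the height of the page being filled. */
    boolean fitsOnPage(int page, int contentHeight, int footnoteHeight) {
        return contentHeight + footnoteHeight <= getLineWidth(page);
    }
}
```

With a 700-unit first page and 500-unit following pages, 300 units of content plus a 300-unit footnote on the second page (index 1) "fit" according to fitsIgnoringPage() but not according to fitsOnPage(1, ...), the same kind of contradiction that led to the footnote overflow.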

In order to avoid similar problems, the parameter-less getLineWidth()
method could maybe be deprecated?

Regards
Luca


Re: Checking: difference between negative stretch and positive shrink?

2007-09-17 Thread Luca Furini

Andreas L Delmelle wrote:

Just wondering about some KnuthSequences for spaces I noticed during 
a debug-session:


glue w=0 stretch=10008 shrink=0
penalty w=0 p=0
glue w=3336 stretch=-10008 shrink=0

What does it mean that the latter glue can be stretched by a negative 
amount?

Why not:
glue w=3336 stretch=0 shrink=10008

Is there a difference as to how the algorithm treats these?


Negative stretch is not the same as positive shrink (and vice-versa): a 
negative stretch is used to cancel (or diminish) a positive one provided 
by some other element; for each possible break point, however, the 
overall stretch / shrink should always be >= 0.


The meaning of the mini-sequence above is:
- if there is a break at the penalty element, there is some stretch for
  the line ending there
- otherwise, the overall stretch is zero

This is used with unjustified text to give each line the same amount of 
stretch, so that the algorithm builds lines of similar length 
(whereas in justified text a line with many spaces and few letters could be 
stretched a lot).
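To make the two cases concrete, here is a minimal Knuth-style element model (not FOP's KnuthElement classes) holding Andreas's three elements between two boxes. It assumes the usual conventions: a line ending at a break contains the elements before it, and glue and penalties right after a break are discarded at the start of the next line.

```java
/**
 * Sketch of the glue / penalty / glue idiom with a minimal Knuth-style
 * element model. Hypothetical classes, not FOP's.
 */
class NegativeStretchSketch {
    enum Kind { BOX, GLUE, PENALTY }

    static final class Element {
        final Kind kind;
        final int width;
        final int stretch;
        Element(Kind kind, int width, int stretch) {
            this.kind = kind; this.width = width; this.stretch = stretch;
        }
    }

    static Element box(int w) { return new Element(Kind.BOX, w, 0); }
    static Element glue(int w, int stretch) { return new Element(Kind.GLUE, w, stretch); }
    static Element penalty() { return new Element(Kind.PENALTY, 0, 0); }

    /** Total stretch of the line made of elements [start, end); only glues contribute. */
    static int lineStretch(Element[] seq, int start, int end) {
        int total = 0;
        for (int i = start; i < end; i++) {
            if (seq[i].kind == Kind.GLUE) total += seq[i].stretch;
        }
        return total;
    }

    /** First element of the next line after a break at breakIndex: discardable items are skipped. */
    static int nextLineStart(Element[] seq, int breakIndex) {
        int i = breakIndex + 1;
        while (i < seq.length && seq[i].kind != Kind.BOX) i++;
        return i;
    }
}
```

Breaking at the penalty gives the first line the 10008 units of stretch, and the negative-stretch glue is dropped with the break; without a break, the two glues add up to the plain 3336-wide space with zero net stretch.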


HTH

Luca




Re: svn commit: r557347 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/layoutmgr/inline/ test/layoutengine/ test/layoutengine/stan

2007-07-25 Thread Luca Furini
(I see that Jeremias agrees with Andreas about how to interpret the nested 
keeps, so I reply just once)


Andreas L. Delmelle wrote:


In very rough terms, the logic behind it would be:
if a given break #1 has a plain adjustment ratio of 3 and a governing 
keep of auto,
and the next break #2, regardless of its adjustment ratio, would 
violate a keep-constraint with a higher value,
=> apply a 'correction factor' to the demerits of break #1 so that 
the influence of the plain adjustment ratio is significantly reduced, 
and it will be considered a good break in spite of its being way 
too short.


If the effective demerit computation of break #2 takes into account 
the fact that that break itself would violate a keep (other than 
auto), its demerits would in turn be artificially increased, so if 
eventually presented with a choice, the algorithm would prefer the 
first break over the second.


No idea if this would be feasible in practice though...

double effectiveRatio = (adjustmentRatio * keepRatio);

where keepRatio is a double value based on the keep-values, that 
would either leave the adjustmentRatio as is (= 1), or increase/ 
reduce the influence of the plain value.


Seems a good idea!

Maybe we could *add* something to adjustmentRatio, so we could be quite 
sure that a break violating a keep suddenly becomes quite ugly even if its 
original ratio would be < 1.
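A small illustration of why adding may work better than multiplying, with hypothetical method names: a break whose plain ratio is 0 (a perfect fit) keeps a ratio of 0 however large the multiplier, so a keep-violating perfect fit would not be penalized at all, while an additive term of, say, 2 (an arbitrary value) pushes any violating break with a non-negative ratio above 1.

```java
/**
 * Sketch comparing two ways of making a keep-violating break look worse.
 * Method names and constants are hypothetical.
 */
class KeepRatioSketch {
    /** Multiplicative correction, as in effectiveRatio = adjustmentRatio * keepRatio. */
    static double scaled(double adjustmentRatio, double keepRatio) {
        return adjustmentRatio * keepRatio;
    }

    /** Additive correction: the extra cost does not depend on how good the ratio was. */
    static double shifted(double adjustmentRatio, double keepPenalty) {
        return adjustmentRatio + keepPenalty;
    }
}
```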


I am trying to imagine whether we have to handle breaks violating a keep 
and having a ratio < 0 differently ... maybe these are the ones that 
really need to be deprecated.


Satisfying a keep condition without overflowing involves the creation of 
*short* lines, as what we want to avoid is violating a keep just to fill a 
line better. So:

- probably there is no need to penalize a break violating a keep if it has
  a high ratio: successive breaks will surely be better, whether they
  still violate a keep or not

- a long break, on the other hand, should be used only if there isn't
  any other alternative (for example, because the content of the keep is a
  single word longer than a line), so such breaks should not enter the record
  structure, but only be saved as a lastTooLong solution.

So, what we need could be just to correct the violating breaks having a 
negative ratio, so that their ratio becomes < -1 and they do not become 
active nodes (unless there is a restart, in which case it's ok).
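A minimal sketch of that correction, assuming the Knuth-style rule that a break with an adjustment ratio below -1 is infeasible (only this lower bound is modeled; the upper tolerance threshold is left out). Shifting the ratio by 1 is an arbitrary, hypothetical choice: any negative ratio minus 1 is below -1, while short lines (ratio >= 0) are left alone.

```java
/**
 * Sketch: keep-violating breaks with a negative adjustment ratio are pushed
 * below -1 so the feasibility test rejects them. Hypothetical names.
 */
class ViolatingBreakSketch {
    /** Knuth-style lower bound: a line cannot shrink by more than its total shrink. */
    static boolean isFeasible(double ratio) {
        return ratio >= -1;
    }

    static double correctedRatio(double ratio, boolean violatesKeep) {
        if (violatesKeep && ratio < 0) {
            // any negative ratio minus 1 is < -1, so the break can no longer be an active node
            return ratio - 1;
        }
        return ratio; // non-violating breaks and short lines (ratio >= 0) are unchanged
    }
}
```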


... but I'm basically writing down things as I think, so I could be 
missing some important point!


One last quick note concerning nesting: what if the inner inline 
had a *lower* force? feasible breaks in it should be first given a 
high penalty (to see if we can put everything together) and then a 
lower one (so the penalties should hold a nested chain of force 
values)?


Indeed, easy enough if the parent inline has a keep of always
[...]
One might argue that by violating the inner keep, we would also 
violate the outer keep, and as such it makes no difference where the 
violation occurs... On the other hand, I wonder whether the use-case 
for such combinations would not precisely be to indicate to the 
formatter: If violating a keep is unavoidable, then this is where it 
should preferably happen...


Ok, you are for the "stronger keep wins" interpretation, and now I'm 
convinced too.


At first, my doubt was that in this way we have an explicitly specified 
value that gets overridden by another one specified on an *ancestor* 
node. This seemed to me a strange exception to the general principle that 
an explicit value overrides an inherited one, until I realized this is 
quite similar to the space resolution rules, where the space set for an 
object could win against one set in one of its descendants, because of its 
higher precedence.


So, now I think that an inner keep with lesser force should not have any 
effect.


   Luca





Re: svn commit: r557347 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/layoutmgr/inline/ test/layoutengine/ test/layoutengine/stan

2007-07-20 Thread Luca Furini

Andreas L. Delmelle wrote:

That's one detail I was still unsure about. Only if the other factors 
remain identical, the algorithm would prefer a break at penalty 50 
over one at penalty 100... but if the value of the penalty is only of 
marginal influence as you suggest, then this would indeed not be enough.


I made some quick computations to see how the demerits change according to 
the penalty value and the adjustment ratio (see the table in the attached 
pdf).


It seems that the penalty value is highly relevant as long as the 
adjustment ratio varies between -1 and 1 (i.e. we are choosing among 
breaks that are quite good), but rapidly becomes less and less important 
as the adjustment ratio grows.


For example, in order to let the algorithm prefer a break with ratio 2 
(and no penalty value) to a different one with ratio 1.5, the penalty 
value for the second break should be at least 500; and no penalty value 
can make the algorithm prefer a break with ratio = 3 to another one with 
ratio = 2.5 (as we use 1000 as infinite penalty).
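To see how such numbers can come out, here is a sketch using the classic Knuth-Plass shape of demerits for a break with adjustment ratio r and a non-negative penalty p, (1 + 100|r|^3 + p)^2. That formula is an assumption here; FOP's own computation may differ in details such as fitness-class or flagged-penalty terms. With it, the penalty a ratio-1.5 break needs before a ratio-2 break wins is 801 - 338.5 = 462.5, close to the "at least 500" read off the table, and for ratios 3 vs 2.5 it is 2701 - 1563.5 = 1137.5, beyond the 1000 used as infinite penalty.

```java
/**
 * Sketch of Knuth-Plass style demerits for a non-negative penalty:
 * demerits = (1 + 100 |r|^3 + p)^2. The formula is an assumption.
 */
class DemeritsSketch {
    static double badnessTerm(double r) {
        double a = Math.abs(r);
        return 1 + 100 * a * a * a;
    }

    static double demerits(double r, double penalty) {
        double f = badnessTerm(r) + penalty;
        return f * f;
    }

    /**
     * Penalty on break B (ratio rB) above which an unpenalized break A (ratio rA)
     * gets fewer demerits and is therefore preferred.
     */
    static double minPenaltyToPrefer(double rA, double rB) {
        return badnessTerm(rA) - badnessTerm(rB);
    }
}
```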



In the example I posted earlier:

<fo:block>
   Some text with auto keep-constraint
   <fo:inline keep-together.within-line="100">
   Some text with a keep.within-line constraint of 100
     <fo:inline keep-together.within-line="500">
       keep.within-line=500
     </fo:inline>
   Some more text in the first nested inline
   </fo:inline>
   More text after the first nested inline.
</fo:block>

The acceptable set of breaks may turn out to give a result like 
(with '|' = the end-boundary of the line)


Some text with auto keep-constraint |(1)
Some text with a keep.within-line constraint of|(2)
100 keep.within-line=500   |(3)
Some more text in the first nested inline  |(4)
More text after the first nested inline|(5)

Only the third and the fourth line I'm still unsure about. May the 
content in the fourth line be broken itself?


I'm quite unsure about the third line too, as the outer keep constraint 
also affects the space between the inner inline and "Some more text ...", 
so break #3 is violating the keep and even creating a very short line.


I would probably expect something like

  Some text with auto keep-constraint |(1)
  Some text with a keep.within-line constraint of|(2)
  100 keep.within-line=500 Some more text in |(3)
  the first nested inline More text after the|(4)
  first nested inline|(5)

where the inner keep with higher force is fully satisfied, and the outer 
one is violated twice (breaks #2 and #3).


But maybe the algorithm could still prefer

  Some text with auto keep-constraint Some|(1)
  text with a keep.within-line constraint of |(2)
  100 keep.within-line=500 Some more text in |(3)
  the first nested inline More text after the|(4)
  first nested inline|(5)

where the outer keep is violated thrice (breaks #1, #2 and #3).

So, maybe my sketched strategy would respect the keep priority (the inner 
keep will never be violated in favor of a lower force one) but would not find 
the minimal set of violations. Or, at least, not if "minimal" means just fewer 
violations; if it could be interpreted as fewer violations producing a 
good-looking result, it could be ok.


One last quick note concerning nesting: what if the inner inline had a 
*lower* force? feasible breaks in it should be first given a high penalty 
(to see if we can put everything together) and then a lower one (so the 
penalties should hold a nested chain of force values)?


This fuzzy logic is complicated! :-)

Luca

demerits.pdf
Description: Adobe PDF document


Re: svn commit: r557347 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/layoutmgr/inline/ test/layoutengine/ test/layoutengine/stan

2007-07-19 Thread Luca Furini
Firstly, hi all! It has been quite a long time since I last posted or 
committed anything, but I'm still here! :-)


Then, congratulations on all the great progress FOP is making!

And finally, concerning the keeps ...

Andreas L. Delmelle wrote:


[inserting penalties with higher value to represent numeric keeps]

This should steer the line-breaking algorithm in the right direction 
to satisfy all keep constraints, IIC. The only big difference 
compared to an auto keep-constraint, if I judge correctly, would then 
be that we would somehow have to use penalties to represent all legal 
break-opportunities. Instead of glues being considered as feasible 
breakpoints, they would always be preceded by a zero-width penalty 
having a value corresponding to the keep-constraint governing the 
base FO.


I'm not sure the steering capability of penalty values would be enough 
to get the prescribed result [section 4.8 Keeps and Breaks (particularly 
the last paragraph)]: the algorithm could still prefer violating a keep 
with force = N to satisfy some keeps with force < N, as, IIRC, the 
demerits ultimately depend much more on the necessary stretch / shrink 
than on the penalty value.


I think that the breaking algorithm could be run once for each 
distinct force value. Something like this:


lastConfirmedBreaks = ... // the set of breaking points considering only "always" keeps
ArrayList forceLevelList = findForceLevels(sequence); // in reverse order (highest force first)
int forceLevelIndex = 0;
boolean tryAgain = true;

while (tryAgain && forceLevelIndex <= forceLevelList.size() - 1) {
    revisedSequence = setPenaltyValue(sequence, lastConfirmedBreaks,
            forceLevelList.get(forceLevelIndex), HIGH_PENALTY_VALUE);
    ... // compute the set of breaking points for revisedSequence
    if (... they are still acceptable) {
        lastConfirmedBreaks = ... // these ones
        forceLevelIndex++;
    } else {
        tryAgain = false;
    }
}

- in the sequence, keeps having force = always would be represented by 
+INF penalties;


- keeps with numeric force start with a 0 penalty value;

- the method setPenaltyValue() gives those with the given force a high 
value, and sets those with greater force (which have not been chosen as 
breaking points; this is why we must pass the computed breaks too) to +INF, 
so we are sure they will not be violated


If this approach is correct, the key point would be how to decide whether 
or not the computed set of breaks is still acceptable ...


Hope this helps ...

Luca




Re: Footnotes in the float branch

2007-03-27 Thread Luca Furini

Vincent Hennebert wrote:


Hi Luca,


Hi!


I had a look at your patch and have several comments:
- I see you re-enabled the noBreakBetween method; I don't think it's
  a good solution because it artificially prevents some nodes from being
  created, which even if bad may be necessary for some complex
  documents. See for example the attached fo file.


Right, it is quite an unlucky document!

Anyway, I still think that a footnote should be placed in a page following 
its citation only if there isn't really any other option: for example, the 
citation is inside a large block of unbreakable text, and the footnote 
itself is a large unbreakable block, and their cumulative height is taller 
than the page height (a situation that will surely happen sooner or later 
;-) , but is much more unlikely than your example).


I think your example would not look so bad in the context of a page with 
some book-like width and height: yes, there would be quite a large space 
between the last content line and the first footnote line, but I think 
many users would prefer such an output to one having the footnote placed 
in the following page.



I also documented
  a similar problem on the wiki [1]. While it makes the testcases work
  it actually creates some bad layout in other cases.


The one in the 4.1 / footnote section? It's a very interesting one, 
although I think it's quite another story.


While in the previous example we have two valid options, and the algorithm 
chooses the ugliest one, here we have the algorithm a priori discarding 
the option that would be the best one. I think we could call this a bug, 
as there can be no doubt concerning what a user would expect.


I'm attaching [check; double check; look again; yes, it's there!] a 
testcase showing this kind of layout problem. Trunk leaves an empty space 
between the last content line and the first footnote one, the float branch 
places two more content lines, filling the empty space, and the patched 
branch behaves the same way.



[snip on the other good remarks]

My feeling is that the Knuth algorithm can nicely handle such problems
already as is. It's just a matter of defining the right demerits for
deferred footnotes, and to give a chance to too-short nodes with
non-deferred footnotes to be considered WRT normal nodes with deferred
ones.


Demerits might not be enough: if there isn't any object with some stretch 
or shrink and the footnotes / floats do not fit exactly in the page but 
the content lines do, too-short nodes will only be considered when there 
is a restart and there isn't any deactivated node. Maybe we should be less 
restrictive on the ratio-based selection criterion.



I seem to remember that there was also a problem with flushing
floats on the last page (footnotes were unnecessarily deferred). I'd
have to dig deeper into that. I'll try to illustrate my ideas in a patch
in the next days.


Ok, I'm looking forward to seeing it!

Regards
Luca

footnote_positioning_6.xml
Description: application/xml


footnote_positioning_6.patched.pdf
Description: Adobe PDF document


Footnotes in the float branch

2007-03-26 Thread Luca Furini

Hi all

I recently had the time (and the pleasure) to look at before-float 
implementation branch, and I played a bit with it.


I focused on the handling of footnotes, as I noticed that sometimes they 
were placed on a page following their citations without a real necessity 
to do it; as I wrote some time ago (and I remember there was some 
consensus on this) this behaviour is acceptable for before floats, but is 
probably not what a user would expect for footnotes.


I have tried to fix this in the PageBreakingAlgorithm, computing a 
minimum required index for footnotes, so that no page break will be 
considered that unnecessarily defers some old footnotes to the next page.


I'm attaching a diff file showing the changes (or should I just 
apply it?); after applying the patch, there are 4 more passing testcases 
(footnote_footnote-separator, footnote_large, footnote_positioning_{4,5}) 
and no regressions. Testcases footnote_positioning_{2,3} still throw 
some run-time exceptions, and in the next days I'm going to see what's 
wrong with them.


Just a few comments about the new classes: I must admit that it took 
me a while to see and understand the interaction between the 
PageBreakingAlgorithm and the Footnotes / BeforeFloats Record, together 
with their inner Footnotes / BeforeFloats Progress.


In particular, at the beginning I thought the *Progress classes were just 
convenience classes to get pieces of footnotes and floats without 
directly fiddling with element lists, and I found only later that their 
methods can actually create new active nodes.


Another thing that I find a bit strange is that the PageBreakingAlgorithm 
does not directly interact with the before floats, as the calls to 
BeforeFloatsProgress.consider() are hidden in the FootnotesProgress 
class.


So, I was wondering whether it would be clearer to have the 
PageBreakingAlgorithm control all the node creation logic, after having 
accessed information about footnotes and floats that could be placed in 
the page via the helper classes.


WDYT?

Regards
Luca


Re: Footnotes in the float branch

2007-03-26 Thread Luca Furini

On Mon, 26 Mar 2007, Luca Furini wrote:


I'm attaching a diff file showing the changes 


Well, *now* I'm attaching bla bla :-)

Regards
Luca

Index: src/java/org/apache/fop/layoutmgr/breaking/FootnotesRecord.java
===
--- src/java/org/apache/fop/layoutmgr/breaking/FootnotesRecord.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/breaking/FootnotesRecord.java (working copy)
@@ -91,6 +91,21 @@
             addSeparator();
         }
     }
+
+    /**
+     * 
+     */
+    public void handleDeferredFootnotes(int requestedLastIndex) {
+        boolean separatorAlreadyAdded = (alreadyInserted.getLength() > 0);
+        // check if we must add more footnotes
+        while (lastInsertedIndex < requestedLastIndex) {
+            next();
+        }
+        // if needed, add the separator
+        if (!separatorAlreadyAdded && alreadyInserted.getLength() > 0) {
+            addSeparator();
+        }
+    }
 
     /**
      * If the current page is a float-only page, handles the splitting of the last
Index: src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java
===
--- src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (working copy)
@@ -545,6 +545,17 @@
                 log.debug("Could not find a set of breaking points " + threshold);
                 return 0;
             }
+            // lastDeactivated was a good break, while lastTooShort and lastTooLong
+            // were bad breaks since the beginning;
+            // if it is not the node we just restarted from, lastDeactivated can
+            // replace either lastTooShort or lastTooLong
+            if (lastDeactivated != null && lastDeactivated != lastForced) {
+                if (lastDeactivated.adjustRatio > 0) {
+                    lastTooShort = lastDeactivated;
+                } else {
+                    lastTooLong = lastDeactivated;
+                }
+            }
             if (lastTooShort == null || lastForced.position == lastTooShort.position) {
                 if (isPartOverflowRecoveryActivated()) {
                     if (this.lastRecovered == null) {
Index: src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java
===
--- src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (working copy)
@@ -1285,7 +1285,12 @@
                     (new KnuthGlue(lineStartBAP, 0, 0,
                                    new LeafPosition(this, -1), false));
             } else {
+                // the first penalty is necessary in order to avoid the glue to be a feasible break
+                // while we are ignoring hyphenated breaks
                 hyphenElements.add
+                    (new KnuthPenalty(0, KnuthElement.INFINITE, false,
+                                      new LeafPosition(this, -1), false));
+                hyphenElements.add
                     (new KnuthGlue(0, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                                    new LeafPosition(this, -1), false));
                 hyphenElements.add
Index: src/java/org/apache/fop/layoutmgr/PageBreakingAlgorithm.java
===
--- src/java/org/apache/fop/layoutmgr/PageBreakingAlgorithm.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/PageBreakingAlgorithm.java (working copy)
@@ -77,7 +77,7 @@
     /**
      * Are footnotes-only pages allowed?
      */
-    public static final boolean FOOTNOTES_ONLY_PAGES_ALLOWED = true;
+    public static final boolean FOOTNOTES_ONLY_PAGES_ALLOWED = false;
 
     /**
      * Additional demerits for an underfull page, which however has an acceptable fill ratio.
@@ -115,6 +115,14 @@
     private BeforeFloatsRecord beforeFloatsRecord;
     private FootnotesRecord.FootnotesProgress footnotesProgress;
     private BeforeFloatsRecord.BeforeFloatsProgress beforeFloatsProgress;
+    // number of new footnotes met since the last feasible break
+    private int newFootnotesCount = 0;
+    // the method noBreakBetween(int, int) uses these variables
+    // to store parameters and result of the last call, in order
+    // to reuse them and take less time
+    private int storedPrevBreakIndex = -1;
+    private int storedBreakIndex = -1;
+    private boolean storedValue = false;
 
     private ActiveNodeRecorder activeNodeRecorder = new ActiveNodeRecorder();
 
@@ -682,6 +690,7 @@
             if (box instanceof KnuthBlockBox

Re: Before floats + footnotes

2007-01-24 Thread Luca Furini

Vincent Hennebert wrote:


I've had a quick look, that's not handled currently. At some place in
the code the space-before set on the separator is converted into a
MinOptMax(opt, opt, opt).


If I remember correctly, the separator bpd is taken from the generated 
area (so there isn't any stretch or shrink) instead of the element 
sequence. I think I (?) did so after a few unsuccessful tries to get the 
dimension in a better way.



Anyway, even if defining some elastic height for the separator would
certainly help improve the situation, that's not something we can expect
from users.


I agree, the algorithm should be able to handle the most common situation 
without any special hint.



I think that, after all, this could be fixed just by checking some
additional condition before calling handleNode() the first time (when
footnotes and before floats are not taken into account).


Not sure it's that simple. Is a page containing only two lines of normal
text with no deferred footnote more desirable than a full page with a
deferred footnote? I think that might disturb the reader as well. That's
why I got the idea of a minimum fill ratio. But obviously, as this is
currently implemented that's not enough. I'll think more about it.


You are right, there should probably be an upper limit to the amount of 
footnotes, and probably a user-configurable limit would be the ideal 
solution, so that the users could set it in the document using an 
extension property.


While the old code forbade pages with too few footnotes, the minimum fill 
ratio avoids pages with too few content lines: we should find a way to 
combine these techniques, without eliminating each possible solution! :-)


Regards
Luca



Re: Before floats + footnotes

2007-01-22 Thread Luca Furini

Vincent Hennebert wrote:


I don't think there is much you can do in that case. It appears that the
15 lines of text at 12 pt exactly fill the 3 inch-high page. So that
makes a feasible node which is always preferred to too-short nodes.
Change the page-height to 3.1 inch and you no longer have the footnotes
deferred to the next page.
That's exactly why I introduced the MIN_NORMAL_PAGE_FILL_RATIO constant
in PageBreakingAlgorithm: to give a chance to underfull pages with no
deferred floats to be preferred over full pages with deferred ones. Keep
the page-height of 3 inches and change that constant to 0.9 and you have
your footnotes back on the first page.


I agree that, in a sense, the error is in the fo file, defining a 
fixed-height footnote separator that does not fit the page well; with a 
height of 12 pt, all would be ok. Alternatively, it could be defined 
using a min-opt-max line height or space-*, allowing for some stretch and 
shrink (not sure this would be handled correctly at the moment, but this 
is not the point).


But I don't agree with you when you say that "that makes a feasible node 
which is always preferred to too-short nodes". I'm not at all convinced 
that it is a feasible break. Even if the FO recommendation says that the 
footnote body could be placed in a page following the one with the anchor, 
I think it should be read restrictively, deferring a 
footnote only if there isn't any possible alternative.


In this case, it is possible to place the footnote in the same page that 
contains its citation, so I think that the algorithm should not be allowed 
to prefer a break that defers it. Note [:-)] that the footnote could 
appear after *many* pages, if there are lots of 12pt-high lines of normal 
text.


In this respect, from a user perspective, footnotes and before floats are 
quite different: while it's completely acceptable for a figure or a table 
to be placed in a page following the text referring to it, I'm sure most 
users would be quite disappointed to find out that a footnote has been 
unnecessarily deferred.


So, while I think the idea of the page fill ratio is very good for the 
placement of before floats, I think footnotes should have a different 
handling, a preferential treatment limiting deferments to the extremely 
unlikely case of an unbreakable group of lines with a lot of footnotes, a 
few of which do not fit in the page (or some other extreme situations).



The actual problem IMO is to define the right demerits for underfull
pages and deferred before-floats and footnotes in order to have a decent
result (i.e., that a human would expect) in every case.


I don't think it would be enough: the expected break (the one with both 
footnotes on page 1) is a short solution and is not recorded, it just 
updates lastTooShort. As long as there is not a restart (and having just 
12pt-high lines it will never happen), it doesn't have a chance to be 
used.


I think that, after all, this could be fixed just by checking some 
additional condition before calling handleNode() the first time (when 
footnotes and before floats are not taken into account).


Regards
Luca


Before floats + footnotes

2007-01-19 Thread Luca Furini

Hi all!

At long last, I'm finally allowed some time to look at the float branch 
and ... wow! Really impressive, a great lot of good work!


In order to apologize for my long absence :-) , I'm trying to see what's 
wrong with the failing testcases, in particular the ones with footnotes.


Looking at the behaviour of the page breaking algorithm during the 
processing of testcase footnote_footnote-separator, I found out that:


- the right page break (12 lines of content, some space, the separator 
and 2 footnote lines) does not create a new active node, it just updates 
lastTooShort; this is right, as there are no stretchable elements and the 
resulting adjustment ratio would be +inf;


- but then, instead of having a restart, new active nodes are found that 
fill the page but push the footnotes into the following page.
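The two kinds of candidate break in this analysis can be told apart by their adjustment ratio. In the Knuth-Plass model, the ratio is the difference between the available size and the natural size, divided by the total stretch (or by the total shrink for overfull content). The sketch below is a minimal illustration of that rule. `AdjustmentRatio`, `classify()`, the threshold parameter, and the use of `Double.POSITIVE_INFINITY` when nothing can stretch are assumptions, not FOP's code.

```java
// Hypothetical sketch (not FOP code): classifying a candidate break by its
// Knuth-Plass adjustment ratio; a shortfall with zero stretch is treated as
// an infinite ratio.
class AdjustmentRatio {
    enum Kind { FEASIBLE, TOO_SHORT, TOO_LONG }

    /** Ratio needed to make content of the given natural size fill the available space. */
    static double ratio(double available, double natural, double stretch, double shrink) {
        double difference = available - natural;
        if (difference > 0) {
            // underfull: must stretch; with nothing stretchable the ratio is infinite
            return stretch > 0 ? difference / stretch : Double.POSITIVE_INFINITY;
        } else if (difference < 0) {
            // overfull: must shrink; with nothing shrinkable it can never fit
            return shrink > 0 ? difference / shrink : Double.NEGATIVE_INFINITY;
        }
        return 0.0; // exactly full
    }

    static Kind classify(double ratio, double threshold) {
        if (ratio < -1) {
            return Kind.TOO_LONG;  // cannot be shrunk enough
        } else if (ratio > threshold) {
            return Kind.TOO_SHORT; // would only update lastTooShort
        }
        return Kind.FEASIBLE;      // may create a new active node
    }
}
```

A page whose content exactly fills it gets ratio 0 and is feasible. A page with some space left over and no stretchable elements gets an infinite ratio, so it is only ever a too-short candidate.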


I'm going to see how best to fix this behaviour ... obviously if nobody 
else is quicker than me! :-)


Regards
Luca


Fix for bugs 41019 + 41121

2006-12-22 Thread Luca Furini

Hi all

I have a patch fixing bugs 41019 and 41121, for both trunk and float 
branch, and I'm wondering how it's best for me to proceed in order to 
avoid merging problems: should I change both trunk and branch, or just one 
of them?


The patch is extremely simple and does not break any testcase: I only had 
to adjust the checks in a testcase because of the different line breaks. 
However, it adds about three lines to the TextLM, so maybe it's better if I 
wait for Simon to apply his Unicode breaking changes?


I'm attaching the patches, just to let you see if they interfere with 
someone else's work-in-progress.


(sorry for repeating what I wrote some time ago, but I have experienced 
some e-mail problems and I probably lost some messages)


Regards
Luca

Index: test/layoutengine/standard-testcases/block-container_content_size_percentage.xml
===
--- test/layoutengine/standard-testcases/block-container_content_size_percentage.xml (revision 486106)
+++ test/layoutengine/standard-testcases/block-container_content_size_percentage.xml (working copy)
@@ -61,9 +61,9 @@
     <!-- from the spec: If that dimension is not specified explicitly (i.e., it depends on
          content's blockprogression-dimension), the value is interpreted as auto. -->
     <!-- The 10% are ignored in this case. -->
-    <eval expected="28800" xpath="//flow/block[2]/@bpd"/> <!-- 2 lines -->
+    <eval expected="43200" xpath="//flow/block[2]/@bpd"/> <!-- 3 lines -->
     <eval expected="10" xpath="//flow/block[2]/@ipd"/>
-    <eval expected="28800" xpath="//flow/block[2]/block[1]/block[1]/@bpd"/>
+    <eval expected="43200" xpath="//flow/block[2]/block[1]/block[1]/@bpd"/>
     <eval expected="5" xpath="//flow/block[2]/block[1]/block[1]/@ipd"/>
 
     <!-- absolute -->
@@ -76,9 +76,11 @@
     <!-- from the spec: If that dimension is not specified explicitly (i.e., it depends on
          content's blockprogression-dimension), the value is interpreted as auto. -->
     <!-- The 10% are ignored in this case. -->
-    <eval expected="43200" xpath="//flow/block[4]/@bpd"/> <!-- 3 lines -->
+    <eval expected="57600" xpath="//flow/block[4]/@bpd"/> <!-- 4 lines -->
     <eval expected="10" xpath="//flow/block[4]/@ipd"/>
-    <eval expected="43200" xpath="//flow/block[4]/block[1]/block[1]/@bpd"/>
+    <eval expected="28800" xpath="//flow/block[4]/block[1]/block[1]/@bpd"/> <!-- the first 2 lines ... -->
     <eval expected="5" xpath="//flow/block[4]/block[1]/block[1]/@ipd"/>
+    <eval expected="28800" xpath="//flow/block[4]/block[1]/block[2]/@bpd"/> <!-- ... and the other 2 lines -->
+    <eval expected="5" xpath="//flow/block[4]/block[1]/block[2]/@ipd"/>
   </checks>
 </testcase>
Index: src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java
===
--- src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (revision 486104)
+++ src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (working copy)
@@ -1285,7 +1285,12 @@
                     (new KnuthGlue(lineStartBAP, 0, 0,
                                    new LeafPosition(this, -1), false));
             } else {
+                // the first penalty is necessary in order to avoid the glue to be a feasible break
+                // while we are ignoring hyphenated breaks
                 hyphenElements.add
+                    (new KnuthPenalty(0, KnuthElement.INFINITE, false,
+                                      new LeafPosition(this, -1), false));
+                hyphenElements.add
                     (new KnuthGlue(0, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                                    new LeafPosition(this, -1), false));
                 hyphenElements.add
Index: src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java
===
--- src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (revision 486104)
+++ src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (working copy)
@@ -545,6 +545,17 @@
                 log.debug("Could not find a set of breaking points " + threshold);
                 return 0;
             }
+            // lastDeactivated was a good break, while lastTooShort and lastTooLong
+            // were bad breaks since the beginning;
+            // if it is not the node we just restarted from, lastDeactivated can
+            // replace either lastTooShort or lastTooLong
+            if (lastDeactivated != null && lastDeactivated != lastForced) {
+                if (lastDeactivated.adjustRatio > 0) {
+                    lastTooShort = lastDeactivated;
+                } else {
+                    lastTooLong = lastDeactivated;
+                }
+            }
             if (lastTooShort == null || lastForced.position == lastTooShort.position) {
                 if 

LineBreakUtils compilation error?

2006-12-22 Thread Luca Furini

I've just updated my local copy of trunk and rebuilt.

At first, I could not complete the compilation, as 
I received an error saying that the file 
src/java/org/apache/fop/text/linebreak/LineBreakUtils.java did not contain 
the class org.apache.fop.text.linebreak.LineBreakUtils.


Indeed, the reported package inside the file was
   org.apache.commons.text.linebreak
after changing it to
   org.apache.fop.text.linebreak
ant completed successfully.

Is this a little oversight, or was I simply not following the right 
compilation procedure?


Regards
Luca


Re: UAX#14 implementation

2006-12-20 Thread Luca Furini

Manuel Mall wrote:

After making the appropriate adjustment to the checks in that testcase 
ALL testcases are now passing!


Wonderful!

I'm really looking forward to seeing this great new feature!

Just a couple of doubts concerning the differences with respect to the old 
implementation (I must confess I read the Unicode Annex quite quickly 
...):



Just discovered the first instance of an existing testcase which
gives a different result. Under UAX#14 the following text (Note
this is plain text not FO markup!):
text-align=center .conditionality=retain linefeed-treatment=preserve.
which appears in inline_border_padding_conditionality_2.xml has
only a single break opportunity which is before the word
"linefeed-treatment". The space between "center" and ".conditionality"
is not a break


Does this happen because that space is just before a "."?

Another doubt: why aren't the "-" signs in "text-align" and 
"linefeed-treatment" possible breaks?



Regards
Luca


Re: DO NOT REPLY [Bug 41019] - Left-align oddness with long, unbreakable strings following

2006-12-12 Thread Luca Furini

Vincent Hennebert wrote:


I'd have to think more about it, but:
- perhaps the compareNodes method should compare the line/page numbers
  for each node rather than the index in the Knuth sequence. Or some
  mixing of the two.


The index tells us which node allows more content to be laid out; the line 
number ... I can't see it as a very informative measure ...



- if you restart using the last deactivated node you are sure that
  immediately after that you'll have to restart using the last
  too-short/too-long node, because no feasible break will be found
  (otherwise the list of active nodes wouldn't have been emptied).


Yes, but I think we have a significant difference: in the first case we 
will have N good lines, a bad line and maybe some other good lines; in the 
second we have N-1 good lines, a quite-bad one (either too long or too 
short), then a bad one and finally some good ones.


I've prepared a very small patch fixing a couple of things:
- the TLM adds a zero-width infinite-value penalty to forbid breaks at the
  glue elements used for left/right aligned text (I'm going to check if a
  similar fix is needed elsewhere in the code)
- the BreakingAlgorithm uses (if possible) lastDeactivated instead of
  either lastTooShort or lastTooLong.

The patch is just a dozen lines long, and it was easy to apply it to 
the float branch.


How should I proceed? Apply it to both trunk and branch? Only to the 
branch?


I'm also going to mark bug 41121 as a duplicate of 41109, as the problem 
is exactly the same: the algorithm restarts from a very bad break instead 
of a good one (in that case, after the first word).


Regards
Luca


Re: DO NOT REPLY [Bug 41019] - Left-align oddness with long, unbreakable strings following

2006-12-05 Thread Luca Furini

Chuck Bearden wrote:


If in a left-aligned block some typical text words are followed by a string
longer than the line-length and containing no spaces (e.g. a long URL), then the
foregoing text will have premature line breaks, i.e. halfway to two-thirds the
way into the line.


I had a look at this, and what I found out is that the strange-looking 
lines are the combined effect of three different problems. So, sorry in 
advance for the long post, but breaking is never an easy matter! :-)



1) TextLM breaks the text even when a "/" or a "-" is found, handling them 
as hyphenation points with the usual sequence of glue + penalty + glue 
elements.


The LineLM tries, in the first instance, to avoid using hyphenation 
points, so the penalty is not taken into account. But this has the side 
effect of making the first glue element a feasible break (if the penalty 
were a feasible break too, it would surely be a better one, thus preventing 
the glue from actually being chosen).


This is probably the smallest of the problems, and can be solved just by 
adding an infinite penalty before the first glue element. But maybe we 
want to prevent this kind of break altogether, as we can now use 
zero-width spaces to explicitly insert breaking positions?
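The effect of the extra penalty follows from the Knuth legal-breakpoint rule. A glue is a legal break only when it immediately follows a box. A penalty is a legal break only when its value is below +INF. The sketch below is a minimal, hypothetical illustration: the element classes, `hyphenationPoint()`, and the INFINITE and HYPHEN_PENALTY values are assumptions, not FOP's classes. It does not model the LineLM's first pass ignoring the hyphenation penalty. It only shows which elements are legal breaks at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not FOP code) of Knuth's legal-breakpoint rule and of
// the zero-width +INF penalty placed before the first glue of a "/" or "-"
// hyphenation point. Element classes and constant values are assumptions.
class HyphenationElements {
    static final int INFINITE = 1000;     // assumed "+INF" penalty value
    static final int HYPHEN_PENALTY = 50; // assumed hyphenation penalty value

    static class Element { }
    static final class Box extends Element { }
    static final class Glue extends Element { }
    static final class Penalty extends Element {
        final int value;
        Penalty(int value) { this.value = value; }
    }

    /** A glue is a legal break only after a box; a penalty only if below +INF. */
    static boolean isLegalBreak(List<Element> elements, int index) {
        Element e = elements.get(index);
        if (e instanceof Glue) {
            return index > 0 && elements.get(index - 1) instanceof Box;
        }
        if (e instanceof Penalty) {
            return ((Penalty) e).value < INFINITE;
        }
        return false; // boxes are never breakpoints
    }

    /** Text, [+INF penalty], glue, hyphenation penalty, glue, text. */
    static List<Element> hyphenationPoint(boolean withBlockingPenalty) {
        List<Element> elements = new ArrayList<>();
        elements.add(new Box());                   // text up to the "/" or "-"
        if (withBlockingPenalty) {
            elements.add(new Penalty(INFINITE));   // the fix: the glue no longer follows a box
        }
        elements.add(new Glue());
        elements.add(new Penalty(HYPHEN_PENALTY)); // the intended break opportunity
        elements.add(new Glue());
        elements.add(new Box());                   // text after it
        return elements;
    }
}
```

Without the blocking penalty, the glue at index 1 is a legal (and unwanted) break. With it, neither the +INF penalty nor the following glue can be chosen, and only the hyphenation penalty remains as a break opportunity.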



2) The presence of an inline object larger than the available width makes 
the algorithm deactivate all the active nodes and then restart from a 
second-hand node, as no line can be built that does not overflow. The 
restarting node was chosen, in BreakingAlgorithm.findBreakingPoints(), 
between lastTooShort and lastTooLong, neither of them being a good 
breaking point. There is a lastDeactivated node, chosen among the 
deactivated nodes, but it was not used.


A deactivated node was previously an active one, so it is surely better 
than a node that failed to qualify; replacing either lastTooShort or 
lastTooLong (according to the adjustment ratio) with lastDeactivated leads 
to a better set of breaks. However, this is not enough. The attached file 
small.20.pdf shows the result after fixing these first two problems.
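The restart choice can be sketched as follows. The replacement step mirrors the condition on `adjustRatio` described above. The selection that follows is only a simplified assumption based on the `lastTooShort == null || lastForced.position == lastTooShort.position` test visible in the patch context. The real code may perform overflow recovery in that branch, and this sketch leaves it out, along with demerits. `RestartChoice` and `chooseRestartNode()` are hypothetical names.

```java
// Hypothetical, simplified sketch (not FOP code) of choosing the restart node
// once lastDeactivated may replace a bad candidate. Overflow recovery and
// demerits are deliberately left out.
class RestartChoice {
    static final class Node {
        final int position;       // index of the break element
        final double adjustRatio; // > 0: the line had to be stretched
        Node(int position, double adjustRatio) {
            this.position = position;
            this.adjustRatio = adjustRatio;
        }
    }

    Node lastTooShort;
    Node lastTooLong;
    Node lastDeactivated;
    Node lastForced; // the node we last restarted from

    Node chooseRestartNode() {
        // a node that was once active is better than one that never qualified
        if (lastDeactivated != null && lastDeactivated != lastForced) {
            if (lastDeactivated.adjustRatio > 0) {
                lastTooShort = lastDeactivated;
            } else {
                lastTooLong = lastDeactivated;
            }
        }
        // use the too-short node unless it is missing or is where we already restarted
        if (lastTooShort == null
                || (lastForced != null && lastForced.position == lastTooShort.position)) {
            lastForced = lastTooLong;
        } else {
            lastForced = lastTooShort;
        }
        return lastForced;
    }
}
```

Here a deactivated node that needed a little stretching (ratio 0.4) takes the place of a too-short node that would need a ratio of 30, so the restart happens from a reasonable line instead.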



3) At the moment, the LineLM can call findBreakingPoints() up to three 
times, the last one with a maximum adjustment ratio equal to 20. I came to 
the conclusion that this is really TOO much. I tried stopping after the 
second call (with max ratio = 5) and the result is much better (see 
attached file small.5.pdf).


A high maximum adjustment ratio means that the algorithm is allowed to 
stretch spaces a lot in order to find a set of breaks which is *globally* 
better; this means that it can choose some not-so-beautiful breaks in 
order to build a set spanning over a larger portion of the paragraph.


In our example: there can be a break just before the long URL (a line 
ending after "Consider:") only if we use an enormous adjustment ratio. 
With a smaller, more appropriate threshold, "Consider:" can no longer end a 
line, so the algorithm will restart from a previous point.
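The effect of capping the passes can be sketched with a stand-in for findBreakingPoints(). The message gives 5 and 20 for the second and third calls but not the first threshold, so the example uses only those two. `ThresholdPasses` and `BreakingPass` are hypothetical. The sketch also assumes that the last pass is "forced", meaning it always produces breaks by restarting from lastTooShort/lastTooLong.

```java
// Hypothetical sketch (not FOP code) of repeated line-breaking passes with
// growing maximum adjustment ratios. BreakingPass stands in for
// findBreakingPoints(); treating the last pass as "forced" is an assumption.
class ThresholdPasses {
    interface BreakingPass {
        /** Number of lines found, or 0 if no set of breaks was found. */
        int run(double threshold, boolean forced);
    }

    /** Returns the threshold of the pass whose breaks are used. */
    static double usedThreshold(double[] thresholds, BreakingPass pass) {
        for (int i = 0; i < thresholds.length; i++) {
            boolean forced = (i == thresholds.length - 1);
            if (pass.run(thresholds[i], forced) > 0) {
                return thresholds[i];
            }
        }
        return Double.NaN; // only reached if even the forced pass fails
    }
}
```

Suppose a made-up paragraph needs a ratio of about 18 to end a line after "Consider:". With passes {5, 20}, the non-forced pass at 5 finds nothing, so the stretched breaks from ratio 20 are used. Dropping the last pass makes the pass at 5 the final, forced one, so the algorithm restarts from an earlier point instead.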



In conclusion: the first two items are easily fixed, and I'm going to 
commit the changes in the afternoon (if there are no objections); 
concerning the question of the automatic break at "/" and "-" characters, 
I'll probably leave the code unchanged for the moment, until we decide 
what is best.


Concerning point #3, I'm going to have a closer look at the restarting 
mechanism ...


Regards
Luca


small.20.pdf
Description: Adobe PDF document


small.5.pdf
Description: Adobe PDF document


Re: XSL-FO 2.0 workshop in Heidelberg next week

2006-10-10 Thread Luca Furini

Jeremias Maerki wrote:


If anyone has any requirements for XSL-FO 2.0 which I should bring up at
the workshop in Heidelberg next week, please let me know. Deadline
2006-10-16 so I have time to prepare.

Luca, are you going, too? How do you travel?


Yes, I'm going.

I think I'll travel by train, but I haven't fixed all the details yet.

I was waiting for more precise news to appear on the workshop site, but 
there have been no recent updates ... I should really start deciding 
anyway!


I think I'll end up arriving the day before the workshop, and probably 
leaving the day after it, so we could find plenty of time to chat about 
fop.


Regards
Luca


Re: Necessary conditions to defer footnotes

2006-07-12 Thread Luca Furini

Vincent Hennebert wrote:


there is something I don't get with the handling of footnotes. When
there is not enough room on the current page to place all the footnotes,
the algorithm tries to find a place where to split them. But there is a
condition: it must be possible to defer old footnotes
(PageBreakingAlgorithm, l.332). And this is possible only if there is no
legal breakpoint between the previous active node and the currently
considered breakpoint (checkCanDeferOldFootnotes method). I don't
understand this latter condition?


This is to avoid keeping deferring part of the old footnotes when there is 
no real need to do it.


Let me explain with an example: let's pretend we have a long footnote, 
which cannot be wholly placed on the same page where its citation is; so, 
when we start building the following page we should try to place all the 
remaining old footnote lines, if this is possible.


However, it can happen that the breaking algorithm, without this check, 
prefers filling the page with normal lines, thus placing just a single 
footnote line and deferring the others to the next pages.


For example, the footnote has 10 lines, and 3 are placed on the first page 
while the others are deferred a first time as there is not enough space 
for them; without this condition, it could happen that if there are no new 
footnotes (which would force a flush of the old one) the algorithm 
places just a single footnote line in the following seven pages, filling 
the remaining space with normal lines, while we want the footnote to be 
deferred again only if there is no way to place lines 4 to 10 together.



And, reading the code, I don't understand if this method's purpose is to
determine if it is /allowable/ to defer footnotes (am I authorized to
defer footnotes if any), or if it is /possible/ (are there footnotes to
defer). Ok, this is a bit subtle, but understanding that would help me
get the intent of the algorithm.


The former: the method's purpose is to determine whether the algorithm is 
allowed to break the footnote once again, which can happen only if we have 
added only the slightest bit of normal lines () and the remaining space is 
not enough.


If there are no old footnotes the method returns false (which is maybe not 
very clear), but has no effect.
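The condition discussed in this exchange can be sketched as a standalone check. This is a simplified, hypothetical version of what checkCanDeferOldFootnotes() decides. The signature and the `legalBreak` array (true where element i is a legal breakpoint) are assumptions, not FOP's API.

```java
// Hypothetical sketch (not FOP code): a simplified version of the condition
// behind checkCanDeferOldFootnotes(), with a made-up signature.
class DeferCheck {
    static boolean canDeferOldFootnotes(boolean[] legalBreak, int prevBreak,
                                        int currentBreak, boolean hasOldFootnotes) {
        if (!hasOldFootnotes) {
            return false; // nothing to defer: the result has no effect
        }
        // old footnotes may be split again only if this page adds the smallest
        // possible amount of normal content, i.e. no legal break lies in between
        for (int i = prevBreak + 1; i < currentBreak; i++) {
            if (legalBreak[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With legal breaks at elements 0, 3 and 5, a page from break 0 to break 3 may defer the old footnote again. A page from 0 to 5 may not, because stopping at 3 would have left more room for the remaining footnote lines.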


HTH

Luca


Re: Error message: Should be first

2006-07-11 Thread Luca Furini

Jeremias Maerki wrote:


One of my clients reported to me that he gets a "Should be first" error
message on the log. This happens in (Page)BreakingAlgorithm.removeNode().
I get the impression that the code there is not finished rather than
that it is a real error condition. I'll try to extend removeNode() so it
really removes the disabled node.


That's quite strange ...

The reason why the to-be-removed node should be the first one is this:

active nodes are ordered by line (page) number and by index of the element 
where the feasible break can happen, so, for example, a node representing 
a break for page 13 at element #150 is (or at least it should be) before a 
break for page 13 at element #152;


a node is removed when it is too far from the current feasible break being 
evaluated (or, in other words, between the node and the current position 
there is too much content to be placed in a single line / page), so in 
normal situations nodes are removed in order: for example, if we are 
evaluating a break at element #180, and we are too far from the node 
representing the break for page 13 at element #152, we will have already 
removed the node representing a break at page 13 element #150 (as it will 
be farther from the current element);


this may no longer be true when there are footnotes: for example the break 
at element #152 could represent a page where we have placed one more 
normal line in page 13, but fewer footnote lines with regard to the 
break at element #150, so the node coming first allows to place more 
content than the following one, and we could need to remove the node at 
#152 *before* the one at element #150


However, this does not explain why this warning shows in what appears to 
be a very simple document.


I'm going to have a closer look ...

Regards
Luca


Re: Error message: Should be first

2006-07-11 Thread Luca Furini

I've had another look at this.

A few debug outputs show that the error arises when trying to remove 
the node KnuthNode at 734 4527603+682968-135942 line:10 prev:687 
dem:11527.971465493918 while the list of active nodes contains


[
 KnuthNode at 734 4527603+682968-135942 line:10 prev:683
  dem:11513.226030457132,
 KnuthNode at 734 4527603+682968-135942 line:10 prev:687
  dem:11527.971465493918,
]

This removal, however, happens at the end of the algorithm, when the best 
layout is chosen (as Vincent pointed out), and in this situation a 
node could rightly be removed even if it's not the first one.


We could maybe add a boolean parameter to removeNode(), stating whether it 
is allowed to remove the nodes out of order or not, and only the calls in 
filterActiveNodes() would have it true.
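Here is a minimal sketch of the proposed removeNode() flag. It assumes a hypothetical per-line deque of active nodes; FOP's BreakingAlgorithm stores them differently. The sketch also assumes the node is removed in any case, and the flag only decides whether an out-of-order removal is reported. filterActiveNodes() would be the only caller passing true.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Logger;

// Hypothetical list of the active nodes belonging to one line (page).
class ActiveLine<N> {
    private static final Logger LOG = Logger.getLogger(ActiveLine.class.getName());
    private final Deque<N> nodes = new ArrayDeque<>();

    void add(N node) { nodes.addLast(node); }

    int size() { return nodes.size(); }

    /**
     * Removes 'node'. During normal breaking the node is expected to be the
     * first one; at the end of the algorithm (filterActiveNodes()) any node
     * may legitimately go, so the caller passes allowOutOfOrder = true.
     */
    boolean removeNode(N node, boolean allowOutOfOrder) {
        boolean isFirst = node.equals(nodes.peekFirst());
        if (!isFirst && !allowOutOfOrder) {
            // Unexpected ordering during normal breaking: report it.
            LOG.warning("Should be first");
        }
        return nodes.remove(node); // removes the first occurrence, if present
    }
}
```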


HTH

Luca


Re: keep...=always and Knuth penalties

2006-06-20 Thread Luca Furini

Jeremias Maerki wrote:


 On 19.06.2006 15:45:36 Luca Furini wrote:
 It seems to me that the prescribed behaviour requires a keep constraint 
 with force = always to be satisfied *always* :-), even if this would 
 mean having some overflowing content. 


Obviously, we disagree here. I read it so that always can also be
relaxed if the keep cannot be satisfied. Did anyone check what other
implementations do?


A quick test shows that AntennaHouse's xslformatter satisfies all the 
keeps, even when this means having some content overflow the body region 
(the overflowing content is actually clipped), while RenderX's xep relaxes 
a keep constraint in order to avoid overflows.


So, it seems the match is still a draw! ;-)

Regards
Luca


Re: keep...=always and Knuth penalties

2006-06-19 Thread Luca Furini

Manuel Mall wrote:


What is still unclear to me is if it is worthwhile to implement this
two pass approach, i.e. use INFINITE penalties first and relax later, or if
it is good enough for 99.99% of cases just to start with INFINITE-1
penalties for mandatory keeps?


I think the second pass is necessary, in order to be sure that we are 
breaking a keep because there really isn't any other alternative. 
Otherwise, I'm sure that for each value < INFINITE we use, we could create 
a (contrived) example where the algorithm prefers breaking the keep 
instead of using a different, legal (but somewhat uglier) break, thus 
behaving in a non-conformant way.
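The two-pass idea could look roughly like this. `KeepPasses`, the INFINITE value of 1000 and the breaker function are all hypothetical. The only point is that the relaxed penalty (INFINITE - 1) is used only after a strict pass has found no solution.

```java
import java.util.Optional;
import java.util.function.IntFunction;

class KeepPasses {
    // Hypothetical stand-in for the "infinite" penalty value.
    static final int INFINITE = 1000;

    /** Penalty used for a keep with force = always in the given pass. */
    static int keepPenalty(boolean relaxed) {
        // First pass: the break is forbidden. Second pass: allowed as a last resort.
        return relaxed ? INFINITE - 1 : INFINITE;
    }

    /**
     * Runs the breaker with strict keeps, and only if that fails runs it again
     * with relaxed keeps. The breaker maps a keep penalty to an optional
     * result (empty = no feasible set of breaks).
     */
    static <R> Optional<R> breakWithFallback(IntFunction<Optional<R>> breaker) {
        Optional<R> strict = breaker.apply(keepPenalty(false));
        if (strict.isPresent()) {
            return strict;
        }
        return breaker.apply(keepPenalty(true));
    }
}
```

Because the relaxed value is never seen by the first pass, no finite penalty choice can make the algorithm prefer a broken keep over a legal (if uglier) alternative.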


Reading the spec again, I even start wondering whether it would really be 
right to allow a break between objects tied by a keep constraint:


Each keep condition must also be satisfied, except when this would cause 
a break condition or a stronger keep condition to fail to be satisfied. If 
not all of a set of keep conditions of equal strength can be satisfied, 
then some maximal satisfiable subset of conditions of that strength must 
be satisfied (together with all break conditions and maximal subsets of 
stronger keep conditions, if any).


It seems to me that the prescribed behaviour requires a keep constraint 
with force = always to be satisfied *always* :-), even if this would 
mean having some overflowing content. More than this, even a keep with 
force = N could be broken only in order to satisfy a keep with greater 
force, and not to avoid an overflow.


I seem to recall that in Knuth's paper the author talks about a symbol he 
introduced in TeX to represent a space that could be used as a line break 
in dire straits, having a penalty value = inf-1 (where inf was the special 
finite value representing infinity). Maybe we could similarly add some 
soft-keep extensions?


Regards
Luca


Re: [GSoC] Wiki page for progress informations

2006-05-31 Thread Luca Furini

Jeremias Maerki wrote:


did you already investigate how footnotes are implemented? Can you say
anything about how similar the problem of footnotes is to before-floats?
Just so you don't have to start from scratch while there may be
something to build upon. After all, the footnotes also contain some
logic to move certain parts to a different page than where anchor is
located.


A few quick comments about the footnote implementation:

1) the FootnoteLM returns only the sequence of elements representing the 
inline part (not the footnote-body part); it just adds to the last 
(inline) box a reference to the FootnoteBodyLM.


2) the LineLM, after computing the breaks, adds to each (block) box 
representing a line the references to the FootnoteBodyLM whose citations 
are in that line


3) during the rest of the element collection phase, these references 
are not used (except when combined element lists are created, as they 
must then be copied into the new elements)


4) the PageSequenceLM.PageBreaker.getNextKnuthElements() method, after 
receiving all the (block) elements, scans them looking for footnote 
information, gets the elements from the referenced FootnoteBodyLM and puts 
them in a different list (at the moment a list of lists, but this is 
sub-optimal), and also gets the elements of the footnote-separator (in a 
separate list)


5) these lists are looked at in PageBreakingAlgorithm.computeDifference(), 
where we try to add some footnote content to the normal page content 
using getFootnoteSplit(), and in computeDemerits(), where some extra 
demerits are added if we break a footnote or some footnotes are deferred.
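Step 4 can be illustrated with a heavily simplified element model. All types here (`Element`, `Box`, `Glue`, `FootnoteBody`, `FootnoteCollector`) are hypothetical. The real PageBreaker also collects the footnote-separator elements, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified element model.
interface Element {}
record Box(int width, List<FootnoteBody> footnotes) implements Element {}
record Glue(int width) implements Element {}
record FootnoteBody(List<Element> elements) {}

class FootnoteCollector {
    /**
     * Scans the block-level elements and, for every footnote reference
     * attached to a box (as done by the LineLM in step 2), collects the
     * footnote body's own element list: one list per footnote.
     */
    static List<List<Element>> collectFootnotes(List<Element> blockElements) {
        List<List<Element>> footnoteLists = new ArrayList<>();
        for (Element e : blockElements) {
            if (e instanceof Box box) {
                for (FootnoteBody body : box.footnotes()) {
                    footnoteLists.add(body.elements());
                }
            }
        }
        return footnoteLists;
    }
}
```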


This last point is currently implemented using many 
PageBreakingAlgorithm private variables, which is maybe not the best way 
to do it, as we must be very careful about their initialization and their 
use, especially when the algorithm restarts. I think that a state object 
could be used to store these values and be 
explicitly passed along the methods instead of relying on the class 
members, but concerning this I'd like to hear the opinions of the other 
committers ...
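Here is a possible shape for such a state object, purely as an illustration. The class and field names are hypothetical and do not correspond to PageBreakingAlgorithm's actual variables.

```java
// Hypothetical immutable state for the footnote bookkeeping.
final class FootnoteState {
    final int listIndex;       // index of the footnote currently being inserted
    final int elementIndex;    // position reached inside that footnote's elements
    final int insertedLength;  // total length of footnote content already placed
    final boolean deferred;    // true if some footnotes were pushed to later pages

    FootnoteState(int listIndex, int elementIndex, int insertedLength, boolean deferred) {
        this.listIndex = listIndex;
        this.elementIndex = elementIndex;
        this.insertedLength = insertedLength;
        this.deferred = deferred;
    }

    /** State at the (re)start of the algorithm: nothing inserted yet. */
    static FootnoteState initial() {
        return new FootnoteState(0, 0, 0, false);
    }

    /** Returns a new state after placing 'length' more footnote content. */
    FootnoteState afterInsertion(int length, int newListIndex, int newElementIndex) {
        return new FootnoteState(newListIndex, newElementIndex, insertedLength + length, deferred);
    }

    /** Returns a new state recording that some footnotes have been deferred. */
    FootnoteState withDeferred() {
        return new FootnoteState(listIndex, elementIndex, insertedLength, true);
    }
}
```

Because every method returns a new object, a restart only needs to begin again from `initial()`. No member variable can be left over from the previous run.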


Insertion of before-floats could be implemented in a similar way, giving 
precedence to the footnote insertion (as it is subject to stricter 
constraints).


An important difference between a footnote and a before-float is that the 
latter does not have an inline part, so (if we want to follow the same 
pattern) we need to either store the reference inside a previously-created 
box or add some new elements containing the reference (but we must be 
sure that these elements cannot be parted from the previous ones, see the 
constraints in section 6.10.2 in the spec).


A crucial point is the demerit function as, if I remember correctly, it 
greatly affects the computational complexity of the breaking algorithm 
(there should be an M. Plass paper concerning this).


HTH


Another thing that we may need to keep in mind: There was lots of desire
from the user community that FOP supports large documents (long-term
goal, not necessary yours). I wrote that a first-fit algorithm could
help free memory earlier. Obviously, for complex before-float situations
a total-fit approach is probably more interesting as it can come up with
more creative solutions. I'm just mentioning it so we keep the bigger
picture in mind and since there could be conflicting goals.


A first degree of first-fit algorithm could be achieved quite quickly by 
having a BreakingAlgorithm interface which is implemented by a TotalFitBA 
(the existing implementation) and a FirstFitBA which would have a much 
simpler considerLegalBreak() method that, instead of the complex set of 
nodes, just keeps track of a single node.
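Here is a rough sketch of the FirstFitBA idea on a toy problem: words of given widths with a fixed inter-word space. `BreakingStrategy` and `FirstFitBreaker` are hypothetical names. Only the current line is tracked, instead of a set of active nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the existing total-fit code would be one
// implementation, a first-fit breaker another.
interface BreakingStrategy {
    /** Returns the indices of the words that end each line. */
    List<Integer> findBreaks(int[] wordWidths, int spaceWidth, int lineWidth);
}

class FirstFitBreaker implements BreakingStrategy {
    @Override
    public List<Integer> findBreaks(int[] wordWidths, int spaceWidth, int lineWidth) {
        List<Integer> breaks = new ArrayList<>();
        int current = -1; // width of the current line; -1 means the line is empty
        for (int i = 0; i < wordWidths.length; i++) {
            int needed = current < 0 ? wordWidths[i] : current + spaceWidth + wordWidths[i];
            if (needed > lineWidth && current >= 0) {
                // The word does not fit: commit the break after the previous word.
                // Nothing else is remembered, so memory can be freed immediately.
                breaks.add(i - 1);
                current = wordWidths[i];
            } else {
                current = needed; // fits (or is an overlong first word on the line)
            }
        }
        if (wordWidths.length > 0) {
            breaks.add(wordWidths.length - 1); // end of paragraph
        }
        return breaks;
    }
}
```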


This would surely decrease the memory footprint, but is not (I think) what 
we really want, as this simplified algorithm would be performed on the 
whole sequence of elements.


In order to start processing the sequence as soon as we receive a few 
elements we need to do some deeper changes.


An idea (I just had it now, so I have not fully considered all its 
implications): 
at the moment, the block-level LMs collect elements from their children and 
return just a single sequence (if there are no break conditions); we could 
have a parameter requesting them to return after they receive each child 
sub-sequence, and have a canStartComputingBreak() method that returns true 
if the sequence contains enough elements and we are using a first-fit 
algorithm, or false otherwise ...


Sorry for the long post ... and for the long absence too, but it seems 
that just after thinking "great, now I've really got some time to spend on 
FOP" I receive tons of other things to do ... :-(


Regards
Luca


Re: some footnotes not being displayed

2006-05-23 Thread Luca Furini

Jeremias Maerki wrote:

No idea if anyone else has time to look into it. I don't think it's an 
easy fix, or at least easy to isolate, because footnote handling is not 
trivial. Having a good test case is instrumental in finding the problem 
quickly. Usually, this is step is 60% to fixing a bug.


I'm going to look at this bug: it appears to be (more or less) the same 
bug as #37579: while the sequences of elements representing the 
list-item-label and the list-item-body are combined, information about 
footnotes is lost.


While having a quick fix should not be difficult, I'd like to see if there 
is an elegant way to deal with this kind of problem without having to 
replicate the same code wherever an element combination is performed.


At the moment, I'm thinking of an element sequence iterator, moving from 
a feasible break to another and returning an object with all the needed 
information (width, stretch, shrink, footnotes, ...). I think such an 
object could come in handy for several classes ...
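Here is one way such an iterator could work, sketched with a hypothetical `KElement` record holding type, width, stretch, shrink, penalty and footnote count. It assumes the usual Knuth rules for feasible breaks: a glue after a box, or a penalty below infinity. It does not count a penalty's own width, since that applies only if the break is chosen.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical simplified Knuth element: 'b' box, 'g' glue, 'p' penalty.
record KElement(char type, int width, int stretch, int shrink, int penalty, int footnotes) {
    static final int INF = 1000;
    static KElement box(int w) { return new KElement('b', w, 0, 0, 0, 0); }
    static KElement boxWithFootnote(int w) { return new KElement('b', w, 0, 0, 0, 1); }
    static KElement glue(int w, int st, int sh) { return new KElement('g', w, st, sh, 0, 0); }
    static KElement penalty(int w, int p) { return new KElement('p', w, 0, 0, p, 0); }
}

// Summary of the content between two consecutive feasible breaks.
record Segment(int endIndex, int width, int stretch, int shrink, int footnotes) {}

class BreakToBreakIterator implements Iterator<Segment> {
    private final List<KElement> elements;
    private int pos = 0;

    BreakToBreakIterator(List<KElement> elements) { this.elements = elements; }

    private boolean isFeasibleBreak(int i) {
        KElement e = elements.get(i);
        if (e.type() == 'p') return e.penalty() < KElement.INF;
        if (e.type() == 'g') return i > 0 && elements.get(i - 1).type() == 'b';
        return false;
    }

    @Override public boolean hasNext() { return pos < elements.size(); }

    @Override public Segment next() {
        if (!hasNext()) throw new NoSuchElementException();
        int w = 0, st = 0, sh = 0, fn = 0;
        int i = pos;
        do {
            KElement e = elements.get(i);
            if (e.type() != 'p') { // a penalty's width only counts if it is the chosen break
                w += e.width(); st += e.stretch(); sh += e.shrink();
            }
            fn += e.footnotes();
            i++;
        } while (i < elements.size() && !isFeasibleBreak(i));
        pos = i;
        return new Segment(i, w, st, sh, fn);
    }
}
```

Any code that combines sequences (for instance the list-item label and body) could then work on `Segment` objects. It would no longer need to re-implement footnote propagation element by element.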


If anyone has other ideas, suggestions, objections, just let me know!

Regards
Luca


Re: some footnotes not being displayed

2006-05-23 Thread Luca Furini
I've started looking at the patch attached to bug #37579, for the moment 
concentrating on footnotes inside lists.


Concerning shortcoming 2) (from the bug comment):

2) Footnotes from list-item-body starts at the same position (from the 
starting edge) than the list-item-body itself and not at the starting 
edge of the region-body.


I'm not sure whether what happens is wrong: isn't this the correct result 
of the inheritance of indents?


Shortcoming 1)

1) Footnotes in list-item-label produce a Cannot find LM to handle 
given FO for LengthBase. AFAICS in the getBaseLength method of 
AbstractBaseLayoutManger.


is quite related to this: the message is due to a failed attempt to retrieve 
the value for end-indent (setting end-indent to a fixed value gets rid 
of the message).


The method AbstractBaseLayoutManager.getBaseLength() iterates over the LM 
tree, moving from a LM to its parent: in this case, the traversed LM are:

  BlockLM
  FootnoteBodyLM
  FlowLM
  PageSequenceLM
  null

It seems that the FootnoteBodyLM should have, in this case, a 
ListItemContentLM parent (or maybe some kind of reference to it, so as not 
to break the passing of elements to the PageSequenceLM).
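The parent walk can be sketched as follows. `LM` and `findLM()` are hypothetical simplifications of the layout manager tree and of the lookup done by getBaseLength(). With the chain listed above, no LM for the list-item-label is ever reached. A ListItemContentLM parent would be found.

```java
// Hypothetical minimal layout manager: its class name, the FO it was
// created for, and its parent in the LM tree.
class LM {
    final String name;
    final String fo;
    final LM parent;

    LM(String name, String fo, LM parent) {
        this.name = name;
        this.fo = fo;
        this.parent = parent;
    }
}

class BaseLengthLookup {
    /**
     * Walks from 'start' towards the root, looking for the LM created for
     * 'targetFo'. Returns null when the chain ends (parent == null) without a
     * match: the situation that produces the "Cannot find LM to handle given
     * FO for LengthBase" message.
     */
    static LM findLM(LM start, String targetFo) {
        for (LM lm = start; lm != null; lm = lm.parent) {
            if (targetFo.equals(lm.fo)) {
                return lm;
            }
        }
        return null;
    }
}
```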


One last note: in the attached example for lists, there is a footnote 
inside a static-content, commented out because "if this is uncommented a 
runtime error results" (quote from the comment).


A runtime error is never a good thing; anyway, the spec states that "It 
is an error if the fo:footnote occurs as a descendant of a flow that is 
not assigned to a region-body" (section 6.10.3 fo:footnote), so this should 
probably raise a validation exception ...



Tomorrow I will try and finish fixing this. As a quick fix, it should be 
enough to apply Gerhard Oettl's patch and explicitly set indents on the 
footnote bodies.


Regards
Luca


Re: Generalized Knuth-Plass Linebreaking Algorithm

2006-04-04 Thread Luca Furini

Simon Pepping wrote:


[...]
See http://www.leverkruid.nl/GKPLinebreaking/index.html.

Please, let me know what you think of it.


I'm going to read it carefully, it seems very interesting!

Regards
Luca


Re: letter-spacing

2006-03-01 Thread Luca Furini

Jeremias Maerki wrote:


Still trying to fix my problem with letter-spacing and fixed width
spaces. Do I understand that correctly that XSL-FO's view of
letter-spacing is different than, say, PDF's? PDF's character spacing 
(PDF 1.4, 5.2.1) is designed so it advances the cursor for each (!)
character by the Tc value.


Yes, I remember that when I was working on letter spacing it took me a 
while to understand what was wrong with the resulting pdf! :-)



letter-spacing=1pt:

|_t__e__x__t_  _t__e__x__t_  _t__e__x__t_|


At the moment, fop has

  |t__e__x__t  t__e__x__t  t__e__x__t|

in other words there are letter spaces only between letters, and not 
between a letter and a space.


The recommendation states that "The algorithm for resolving the adjusted 
values between word spacing and letter spacing is User Agent dependent." 
(7.17.2 in the candidate recommendation), so I don't think this behaviour 
is wrong: it just assumes that word spaces have a higher precedence than 
letter spaces.


Another little difference: each letter space depends on the preceding 
letter's size, instead of depending on both the preceding and following 
letters' sizes; but this has a visible effect only when a word is 
composed of letters having different sizes.



PDF's character spacing would work like this, I think (although the last
character space needs to be eliminated by the layout manager [1]):

|t__e__x__t__  __t__e__x__t__  t__e__x__t|(__) -- [1]


This is why the word spacing adjustment stored in the textAreas is not the 
computed one, but is specifically modified in order to counterbalance the 
2 letter spaces that the pdf will add.
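Here is a small worked version of this compensation. It assumes that Tc is applied after every glyph and Tw additionally after each space character, so an inter-word gap receives Tc + (Tc + Tw). `PdfSpacing` and its methods are hypothetical helpers, not FOP API.

```java
class PdfSpacing {
    /**
     * Value to emit with Tw so that, once PDF has added one letter space
     * after the last letter of the word and another after the space glyph,
     * the gap grows by exactly 'computedAdjustment'.
     */
    static double wordSpacingForTw(double computedAdjustment, double letterSpace) {
        return computedAdjustment - 2 * letterSpace;
    }

    /** Extra space PDF puts into one inter-word gap for the given Tw and Tc. */
    static double pdfGapExtra(double tw, double tc) {
        return tc          // after the last letter of the first word
             + (tc + tw);  // after the space character
    }
}
```

For example, a computed adjustment of 3 with a 1-unit letter space gives Tw = 1, and PDF then adds 1 + (1 + 1) = 3 to the gap.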



If I'm right here (not really sure, that's why I'm asking), it would
mean that we should probably stop using the Tc feature from PDF and
instead control the glyph positioning ourselves like we already do in
PostScript.

WDYT?


As long as we had just two character categories (letters / spaces), the two 
PDF operators (Tc for character spacing, Tw for word spacing) were enough.


Now, with fixed width spaces too, which should be unaffected by both 
word spacing (as they differ from normal spaces) and letter spacing 
(as they differ from normal letters), two operators are too few.


I don't think we need to set the horizontal positioning of each character 
or word, but just fix the placement of a character sequence following a 
fixed width space, removing the letter spaces wrongly added by the Tc 
operator by alternating character sequences and horizontal adjustments in 
the TJ array.
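Here is a sketch of the TJ approach. It assumes each run ends at a fixed width space and that exactly one letter space per run must be cancelled. In a TJ array a number is subtracted from the horizontal position, in thousandths of text space units, so a positive value moves the next run to the left. `TjBuilder` is hypothetical, and escaping of parentheses inside strings is omitted.

```java
import java.util.List;

class TjBuilder {
    /**
     * Builds a TJ operand from character runs. Between runs it inserts a
     * positive adjustment that moves the next run back by 'letterSpace'
     * points at the given font size, cancelling one Tc letter space.
     */
    static String build(List<String> runs, double letterSpace, double fontSize) {
        long adjust = Math.round(letterSpace * 1000.0 / fontSize);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < runs.size(); i++) {
            if (i > 0) {
                sb.append(' ').append(adjust).append(' ');
            }
            sb.append('(').append(runs.get(i)).append(')'); // no escaping in this sketch
        }
        return sb.append("] TJ").toString();
    }
}
```

With a 1pt letter space at 10pt, `build(List.of("abc", "def"), 1.0, 10.0)` yields `[(abc) 100 (def)] TJ`.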


HTH

Regards
Luca



Re: letter-spacing

2006-03-01 Thread Luca Furini

Jeremias Maerki wrote:

 The recommendation states that The algorithm for resolving the adjusted 
 values between word spacing and letter spacing is User Agent dependent. 
 (7.17.2 in the candidate recommendation), so I think this is not a wrong 
 behaviour: it just assumes that word spaces have a higher precedence than 
 letter spaces.


No, actually in both cases the precedence is force so all spaces
survive the resolution process.


So, just to check I understood:

- according to the pdf specifications between two words there is
  1 word space + 2 letter spaces

- according to the xsl recommendation there is
  1 word space + 1 letter space (or better, two half letter spaces)

- fop currently puts just a word space

Is this correct?

But I still don't understand what the words concerning adjusted values 
between word spacing and letter spacing are supposed to mean ...


However, while I was out for a few hours I was thinking about this and I 
came to the conclusion that it may make sense to keep an array of 
character offsets as an attribute of a WordArea in the area tree.


It would probably be the best way to deal with kerning too.

My only concern is about the resulting pdf size: if we specify an offset 
for each character, wouldn't it become (at least) twice as big as before?


Regards
Luca


Re: white-space-collapse not working in trunk?

2006-02-14 Thread Luca Furini

(moved from fop-users as we are going into the implementation details)

Manuel Mall wrote:

the shorthand property white-space=pre should be used or its expanded 
equivalents:

   linefeed-treatment=preserve
   white-space-collapse=false
   white-space-treatment=preserve
   wrap-option=no-wrap

If you do that the bug I was referring to would show because
white-space-treatment=preserve is not correctly implemented.


In order to preserve all spaces, we could use the elements that are now 
generated for an nbsp:

   box  w=0
   penalty inf
   glue (elastic or not, according to the alignment)

They are not suppressed and they do not allow a break, so I think they 
should fit this situation quite well too, when white-space-treatment = 
preserve and wrap-option=no-wrap.


If wrap-option=wrap, however, we must add some penalties in order to 
allow a break between spaces; we must be careful, as if there are 3 spaces 
between two words, there are 4 possible breaks (ignoring, at the moment, 
Unicode breaking rules), so we cannot just add a single penalty before or 
after the other elements.
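Here is a sketch of the element sequence for n preserved spaces, using a hypothetical textual encoding of the elements. Each space gets the nbsp-like triplet. With wrap-option=wrap, a zero penalty is added before every space and after the last one, giving the n + 1 break opportunities.

```java
import java.util.ArrayList;
import java.util.List;

class PreservedSpaces {
    static final int INF = 1000; // hypothetical "infinite" penalty

    /** Elements for 'spaceCount' preserved spaces (textual encoding for illustration only). */
    static List<String> elementsFor(int spaceCount, int spaceWidth, boolean wrap) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < spaceCount; i++) {
            if (wrap) out.add("penalty 0");    // break opportunity before this space
            out.add("box w=0");               // keeps the following glue from being discarded
            out.add("penalty " + INF);        // no break between the box and the glue
            out.add("glue w=" + spaceWidth);
        }
        if (wrap && spaceCount > 0) out.add("penalty 0"); // break after the last space
        return out;
    }
}
```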


Is this ok, or am I missing some important detail?

Regards
Luca


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-07 Thread Luca Furini


Manuel Mall wrote:

1. The suppress-at-line-break property can be applied to all characters. 
I would take the position at the moment that explicit specification of 
the suppress-at-line-break property is not supported and we worry about 
it at a later stage. I would certainly argue against just supporting it 
in the context of nbsp.


Ok, it's better to take a step at a time!

2. When we discussed UAX#14 line breaking on this list last year Joerg 
pointed out that he had a table driven implementation for it. At the 
the time I took a look, liked it, and updated it for compliance to the 
lastest UAX#14 spec and then shelved it for integration into FOP. That 
is when we move determining line break opportunities to the LineLM 
level (which we discussed extensively before) we get UAX#14 
linebreaking as part of it by integrating Joerg's implementation. As a 
consequence I recommend against putting any UAX#14 specific stuff at 
the lower levels (e.g. TextLM) now in the context of fixing the nbsp 
problem. It will disappear anyway and IMO is therefore not worth the 
effort.


Ok, so for the moment I'll avoid considering interaction between spaces, 
and just fix the character-by-character element creation, which is ready 
and should be enough to handle the most common situations.


This also solves another bug concerning an nbsp being removed at the start 
of a line.


I'll make the commit in a few minutes

Regards
Luca


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Luca Furini

Manuel Mall wrote:

IMO nbsp (and any other Unicode special spaces) are outside the scope of 
XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in 
XML. In XML only &#x20;, &#x9;, &#xA;, and &#xD; are considered whitespace. 
Therefore nbsp does not need to be considered when looking at 
white-space-treatment and white-space-collapse. Would that approach 
remove the complications you mentioned?


Thanks for the clarification, Manuel!

This solves the first supposed problem (interaction between nbsp and 
pretty-printing spaces), but the second one is still open: what happens if 
we have

  someContent<nbsp><space>otherContent ?

*IF* (and I'm not at all sure about this) there can be a break, then both 
spaces should be discarded: in order to implement the correct behaviour 
for this almost hypothetical situation, we would need to create elements 
for both spaces as a whole (and they could belong to different LMs), 
otherwise the algorithm would not be able to ignore the nbsp during the 
line breaking.


Anyway I think this is quite an unlikely combination of entities and 
properties :-) ; as I see you are already working on something else, for 
the moment I will prepare a patch for the most common situations.


Regards
Luca


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Luca Furini

Manuel Mall wrote:


This solves the first supposed problem (interaction between nbsp and
pretty-printing spaces), but the second one is still open: what
happens if we have
   someContent<nbsp><space>otherContent ?
*IF* (and I'm not at all sure about this) there can be a break, then
both spaces should be discarded: 


IMO yes there can be a break and no only the space needs to be removed. 
Again the argument is that nbsp is not whitespace as per XSL-FO 
definition and need not to be removed.


What makes you think that both the nbsp and the space needs to be 
removed around a fop generated linebreak?


Oops, I forgot to add an important condition: if the user explicitly 
states that the nbsp must be discarded around a line break:

  <fo:inline suppress-at-line-break="suppress">&nbsp;</fo:inline>

Well, the more I look at this, the more it seems unlikely to ever happen 
... we are probably having a highly theoretical disquisition! :-)


Anyway, I was still not sure whether there could be a break so I looked 
back at the Unicode Annex #14.



GL  Non-breaking (Glue) (XB/XA)  (normative)

Non-breaking characters prohibit breaks on either side, but that 
prohibition can be overridden by SP or ZW. In particular, when NBSP 
follows SPACE, there is a break opportunity after the SPACE and NBSP will 
go as visible space onto the next line. See also WJ. The following lists 
the characters of line break class GL with additional description.


00A0 NO-BREAK SPACE (NBSP)
202F NARROW NO-BREAK SPACE (NNBSP)
180E MONGOLIAN VOWEL SEPARATOR (MVS)

NO-BREAK SPACE is the preferred character to use where two words should be 
visually separated but kept on the same line, as in the case of a title 
and a name Dr.<NBSP>Joseph Becker. When SPACE follows NBSP, there is no 
break, because there never is a break in front of SPACE.  NARROW NO-BREAK 
SPACE is used in Mongolian. The mongolian vowel separator acts like a 
NNBSP in its line breaking behavior. It additionally affects the shaping 
of certain vowel characters as described in [Unicode] Section 12.3, 
Mongolian.



So, it seems there could be a break between SPACE and NBSP (with NBSP 
starting the next line), but not between NBSP and SPACE. Can we say this 
is settled?
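The two statements above can be condensed into a tiny predicate. It only models the SPACE / NBSP pairs discussed here, not the full UAX #14 algorithm. `GlueRule` is a hypothetical name.

```java
class GlueRule {
    static final char SPACE = ' ';
    static final char NBSP = '\u00A0';

    /**
     * Is there a break opportunity between 'before' and 'after'?
     * Covers only pairs involving SPACE or NBSP.
     */
    static boolean breakAllowed(char before, char after) {
        if (after == SPACE) return false;  // never a break in front of SPACE
        if (before == SPACE) return true;  // SPACE overrides GL: break after it
        if (before == NBSP || after == NBSP) return false; // GL: no break on either side
        throw new IllegalArgumentException("pair not covered by this sketch");
    }
}
```

So `SPACE NBSP` allows a break, with the NBSP starting the next line, while `NBSP SPACE` does not.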


Regards
Luca


Re: line breaking and whitespace handling

2006-02-02 Thread Luca Furini

Manuel Mall wrote:


As far as I remember our last discussion was about who should generate the
Knuth element lists: The individual layout managers or the Line layout
manager. You argued in favour of retaining the current system and I tended
to favour the moving it up the hierarchy to the line LM.

I never spelled out why I am tending to favour the line LM. It all boils
down to in my mind: Do we need to create LM spanning Knuth elements? If
the answer is Yes then my gut feel is we are better off doing it at line
LM level instead of passing context around in argument lists. If the
answer is No then leaving it at lower level LMs is fine.


I see your point, and I agree with you that elements representing inline 
spaces, borders and padding must take into account information coming from 
multiple fo nodes.


Moving the generation to the LineLM level avoids the need to pass rich 
(and large) context information to the children: but the downside is that 
the children must give the LineLM their Positions, as the addAreas() phase 
still counts on the Positions stored in the elements in order to know what 
to place where (or we must rethink this phase too).



One reason to have LM spanning Knuth elements could be for consecutive
whitespace (BTW is it 'white space' or 'whitespace' - I don't have a
clue?)


The XSL recommendation 1.0 had "white space" (but "whitespace" in the 
quotation from the CSS specs), 1.1 has both "white space" and "whitespace" 
... they should really make up their mind! :-)



which we need to discard around formatter generated linebreaks. Or
which we may have to stretch/shrink for justification. What I am saying is
if whitespace-collapse=false it may make things easier (and more
economic) if we model consecutive whitespace as a single glue element.

What is a more complicated case is having an fo:inline with border/padding
and whitespace before and after the border:

[example]


What about using the UnresolvedElements? Just as per the block-level space 
resolution, each inlineLM could append at the beginning and at the end of 
its element list an UnresolvedElement storing its border, padding and 
spacing information.


Before performing the line breaking, when all UnresolvedElements are 
known, their information can be combined to create the actual elements.



Another issue which came up since our last discussion but not really
related to the issue above is that because of markers we cannot do
whitespace handling at fo level in all cases but must rerun the fo level
type whitespace handling again at LM level when we have the actual
whitespace related property values which apply in the retrieve-marker
context as they can be different to the values of the same properties in
the marker context.


Maybe in this case we could use UnresolvedElements for the inner spaces 
too (the spaces in the middle of a node text, whose handling in the 
previous situation did not need other information).


WDYT?

Regards
Luca


Re: line breaking and whitespace handling

2006-02-02 Thread Luca Furini

Manuel Mall wrote:


What about using the UnresolvedElements? Just as per the block-level
space resolution, each inlineLM could append at the beginning and at
the end of its element list an UnresolvedElement storing its border,
padding and spacing information.


I don't know anything about the UnresolvedElements as I so far have not 
studied the block level LMs. But this reminds about another requirement 
we may need to consider: Proper conditional start/end space resolution. 
This is currently not done. I don't think we even have testcases for 
it. When Jeremias did the block level before/after stuff the idea was 
that may be we can port this to the inline LMs for the start/end space 
resolution.


So, we could start from here, using UnresolvedElements to handle inline 
space resolution, then take into account conditional borders and paddings, 
and finally trailing / leading space characters.


In the end, it all boils down to computing how much space we have to 
allocate between two words if there isn't a break, and how much after one 
and before the other if there is. Put this way it sounds almost easy, so 
there must be a trick somewhere! :-)
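Here is a deliberately small sketch of the two quantities for the simplest case: two words inside the same fo:inline. The names are hypothetical. The resolution of spaces and borders across adjacent inlines, presumably where the trick lies, is not modelled.

```java
class InlineBreakSpace {
    /** Space between the two words when they stay on the same line. */
    static int noBreak(int wordSpace) {
        return wordSpace;
    }

    /**
     * {space after the first word, space before the second word} when the
     * line is broken between them: the word space is discarded, and the
     * inline's end / start border+padding appear only if their
     * conditionality is "retain" (with "discard" they are dropped).
     */
    static int[] withBreak(int endBorderPadding, int startBorderPadding,
                           boolean retainEnd, boolean retainStart) {
        return new int[] {
            retainEnd ? endBorderPadding : 0,
            retainStart ? startBorderPadding : 0
        };
    }
}
```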



[white space handling within markers]

Maybe in this case we could use UnresolvedElements for the inner
spaces too (the spaces in the middle of a node text, whose handling
in the previous situation did not need other information).


Not sure here - I am more inclined to reuse the fo logic, that is 
iterate over all characters in a paragraph and tell the LMs which one 
to delete probably combined with the Unicode UAX#14 linebreaking.


Ok, effectively if all we have to decide is whether to discard or retain a 
space character it's better to reuse what we already have.


Regards
Luca


Off line for a week

2006-01-12 Thread Luca Furini

Hi all!

I apologize for not having been very active in the last few weeks, but at 
long last things should change: next week I will be in San Jose 
(California) attending a conference about digital publishing, and after 
that I should have some time to spend working on FOP (and I really can't 
wait to!).


Regards
Luca



Re: Hyphetation broken with last commits

2005-12-17 Thread Luca Furini

Manuel Mall wrote:

Luca, 
why does our line breaking algorithm insist on having at least one Box 
in a paragraph? Is that inherent in the Knuth algorithm, i.e. can't it 
deal with empty paragraphs, that is paragraphs containing only Glue/Pen 
elements?


If I remember correctly, a sequence starting with glue / penalty elements 
would not make the algorithm crash, but the produced output would take into 
account the width of the glues too, while it should not.


This happens because there is no previous break, whose handling would 
have the effect of ignoring glues and penalties between the break and the 
next box.


We could maybe move the leading space removal to the beginning of the 
breaking algorithm itself, which could then check whether there are any 
elements left and create an empty line if there are none.
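Here is a sketch of the proposed check at the start of the breaking algorithm. Elements are reduced to hypothetical type markers ('b' box, 'g' glue, 'p' penalty). Leading glues and penalties are skipped, and if no box remains the paragraph produces a single empty line.

```java
import java.util.List;

class LeadingGlueRemoval {
    /**
     * Index of the first box, i.e. the number of leading glue / penalty
     * elements to skip; equals types.size() if there is no box at all.
     */
    static int firstBoxIndex(List<Character> types) {
        int i = 0;
        while (i < types.size() && types.get(i) != 'b') {
            i++;
        }
        return i;
    }

    /** True if nothing but glues and penalties is left: produce one empty line. */
    static boolean needsEmptyLine(List<Character> types) {
        return firstBoxIndex(types) == types.size();
    }
}
```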


HTH, unfortunately these days I'm really really busy and I have not much 
time to look at this.


Luca


Re: 4.3.2 Overconstrained space-specifiers

2005-12-09 Thread Luca Furini

Jeremias Maerki wrote:


You will have seen that I've been working on overconstrained documents.
5.3.4 Overconstrained Geometry is more or less implemented, so now I
need to have a look at 4.3.2 which proves quite difficult to understand.
At least I can't make much sense of it ATM.

[...]

If anyone has an idea what rule 4 in 4.3.1 or the section 4.3.2 is about
I'd love to read your thoughts. Otherwise, I will run this through the
XSL editors list.


I always thought (probably wrongly) these sections of the spec refer to 
the page regions, maybe because of the property display-align, and more as 
a way to formally justify what is usually done than as prescribing some 
particular behaviour.


To be more clear (I hope :-)): region viewports usually have a well-known 
height (unless there is only a single page whose height is unbounded); 
their area children don't always fill them completely.  The content areas 
are placed at the top / center / bottom of the viewport according to the 
value of display-align: but, as these extra spaces may be in contrast with 
the space properties of the first and last child areas, we need, from a 
formal point of view, a rule saying that we are allowed to do this, 
otherwise the specs would be inconsistent.


In other words, I always read these rules as: spaces added at the top / 
bottom of a page to implement display-align have greater precedence than 
the space-before or space-after traits of the child areas. In my opinion, 
rule 4 should state something like this: the maximum value of the 
space-specifier is set to the difference between the containing height and 
the content height.
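Under this reading, rule 4 and display-align could be computed as below. `DisplayAlign` is a hypothetical helper, heights are plain integers, and the sketch ignores the unbounded-height case mentioned above.

```java
class DisplayAlign {
    enum Align { BEFORE, CENTER, AFTER }

    /** Proposed maximum for the space-specifier: the leftover height. */
    static int maxSpace(int containerHeight, int contentHeight) {
        return Math.max(0, containerHeight - contentHeight);
    }

    /** Offset of the content from the top of the viewport. */
    static int contentOffset(Align align, int containerHeight, int contentHeight) {
        int free = maxSpace(containerHeight, contentHeight);
        switch (align) {
            case CENTER: return free / 2; // space split above and below
            case AFTER:  return free;     // all extra space above the content
            default:     return 0;        // BEFORE: content at the top
        }
    }
}
```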


Don't know if this makes any sense ...

Regards
Luca


Re: Kerning

2005-12-09 Thread Luca Furini

Starting from your final summary:

Manuel Mall wrote:


IMO FOP should limit itself to:
a) Use kerning only for consecutive characters within the same fo


Ok, but more on this later in this message ...


b) Limit itself to the kerning information in the font


Ok

c) Only apply kerning if the letter-spacing property has the value 
normal (and the font supports it)


Isn't this condition too strong? I see kerning as an extra space, 
something that is added to the letter spacing, not something that replaces 
it.


A simple example with our kerning couple AV:

a) at the moment kerning is not implemented, so with normal letter-spacing 
the space between A and V seems bigger than the space between I and L, for 
example;


b) we implement kerning, so the space between A and V is reduced and it 
visually looks like the space between I and L;


c) what if we have a negative letter-spacing? If we don't apply kerning 
any more, we go back to a): the space between A and V would seem bigger 
than the space between I and L.


In other words: if the kerning value stored in the font is correct, it 
should always be added to the letter spacing: it would make the characters 
overlap only when the letter-spacing alone would make normal characters 
overlap, and in this case this should be considered the desired output.
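The additive model amounts to a one-line computation. Widths and kerning values are hypothetical numbers in 1/1000 em, and `KerningAdvance` is not an FOP class.

```java
import java.util.Map;

class KerningAdvance {
    /**
     * Advance after 'current' when followed by 'next': glyph width plus
     * letter spacing plus the (usually negative) kerning value for the pair.
     * Kerning is added to letter-spacing, never replaced by it.
     */
    static int advance(char current, char next, Map<Character, Integer> widths,
                       Map<String, Integer> kerning, int letterSpacing) {
        int kern = kerning.getOrDefault("" + current + next, 0);
        return widths.get(current) + letterSpacing + kern;
    }
}
```

With width 667 for both A and V, a kerning pair AV = -70 and letter-spacing -20, the advance after A is 577 before V and 647 before another A.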


In the end XSL-FO has the letter-spacing property which users (and 
programs generating XSL-FO) can use to adjust kerning.


A little doubt concerning letter spaces: at the moment, a letter space is 
assigned to the preceding character. Is this correct? I don't remember 
any section in the specs concerning the ownership of letter spaces 
... I think that everything is simpler, from the point of view of both 
users and implementors, if each letter space is owned by the preceding (or 
following) formatting object, but this does not mean it is what the specs 
require!


An example: if we have the text WORD where each letter is an fo:character, 
at the moment the first three fo:characters have a letter space each, and 
the fourth has none.
All is ok as long as the fo:characters have no (or equal) letter-spacing, 
but if each fo:character has a different letter-spacing property, the 
output differs according to which fo:characters control the letter 
spaces.
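The WORD example can be made concrete with a small Java sketch. It uses hypothetical letter-spacing values for W, O, R and D and compares the two ownership rules. With the current rule the preceding character owns each gap, so D's value is never used. With the other rule W's value is the one that gets lost.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: which fo:character "owns" the letter space between
 * two neighbours. With a different letter-spacing on each character, the
 * ownership rule changes the resulting gaps.
 */
public class LetterSpaceOwnership {

    /** Gaps between n characters, each taken from the preceding character. */
    static int[] ownedByPreceding(int[] letterSpacing) {
        int[] gaps = new int[letterSpacing.length - 1];
        for (int i = 0; i < gaps.length; i++) {
            gaps[i] = letterSpacing[i];
        }
        return gaps;
    }

    /** The same gaps, each taken from the following character instead. */
    static int[] ownedByFollowing(int[] letterSpacing) {
        int[] gaps = new int[letterSpacing.length - 1];
        for (int i = 0; i < gaps.length; i++) {
            gaps[i] = letterSpacing[i + 1];
        }
        return gaps;
    }

    public static void main(String[] args) {
        int[] word = {100, 200, 300, 400}; // made-up values for W, O, R, D (millipoints)
        System.out.println(Arrays.toString(ownedByPreceding(word))); // [100, 200, 300]
        System.out.println(Arrays.toString(ownedByFollowing(word))); // [200, 300, 400]
    }
}
```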


Regards
Luca


Re: DO NOT REPLY [Bug 37743] - exception: border-style (shorthand)

2005-12-05 Thread Luca Furini
First of all, thanks for your comments: I really tend to forget in a short 
time all the details concerning white space!


Manuel Mall wrote:

Glyphs are only allowed to be merged if they carry the same / matching 
set of property values. Personally I would not be concerned if we 
therefore limit that logic to within a LM. While it is possible that 
someone could write something like

<fo:block><fo:inline>a</fo:inline><fo:inline>&#x0308;</fo:inline>
and the a and &#x0308; could be combined into an &#x00e4;, IMO this is a 
pretty degenerate case.


Seems reasonable: so, we can delete glyph substitution from the list of 
things we must consider in this phase.


But, now I think of it, we must consider kerning too, so the list does not 
get any thinner!



my summary is:

a) We both seem to want the same outcome, that is add required features 
and at the same time get rid of some of the workarounds currently used.


Agreed.

b) We both agree that the character by character analysis is done at 
Line LM level.


Agreed.

c) Your initial thought is that the Line LM should then provide enough 
information to the LMs to generate their Knuth sequences while my 
initial thought is that the Line LM generates the Knuth sequences and 
provides enough information for the LMs to generate their areas.


If you agree with this summary, maybe we can concentrate on discussing 
the pros and cons of the two approaches mentioned in item c) above?


Ok, I'll send a new message soon!

Regards
Luca


Re: Indent Inheritance and Collapsing Border Model

2005-12-01 Thread Luca Furini

Jeremias Maerki wrote:


The first concerns indent inheritance [...]

So what I'd like to do is implement the alternative behaviour as a 
configurable option in the FO tree. The default would still be what the 
specification describes (see [1]), but users would be able to set a 
switch that would make FOP reset start-indent and end-indent to zero in 
cases where in the area tree a reference area boundary would be crossed 
(block-containers and table-cell, mainly).


I agree with the need to give users what they expect, but I did not 
understand where this switch will go: in the configuration file (+1) or in 
the document itself as an extension property / element (not so 
enthusiastic about that)?


In the first case the file would be correct, only its rendering would be 
deliberately wrong: the user is aware that he is asking *the formatter* 
for a non-standard rendering.


In the second the document itself would require a non-standard rendering, 
which only our implementation will provide; in other words, it seems to me 
that this solution would give the impression that the file itself is 
enough to achieve the expected result, while it is not.


Or maybe you were thinking of something else?
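For the configuration-file variant, the behaviour could boil down to something like the following Java sketch. The switch name is a hypothetical assumption, and the inheritance is simplified: margins, borders and padding are not added to the indent. It is not FOP's actual code.

```java
/**
 * Hypothetical sketch: start-indent inheritance with an optional,
 * configuration-driven reset at reference-area boundaries.
 * Margins, borders and padding are left out for brevity.
 */
public class IndentInheritance {

    /**
     * @param inheritedStartIndent  start-indent inherited from the parent (millipoints)
     * @param crossesReferenceArea  true for fo:block-container, fo:table-cell, ...
     * @param resetAtReferenceAreas switch read from the configuration file;
     *                              false keeps the behaviour described by the spec
     */
    static int startIndent(int inheritedStartIndent, boolean crossesReferenceArea,
                           boolean resetAtReferenceAreas) {
        if (resetAtReferenceAreas && crossesReferenceArea) {
            return 0; // non-standard rendering explicitly requested by the user
        }
        return inheritedStartIndent; // spec behaviour: plain inheritance
    }

    public static void main(String[] args) {
        System.out.println(startIndent(20000, true, false)); // 20000 (spec)
        System.out.println(startIndent(20000, true, true));  // 0 (reset)
        System.out.println(startIndent(20000, false, true)); // 20000 (no boundary crossed)
    }
}
```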

The second issue is about the collapsing border model. Currently, having 
an fo:table with no explicit border-collapse="separate" results in a 
warning message in the log as well as frequent exceptions due to the 
fact that this border model is not completely implemented. I would like to 
modify the FO tree in a way that a table always reports being in 
separate border model mode. The other idea would have been to change the 
default but I don't particularly like that approach because it breaks 
the spec. Obviously, this is only a temporary measure until the 
collapsing border model becomes usable.


I agree with you, I prefer the first option.

Regards
Luca


Re: svn commit: r345909 - in /xmlgraphics/fop/trunk: src/java/org/apache/fop/fo/flow/ src/java/org/apache/fop/layoutmgr/ src/java/org/apache/fop/layoutmgr/inline/ test/java/org/apache/fop/

2005-11-21 Thread Luca Furini

I wrote:


Implementation of hyphenation-ladder-count.


Just a couple of annotations:

- this implementation does not store any extra information inside the 
nodes: the algorithm checks whether a break is ok or not using a for loop; 
if you prefer, I could change this so that the number of consecutive lines 
ending with a hyphen is stored inside the nodes, and the check takes a 
constant time


- the spec states that this property "Specifies a limit on the number of 
successive hyphenated line-areas the formatter may generate *in a 
block-area*"; so, if the value is 2 and a block creates 5 lines, the first 
4 lines could all end with a hyphen provided there is a break after the 
second one (so that lines 1-2 and lines 3-4 fall in different 
block-areas). This implementation would not create such a set of lines: 
anyway, the produced output still satisfies the condition; in other words, 
we check a stricter condition.
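To illustrate the first annotation, here is a Java sketch of both options. It assumes a simplified, hypothetical node type; the real nodes of the breaking algorithm carry much more information. The loop walks back through the previous breaks. Storing the run length in each node makes the check constant-time. Like the implementation described above, the count spans the whole block, not each block-area.

```java
/**
 * Hypothetical sketch: two ways of checking hyphenation-ladder-count when a
 * new line ending with a hyphen is considered. Node is a simplified stand-in
 * for an active node of the breaking algorithm, not FOP's class.
 */
public class LadderCountCheck {

    static final class Node {
        final Node previous;          // break that ends the preceding line
        final boolean endsWithHyphen; // the line ending at this break is hyphenated
        final int hyphenRun;          // consecutive hyphenated lines ending here

        Node(Node previous, boolean endsWithHyphen) {
            this.previous = previous;
            this.endsWithHyphen = endsWithHyphen;
            int before = (previous == null) ? 0 : previous.hyphenRun;
            this.hyphenRun = endsWithHyphen ? before + 1 : 0;
        }
    }

    /** Loop-based check: walk back along the chain of breaks. */
    static boolean allowedByLoop(Node previous, boolean newLineHyphenated, int ladderCount) {
        if (!newLineHyphenated) {
            return true;
        }
        int run = 1; // the new line itself
        for (Node n = previous; n != null && n.endsWithHyphen; n = n.previous) {
            run++;
            if (run > ladderCount) {
                return false;
            }
        }
        return true;
    }

    /** Constant-time check: read the run length stored in the node. */
    static boolean allowedByStoredCount(Node previous, boolean newLineHyphenated, int ladderCount) {
        if (!newLineHyphenated) {
            return true;
        }
        int before = (previous == null) ? 0 : previous.hyphenRun;
        return before + 1 <= ladderCount;
    }

    public static void main(String[] args) {
        Node start = new Node(null, false);  // paragraph start
        Node first = new Node(start, true);  // line 1 ends with a hyphen
        Node second = new Node(first, true); // line 2 ends with a hyphen
        // with ladder count 2, a third hyphenated line in a row is rejected
        System.out.println(allowedByLoop(second, true, 2));        // false
        System.out.println(allowedByStoredCount(second, true, 2)); // false
        System.out.println(allowedByLoop(first, true, 2));         // true
    }
}
```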


Regards
Luca


Re: Hyphenation

2005-11-16 Thread Luca Furini

Manuel Mall wrote:

Not sure what other committers and the PMC think but as a vote on the 
release has started I would suggest no further changes to the codebase 
unless agreed?


What I am saying is - by all means do the development but don't put it 
back into svn until after the release.


Ok, this seems a good idea.

Regards
Luca


Illegal property values

2005-11-16 Thread Luca Furini
While working on the implementation of hyphenation-ladder-count, I noticed 
that at the moment the property system can return illegal values 
coming from the fo file instead of the fallback value defined by the 
specs.


There are significant differences in wording between XSL 1.0 and 1.1: for 
example, concerning hyphenation-ladder-count 1.0 has (7.15.2):

<integer>: an integer greater than or equal to 1

While 1.1 (7.16.2) reads:
<number>: an integer greater than or equal to 1. If a zero, negative, or 
non-integer value is provided, the value will be rounded to the nearest 
integer value greater than or equal to 1


So, should the property be improperly set to -0.5:
- if we want to follow closely 1.0, we should stop with an error
- if we follow 1.1 we should continue using 1 instead, maybe with a 
warning message


There are other properties with a validity range and a fallback value: 
column-count, initial-page-number, column-number, number-columns-repeated, 
number-columns-spanned, number-rows-spanned, hyphenation-{push, 
remain}-character-count; only hyphenation-ladder-count does not have a 
fallback value in 1.0, so maybe this was just an oversight.
Note that the fallback value is different, in general, from the default 
value, as it is derived from the illegal value by rounding.
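Following the 1.1 wording, the fallback could be computed as in this Java sketch. The method name and the warning text are assumptions. The result is derived from the illegal value itself, which is why it is not the same as the default value.

```java
/**
 * Hypothetical sketch of the XSL 1.1 fallback for properties such as
 * hyphenation-ladder-count: round to the nearest integer, never below 1.
 * Not FOP's property system; the warning is only printed to stderr.
 */
public class IntegerPropertyFallback {

    static int fallbackAtLeastOne(double specified) {
        long rounded = Math.round(specified);                     // halves round up: -0.5 -> 0, 2.5 -> 3
        long clamped = Math.max(1L, rounded);                     // zero and negatives become 1
        int result = (int) Math.min(Integer.MAX_VALUE, clamped);  // avoid int overflow
        if (result != specified) {
            System.err.println("Warning: illegal value " + specified + ", using " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fallbackAtLeastOne(-0.5)); // 1
        System.out.println(fallbackAtLeastOne(2.4));  // 2
        System.out.println(fallbackAtLeastOne(2.6));  // 3
        System.out.println(fallbackAtLeastOne(3));    // 3, no warning
    }
}
```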


At the moment the layout process continues with the incorrect values, and 
this could create errors in several different places; for example a 
non-integer value would probably create an error if we assign it to an 
integer variable, a negative integer value could create an
IllegalArgumentException if we use it as the size of an array (this 
happens, for example, with a negative column-count) ...


Regards
Luca


Re: Hyphenation

2005-11-15 Thread Luca Furini

Manuel Mall wrote:

Hmm, just changed the value to 3000 (I think that's the value suggested 
in the article) and there is no change in hyphenation behaviour with the 
above mentioned example. That makes me a bit suspicious...


I traced the behaviour of the breaking algorithm applied to the first 
paragraph of the example (the one with 4 consecutive lines ending with a 
hyphen) and it seems to me that the algorithm works well: the chosen set 
of breaks has about 15000 demerits, while the existing three alternatives 
either have some more demerits and the same quantity of consecutive 
flagged lines or about 3 demerits.


It seems that our example, and in particular its first paragraph, 
perfectly follows Murphy's law!
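For reference, this Java sketch follows the demerits formula from the Knuth and Plass paper. It includes the extra demerits for two consecutive lines ending at flagged (hyphen) penalties. The constants are assumptions: 10 is TeX's default line penalty, and 3000 is the value discussed in this thread. Fitness-class demerits are left out, and FOP's BreakingAlgorithm may differ in the details. Because the flagged demerits are a fixed amount, a ladder of hyphens can still win whenever the alternatives are worse for other reasons.

```java
/**
 * Sketch of the demerits formula from Knuth and Plass, "Breaking Paragraphs
 * into Lines", with the extra demerits for two consecutive lines ending at
 * flagged (hyphen) penalties. Fitness-class demerits are omitted, and FOP's
 * BreakingAlgorithm may differ in the details.
 */
public class KnuthDemerits {

    static final double LINE_PENALTY = 10;       // TeX's default \linepenalty
    static final double FLAGGED_DEMERITS = 3000; // the value discussed in this thread

    /** Badness of a line with adjustment ratio r: 100 * |r|^3. */
    static double badness(double r) {
        return 100 * Math.pow(Math.abs(r), 3);
    }

    static double demerits(double r, double penalty, boolean flagged, boolean previousFlagged) {
        double base = LINE_PENALTY + badness(r);
        double d;
        if (penalty >= 0) {
            d = base * base + penalty * penalty;
        } else if (penalty != Double.NEGATIVE_INFINITY) {
            d = base * base - penalty * penalty;
        } else {
            d = base * base; // forced break
        }
        if (flagged && previousFlagged) {
            d += FLAGGED_DEMERITS; // two hyphenated lines in a row
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(demerits(0, 50, false, false)); // 2600.0
        System.out.println(demerits(0, 50, true, true));   // 5600.0
        System.out.println(demerits(1, 0, false, false));  // 12100.0
    }
}
```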


Tomorrow I should have some time to implement hyphenation-ladder-count and 
fix the penalty values for justified / unjustified text.


Regards
Luca


Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Luca Furini

Manuel Mall wrote:

So we end up with only two cases to consider: preserve white space and 
remove white space around a line break created by the Knuth algorithm.


1. Preserve white space: IMO in this case the space itself is actually 
not a break opportunity but there are now two break opportunities: one 
before the space and one after the space. That is a sequence like 
'abc&#x20;def' is more like 'abc&#x200b;&#xa0;&#x200b;def' or in a more 
readable notation 'abc<zwsp><nbsp><zwsp>def'. That is our normal space 
becomes a non-breakable space flanked by zero-width spaces which 
represent the break opportunities. If this is correct the Knuth 
elements would look like:

 glue w=0
 box w=0
 pen +INFINITE
 glue w=space
 pen
 glue w=0
Is this sequence correct? The first and last glue represent the zwsp 
and are break opportunities. The box prevents the removal of the space 
if a break is created before the space. The penalty prevents the space 
from being considered as a break opportunity.
Of course as usual these sequences are further complicated in the 
absence of justification and in the presence of border/padding.


I like your idea of expanding a preserved space into zwsps and nbsp; 
this allows us to forget alignments and borders / padding as we just have 
to insert the appropriate elements for the non breaking space.


The sequence is very good, as it has a couple of interesting properties:

- it interacts with the surrounding elements through just a single glue element

- if there are two (or more) consecutive, non-collapsed spaces the 
sequence has just 3 feasible breaks, not 4


However, I have a doubt: reading the Unicode document about line breaking, 
it seems to me that, regardless of the quantity of consecutive spaces, 
there is only *one* feasible break, after the last one (Unicode Standard 
Annex #14, section 2 Definitions, in particular the definition of 
direct break and indirect break)


--- begin quoted text ---

Direct Break - a line break opportunity exists between two adjacent 
characters of the given line breaking classes. This is indicated in the 
rules below as B ? A, where B is the character class of the character 
before and A is the character class of the character after the break. If 
they are separated by one or more space characters, a break opportunity 
also exists after the last space. In the pair table, the optional space 
characters are not shown.


Indirect Break - a line break opportunity exists between two characters of 
the given line breaking classes only if they are separated by one or more 
spaces. In this case, a break opportunity exists after the last space. No 
break opportunity exists if the characters are immediately adjacent. This 
is indicated in the pair table below as B % A, where B is the character 
class of the character before and A is the character class of the 
character after the break. Even though space characters are not shown in 
the pair table, an indirect break can only occur if one or more spaces 
follow B. In the notation of the rules in Section 6, Line Breaking 
Algorithm this would be represented as two rules: B × A and B SP+ ÷ A.


--- end quoted text ---

I still have not read the document from top to bottom, and I could have 
misunderstood even the sections I read :-), but I think this point must be 
clarified before we continue.
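The rule quoted above can be sketched in a few lines of Java. This is a strong simplification made for illustration: every character other than U+0020 is treated as a letter, and all other UAX #14 line breaking classes are ignored. It shows that a run of spaces yields one break opportunity, after the last space, and not one per space.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the UAX #14 rule quoted above: between two non-space
 * characters separated by spaces, the only break opportunity is after the
 * last space. Every character other than U+0020 is treated as a letter; the
 * other line breaking classes are ignored.
 */
public class BreakAfterLastSpace {

    /** Returns the indices i where a line may start, i.e. a break before text.charAt(i). */
    static List<Integer> breakOpportunities(String text) {
        List<Integer> breaks = new ArrayList<>();
        boolean seenText = false; // spaces at the very start are not between two characters
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != ' ') {
                if (seenText && text.charAt(i - 1) == ' ') {
                    breaks.add(i); // just after the last space of a run
                }
                seenText = true;
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        System.out.println(breakOpportunities("abc   def")); // [6]: one break, not three
        System.out.println(breakOpportunities("a b c"));     // [2, 4]
    }
}
```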


Regards
Luca



Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Luca Furini

Manuel Mall wrote:

Luca wrote a longer response to this but my mail reader doesn't like the 
character set (is that topical or what?).


Sorry, it looks really horrible ... still don't know what went wrong, but 
I won't do it again! :-)


Anyway, at the end Luca asks the question about the UAX#14 line breaking 
algorithm and its handling of spaces. My answer to that is:

a) Yes, UAX#14 always breaks at the end of a sequence of spaces
b) But it also says that it assumes any trailing spaces in a line are 
being removed
This conflicts with XSL-FO, which can force spaces to be retained, so 
adjustments to the algorithm are necessary to cater for that. 
One possible adjustment is simply changing what is given to the 
algorithm as indicated above, i.e. <sp> becomes <zwsp><nbsp><zwsp>.


Ok, so back to your previous message:


2. Removal of white space: This is the current behaviour but it works
only for a single space and not for a sequence of spaces. Actually
because the algorithm removes leading glues/penalties it is mainly a
problem for trailing white space. I am not sure how to best tackle
this. What comes to mind is:

a) Do the same as for leading glues/penalties at the end of the line.
However I am not sure how tricky it would be to determine the boundary
because any 'blocking boxes' (see 1. above) are only placed before but
not after elements. This option suffers from the problem that it will
not remove leading/trailing white space across inline boundaries with
border/padding as these generate zero width boxes to block removal of
the glue elements for the border/padding.



b) Do not generate individual Knuth sequences for each white space
character but instead collect all consecutive white space and create
one glue-penalty sequence for it. Again I am uncertain of the
consequences of doing that. To do that correctly we would need to
collect white space across inline boundaries. This firstly breaks the
current getNextKnuth approach which assumes each LM can generate its
sequences without knowledge of its neighbours. It would also break the
current area info structures as a single Knuth element could now refer
to text snippets from different LMs.


I'm not sure I follow you in all the details of white space handling and 
here we have borders too ... :-)


I like b) best: after all, this is somewhat similar to the space 
resolution, as we have interactions between spaces coming from different 
nodes, and it's difficult to have each LM decide on its own. And I think 
we could find a way to keep the 1-1 relationship between AreaInfo objects 
and Positions.


I have tried to play with the elements, and here are a few results: I hope 
they can help!


At the moment, the sequence for a single space with borders and padding 
is:


1  glue w=endBP
2  penalty w=0
3  glue w=(spaceIPD - endBP - startBP)
4  box w=0
5  infinite penalty
6  glue w=startBP

total width = spaceIPD
if break at #2 = endBP / startBP

If we have two (or more) spaces, we could use the sequence:

1  glue w=endBP
2  penalty w=0
3  glue w=(- endBP - startBP)
4  glue w=spaceIPD1
5  glue w=spaceIPD2
6  box w=0
7  infinite penalty
8  glue w=startBP

total width = spaceIPD1 + spaceIPD2
if break at #2 = endBP / startBP

Glues #4 and #5 have a Position pointing to different AreaInfo objects 
(from different LMs). This should solve (?) the case of 
ignore-if-surrounding.
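To double-check the widths listed for the first two sequences, they can be modelled in a few lines of Java. The element type is a stand-in made up for this check, not FOP's KnuthElement, and it only tracks widths. When breaking at a penalty, the line keeps everything before it. After the break, glue and penalties are discarded up to the first box, as in the Knuth-Plass model. The sketch covers the single-space sequence and the one for two collected spaces.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Width check for the border/padding space sequences above. Elem is a
 * stand-in (not FOP's KnuthElement) that only records kind and width;
 * the infinite penalty is modelled with width 0 and never used as a break.
 */
public class BorderPaddingSpaceCheck {

    enum Kind { BOX, GLUE, PENALTY }

    static final class Elem {
        final Kind kind;
        final int width;
        Elem(Kind kind, int width) { this.kind = kind; this.width = width; }
    }

    /** Natural width of the sequence when no break occurs inside it. */
    static int totalWidth(List<Elem> seq) {
        int sum = 0;
        for (Elem e : seq) {
            if (e.kind != Kind.PENALTY) sum += e.width;
        }
        return sum;
    }

    /** Width left at the end of the line for a break at the penalty seq.get(b). */
    static int widthBeforeBreak(List<Elem> seq, int b) {
        int sum = seq.get(b).width; // a penalty's width counts only when breaking there
        for (int i = 0; i < b; i++) {
            if (seq.get(i).kind != Kind.PENALTY) sum += seq.get(i).width;
        }
        return sum;
    }

    /** Width carried to the next line: glue and penalties are skipped until the first box. */
    static int widthAfterBreak(List<Elem> seq, int b) {
        int i = b + 1;
        while (i < seq.size() && seq.get(i).kind != Kind.BOX) i++;
        int sum = 0;
        for (; i < seq.size(); i++) {
            if (seq.get(i).kind != Kind.PENALTY) sum += seq.get(i).width;
        }
        return sum;
    }

    /** Elements #1-#6 of the single-space sequence (list indices are 0-based). */
    static List<Elem> singleSpace(int spaceIPD, int startBP, int endBP) {
        List<Elem> s = new ArrayList<>();
        s.add(new Elem(Kind.GLUE, endBP));
        s.add(new Elem(Kind.PENALTY, 0));
        s.add(new Elem(Kind.GLUE, spaceIPD - endBP - startBP));
        s.add(new Elem(Kind.BOX, 0));
        s.add(new Elem(Kind.PENALTY, 0)); // infinite penalty
        s.add(new Elem(Kind.GLUE, startBP));
        return s;
    }

    /** Elements #1-#8 of the sequence for two collected spaces. */
    static List<Elem> twoSpaces(int spaceIPD1, int spaceIPD2, int startBP, int endBP) {
        List<Elem> s = new ArrayList<>();
        s.add(new Elem(Kind.GLUE, endBP));
        s.add(new Elem(Kind.PENALTY, 0));
        s.add(new Elem(Kind.GLUE, -endBP - startBP));
        s.add(new Elem(Kind.GLUE, spaceIPD1));
        s.add(new Elem(Kind.GLUE, spaceIPD2));
        s.add(new Elem(Kind.BOX, 0));
        s.add(new Elem(Kind.PENALTY, 0)); // infinite penalty
        s.add(new Elem(Kind.GLUE, startBP));
        return s;
    }

    public static void main(String[] args) {
        List<Elem> one = singleSpace(3000, 500, 700);
        // element #2 of the lists above is index 1
        System.out.println(totalWidth(one) + " " + widthBeforeBreak(one, 1) + " " + widthAfterBreak(one, 1)); // 3000 700 500
        List<Elem> two = twoSpaces(3000, 2500, 500, 700);
        System.out.println(totalWidth(two) + " " + widthBeforeBreak(two, 1) + " " + widthAfterBreak(two, 1)); // 5500 700 500
    }
}
```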


If white-space-treatment is ignore-if-after, and we have two consecutive 
spaces we could use the sequence:


1  glue w=endBP
2  penalty w=0
3  glue w=(spaceIPD - endBP)
4  penalty w=0
5  glue w=(spaceIPD - startBP)
6  box w=0
7  infinite penalty
8  glue w=startBP

total width = 2 * spaceIPD
if break at #2 = endBP / startBP
if break at #4 = endBP + spaceIPD / startBP

With three or more consecutive spaces:
1  glue w=endBP
2  penalty w=0
3  glue w=(spaceIPD - endBP)
4  penalty w=0
5  glue w=spaceIPD
6  penalty w=0
7  glue w=(spaceIPD - startBP)
8  box w=0
9  infinite penalty
10 glue w=startBP

total width = 3 * spaceIPD
if break at #2 = endBP / startBP
if break at #4 = endBP + spaceIPD / startBP
if break at #6 = endBP + 2 * spaceIPD / startBP

I did not find a sequence for ignore-if-before yet ...

Regards
   Luca


Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-28 Thread Luca Furini

Manuel Mall wrote:


 But we need to know which spaces can be adjusted, and which cannot.
 If we don't want to duplicate the logic for the space recognition,
 the SpaceAreas must simply have a boolean value stating whether the
 space is adjustable, so that the renderers won't need to look at the
 space and decide.

I don't get that point. Isn't it enough for the renderer to know the 
offset for the area in question? What additional decisions would the 
renderer make based on the adjust flag? Or do you mean we still have 
the twsAdjust on the TextArea and the offset is only relative to 
twsAdjust? Do we really gain anything with that instead of making the 
offset the corrected twsAdjust value?


At the moment we still use the twsAdjust value, and the individual offset 
would be an additional adjustment. Maybe there is little gain, but when 
the font is not multi-byte this saves us from setting the offset on each 
adjustable SpaceArea and using it in the renderer. It's not much, both in 
terms of time and output length: but if there is an easy way to adjust all 
the spaces at once ... why should we do it another way? :-)



[...]

 So, what if we rename offset -> spaceAfter? It seems to me that we
 are here speaking of the same thing using two different names. :-)

Fair enough, I agree we do.


Good!
We just have to reach an agreement on this last detail, and I'll implement 
the changes.


Regards
Luca


Re: White space handling Wiki page

2005-10-28 Thread Luca Furini

Manuel Mall wrote:

Side note: FOP doesn't quite do the same internally, i.e. a character 
explicitly specified using <fo:character .../> is handled separately from 
'plain text'. If someone would write a style sheet which does a 
transform of every character into an <fo:character/> object and would 
feed the output to FOP the formatting results would be, let's say, VERY 
DISAPPOINTING. Actually something like: <fo:block 
background-color="yellow">word1<fo:character character=" "
/><fo:character character=" " />word2<fo:character character=" "
/>word3<fo:character character=" "/></fo:block> currently causes an 
exception!


This is a problem of the whitespace-related code, but anyway the 
CharacterLM always creates a sequence of elements corresponding to a 
non-space character, so the only feasible breaks recognized by the 
algorithm would be the hyphenation points inside the words ...


I think that just as TextArea and Character both extend an 
AbstractTextArea, TextLM and CharLM should have a common super class 
holding the createElementsFor*() methods. It would not be necessary to add 
a SpaceArea or a WordArea child to a Character area, although we could 
decide to do it just for analogy.
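The common superclass idea could look roughly like this Java sketch. All class and method names are hypothetical, and the elements are reduced to plain strings. Only the simplest pattern is shown: a single glue for a space and a box for anything else. The real TextLM creates different sequences depending on alignment. The point is that an fo:character containing a space would get the same breakable element as a space in plain text.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the suggestion above: TextLM and CharacterLM
 * inherit one element-creation routine, so an fo:character containing a
 * space produces the same breakable elements as a space in plain text.
 * Class and method names are illustrative; elements are plain strings here.
 */
public class SharedElementCreation {

    abstract static class AbstractTextLikeLM {
        /** Elements for a breaking space (a glue is a feasible break after a box). */
        protected List<String> createElementsForASpace(int width) {
            List<String> elements = new ArrayList<>();
            elements.add("glue w=" + width);
            return elements;
        }

        /** Elements for a non-space character or word fragment. */
        protected List<String> createElementsForAWordFragment(int width) {
            List<String> elements = new ArrayList<>();
            elements.add("box w=" + width);
            return elements;
        }

        List<String> elementsFor(char c, int width) {
            return c == ' ' ? createElementsForASpace(width)
                            : createElementsForAWordFragment(width);
        }
    }

    static final class TextLMSketch extends AbstractTextLikeLM { }

    static final class CharacterLMSketch extends AbstractTextLikeLM { }

    public static void main(String[] args) {
        // a space inside fo:character now yields a glue, i.e. a possible break
        System.out.println(new CharacterLMSketch().elementsFor(' ', 333)); // [glue w=333]
        System.out.println(new TextLMSketch().elementsFor('a', 556));      // [box w=556]
    }
}
```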


Regards
Luca




Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-27 Thread Luca Furini

I wrote:


Manuel Mall wrote:

 There is no need to expose creation of the Space/Word areas directly 
 to TextLayoutManager either. TextArea could easily expose an addWord 
 and an addSpace method instead of the monolithic setText. In the end 
 it probably boils down to me arguing that the setText logic currently 
 in TextArea IMO should be in TextLayoutManager (and probably based on 
 its data structures) because it is an operation closely coupled to 
 layout and not to areas.


Ok.


Done:
  http://svn.apache.org/viewcvs.cgi?view=rev&rev=328882

I added a boolean attribute in SpaceArea that is true for adjustable 
spaces (at the moment it is not used, but I will fix it soon).


At the moment the offsets in SpaceArea and WordArea are unused, but this is 
how I think they could be used: if, because of the rounding in the 
adjustment computation, the applied adjustment differs from the 
needed one, the TextLM should distribute this difference (a few 
millipoints) among the SpaceAreas and / or WordAreas, setting their 
offset.
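The distribution step could be as simple as this Java sketch, under the assumption that the difference is spread evenly over the adjustable spaces. The method name is hypothetical. The offsets always add up exactly to the difference, so no rounding error is left at the end of the line.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: spreading the difference between the needed and the
 * applied adjustment (a few millipoints) over the adjustable SpaceAreas, so
 * that their offsets add up exactly to the difference.
 */
public class DistributeDifference {

    static int[] distribute(int difference, int spaceCount) {
        int[] offsets = new int[spaceCount];
        if (spaceCount == 0) {
            return offsets; // no adjustable spaces in the line
        }
        int base = difference / spaceCount;      // truncated toward zero
        int remainder = difference % spaceCount; // same sign as difference
        for (int i = 0; i < spaceCount; i++) {
            // the first |remainder| spaces get one extra millipoint (or one less)
            offsets[i] = base + (i < Math.abs(remainder) ? Integer.signum(remainder) : 0);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(distribute(7, 3)));  // [3, 2, 2]
        System.out.println(Arrays.toString(distribute(-7, 3))); // [-3, -2, -2]
    }
}
```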


The renderers will use this according to their own adjustment rule: for 
example the PDFRenderer would add it to the text adjustment if the 
character is multibyte.


The offset could come in handy for the cjk support (bug 36977): in this 
case there are no adjustable spaces, and if text is justified all the 
difference between line width and unadjusted character width could be 
handled by modifying the offsets of some special characters.


Regards
Luca




Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-26 Thread Luca Furini

Manuel Mall wrote:

I have a question on this. You break in TextArea the text into words 
based on CharUtilities.isAnySpace. Is this guaranteed to be consistent 
with the breaking and adjustment calculations in TextLayoutManager? I am 
concerned we may be using different rules for word breaking in different 
places.


As far as consistency is concerned, I agree with you: the handling of the 
different kinds of spaces (breaking, non-breaking, fixed width, ...) is 
still quite incomplete and dispersed over different classes. Just to add 
another example, the CharacterLM implicitly expects its character to be 
a non-space character and has its own lines of code concerning the 
creation of the elements, while it could share the methods already called 
by the TextLM.


Having a single, centralized class taking care of the breaking (be it a 
Java utility class or a Fop one) and a single, shared method implementing 
the creation of the elements would surely increase consistency and 
clarity.


Somehow it doesn't feel right to me that TextLayoutManager does all the 
breaking and calculations and then we give the whole chunk to TextArea 
and it breaks it again using a possibly different algorithm but still 
using the adjustment value calculated by TextLayoutManager.


When I was trying to fix bug 36238 I initially started modifying 
TextLM#createTextArea(), using the AreaInfo objects to create WordAreas 
and SpaceAreas, but I then decided to move the string splitting inside 
TextArea because:


1) if WordAreas and SpaceAreas are not directly created by the LMs, there 
is no need to change a single line of code inside the classes creating 
TextAreas; this is not a real reason supporting the choice, just a 
handy consequence of it;


2) if TextArea still provides a getText() method, the renderers are not 
forced to render the text word by word and space by space if their word 
spacing treatment is not affected by multi-byte characters; but once 
again, this is not a real reason as we could provide this method anyway;


3) although both SpaceArea and WordArea have an offset attribute, it is 
ATM not used, so these areas do not carry any formatting information; 
their only purpose is to highlight spaces, thus allowing some specific 
renderer to handle them correctly regardless of their encoding; in other 
words, we are not losing breaking and calculations, we simply do not need 
them anymore as we already know exactly which text will be placed in each 
line, and how wide it will be once it's correctly adjusted;


4) the text that will be placed in a line cannot be directly taken from 
textArray (in the TextLM), and the string str should be used instead 
anyway, as it may be different from the concatenation of the single pieces 
of text; at the moment the only difference concerns the hyphenation 
character '-' added at the end of the line, but I suspect that in 
different languages there could be other differences; so, we cannot simply 
create a WordArea for each AreaInfo object.


So, if you find it strange to break the text, put it together and split it 
again, me too! :-) But this initial feeling disappeared when I realized 
that the final splitting does not involve breaking in its proper sense, 
but just classification of characters.
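The final split can be sketched in Java as follows. The isAnySpace method here is a simplified stand-in for FOP's CharUtilities.isAnySpace and does not cover exactly the same characters. The sketch is only a classification of the already-determined line text: each space becomes its own piece, and each run of other characters becomes a word, including the hyphenation character added at the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the final split performed in TextArea: no line breaking, just a
 * classification of each character of the already-determined line text.
 * isAnySpace is a simplified stand-in for FOP's CharUtilities.isAnySpace
 * and does not cover exactly the same characters.
 */
public class WordSpaceSplit {

    static boolean isAnySpace(char c) {
        return Character.isSpaceChar(c); // Unicode space, line and paragraph separators
    }

    /** Runs of non-space characters become words; each space becomes its own piece. */
    static List<String> split(String lineText) {
        List<String> pieces = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : lineText.toCharArray()) {
            if (isAnySpace(c)) {
                if (word.length() > 0) {
                    pieces.add(word.toString()); // would become a WordArea
                    word.setLength(0);
                }
                pieces.add(String.valueOf(c));   // would become a SpaceArea
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            pieces.add(word.toString());
        }
        return pieces;
    }

    public static void main(String[] args) {
        // pieces: "two", " ", " ", "words", " ", "hyphen-"
        System.out.println(split("two  words hyphen-").size()); // 6
    }
}
```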


This is why I did what I did; if I did not manage to convince you ... you 
can try and convince me! :-)


Regards
Luca




Re: DO NOT REPLY [Bug 36238] - text-align=justify doesnt' work on custom fonts

2005-10-19 Thread Luca Furini
Yesterday I added a couple of comments concerning this bug; at the 
moment I haven't received the bugzilla email yet, so here is a 
copy-and-paste of the last message.


I added a comment after the copied text, so this message would not be 
completely useless even if you received the original one! :-)


 --- Additional Comment #8 From Luca Furini  2005-10-18 12:53

Quotation from the pdf reference, version 1.6, section 5.2.2 Word spacing:

  Word spacing is applied to every occurrence of the single-byte character code
32 in a string when using a simple font or a composite font that defines code 32
as a single-byte code. It does not apply to occurrences of the byte value 32 in
multiple-byte codes.

So, it seems that at least we have found where the problem lies ... does anyone have
an idea how to solve it too? :-)

 ---

At the moment, my only idea for fixing this is to go back to the 
creation of several text areas, one for each word or space: then the 
multibyte space character could be converted to the single-byte space, or 
we could leave it as it is and force the adjustment by modifying the ipd 
of the area created for a space.


A disadvantage of this solution would be the big increase in the area tree 
size.


An advantage could be the possibility to get rid of errors due to the 
adjustment rounding: at the moment the letter-space adjustment can lead to 
an error of up to one millipoint per letter space, as the adjustment is 
rounded up to the nearest millipoint and applied to all the letter spaces 
in a line. Having distinct text areas for each word, we could correct this 
error by setting each area's ipd appropriately.
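A quick Java sketch of the arithmetic, with made-up figures. One per-space adjustment, rounded up to a whole millipoint, is applied to every letter space in the line, so the error grows with the number of letter spaces.

```java
/**
 * Arithmetic sketch of the rounding error described above: one per-space
 * adjustment, rounded up to a whole millipoint, is applied to every letter
 * space in the line, so the error grows with the number of letter spaces.
 * The figures used in main are made up.
 */
public class LetterSpaceRounding {

    /** Applied minus needed adjustment, in millipoints. */
    static int accumulatedError(int neededAdjustment, int letterSpaces) {
        int perSpace = (int) Math.ceil((double) neededAdjustment / letterSpaces);
        return perSpace * letterSpaces - neededAdjustment;
    }

    public static void main(String[] args) {
        System.out.println(accumulatedError(1000, 8));  // 0  (125 mpt each, exact)
        System.out.println(accumulatedError(1000, 7));  // 1  (143 * 7 = 1001)
        System.out.println(accumulatedError(1000, 30)); // 20 (34 * 30 = 1020)
    }
}
```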


Regards
Luca


Re: svn commit: r321084 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr/inline/LineLayoutManager.java

2005-10-14 Thread Luca Furini
Fixing a ClassCastException caused by checking for the wrong pattern of 
elements representing a space when there are inline borders and padding.


The testcase inline_border_padding_block_nested_2.xml still does not pass: 
there is a failing check concerning ipda. But at least there are no more 
exceptions! :-)


Regards
Luca


Re: Inline border / padding and nested blocks

2005-10-10 Thread Luca Furini

Manuel Mall wrote:

inline_border_padding_block_nested.xml. If you run the test case as is 
you get an "Expect inline sequence as first sequence when last paragraph 
is not null" message.


The first message refers to the first block in the testcase: I think this 
has something to do with the correct mixing of block and inline 
sequences, as the content of the inner block is placed in the first line, 
while it should be in the second.


The output should be:

  Before inline
  starting with a block
  after block After inline

but we get

  starting with a block
  Before inline after block After inline

Note that the text before and after the inline (containing the nested 
block) appear in the same line, and this means their elements ended up in 
the same sequence, while they should be in two different sequences.


I'm going to look at what happens in detail ...

If you comment everything out and uncomment the last block you get a 
ClassCastException on a Knuth element.


This happens during LineLM.removeElementsForTrailingSpaces(): as you wrote 
some time ago, at the moment when the LineLM meets a glue element at the 
end of a sequence it could wrongly deduce it represents a trailing space, 
while it represents borders / paddings.


I'm going to look at the possible patterns that the elements for border 
and padding can have, and fix the method.
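One way to avoid the wrong deduction is to know where each trailing element comes from. This Java sketch is hypothetical: each element is reduced to its origin. How FOP would recognise the origin of a real element, for example through its Position, is an assumption. Walking backwards from the end, space elements are removed, border/padding elements are kept and skipped, and the walk stops at the last real content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: removing trailing-space elements at the end of a
 * sequence without mistaking border/padding glue for a space. Each element
 * is reduced to its origin; how FOP would recognise the origin of a real
 * element (for example through its Position) is an assumption here.
 */
public class TrailingSpaceRemoval {

    enum Origin { TEXT, SPACE, BORDER_PADDING }

    static List<Origin> withoutTrailingSpaces(List<Origin> seq) {
        List<Origin> result = new ArrayList<>(seq);
        for (int i = result.size() - 1; i >= 0; i--) {
            Origin o = result.get(i);
            if (o == Origin.TEXT) {
                break;            // last piece of real content reached
            }
            if (o == Origin.SPACE) {
                result.remove(i); // trailing space: drop it
            }
            // BORDER_PADDING elements are kept and simply skipped
        }
        return result;
    }

    public static void main(String[] args) {
        List<Origin> seq = Arrays.asList(Origin.TEXT, Origin.SPACE,
                Origin.BORDER_PADDING, Origin.SPACE);
        System.out.println(withoutTrailingSpaces(seq)); // [TEXT, BORDER_PADDING]
    }
}
```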


Regards
Luca




Re: Inline border / padding and nested blocks

2005-10-10 Thread Luca Furini

Manuel Mall wrote:

Is that actually conceptually the right thing to do, that is removing 
the trailing spaces before the end of a block as part of the Knuth 
handling?


For leading spaces it is done somewhere completely different (and 
currently in the same piece of code it is done incorrectly for embedded 
spaces).


I'm not sure it is the best place to do it, although I think that before 
the breaks are computed trailing spaces should no longer exist: otherwise, 
the content width would take into account the width of these spaces too, 
and right / center alignment could be incorrect.


Moreover, a glue just before the elements appended by the LineLM could be 
a feasible break, and this would create an empty page after the last one 
with some content.


In other words, that removal is there as it could not be performed any 
later: but the sooner we get rid of the trailing spaces, the better! :-)


I have a picture in mind with all white space handling done as part of 
the layout (area tree building) but before the actual Knuth sequences 
are constructed. But that's only a rough idea driven by the description 
of white space handling in the 1.1WD.


Would you like to share it with us? I always find the specs quite obscure 
as far as white space handling is concerned, so your explanation could 
really be of great help!


Regards
Luca


Re: Inline border / padding and nested blocks

2005-10-07 Thread Luca Furini

Manuel Mall

I would appreciate if you could please have a look at test case 
inline_border_padding_block_nested.xml. If you run the test case as 
is you get an "Expect inline sequence as first sequence when last 
paragraph is not null" message. If you comment everything out and 
uncomment the last block you get a ClassCastException on a Knuth 
element. For both issues I am a bit out of my depth and hope you could 
help.


First of all, my compliments for your wonderful work!

I'll surely have a look at what happens, although I may not have time to 
do this until Monday.


Regards
Luca


Re: Knuth algorithm problem

2005-10-06 Thread Luca Furini

Jeremias Maerki wrote:


I think I've just stumbled over a problem in the Knuth algorithm.


I'm going to see what happens ...

Regards
Luca




Re: svn commit: r306656 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr: BreakingAlgorithm.java PageBreakingAlgorithm.java

2005-10-06 Thread Luca Furini
Fixing a bug reported by Jeremias affecting the handling of glue and 
penalty elements after a break when the algorithm restarts.


Now it should be ok. A nasty little bug, anyway ...

Unfortunately, I had to duplicate a few lines (a for loop looking for glue 
elements after a feasible break): the point is that there are three 
different variables (the width, stretch and shrink) that must be modified 
during this loop.


I'm going to see if there is some possible refactoring of this piece of code.
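The loop in question follows the Knuth-Plass description. After a feasible break, glue elements are discarded until the next box (or forced break), and their width, stretch and shrink are added to the running totals. One possible refactoring is to return the three values in a single small object, as in this Java sketch. Elem and Totals are illustrative types, not FOP's classes.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the loop discussed above, following the Knuth-Plass
 * description: after a feasible break, glue elements are discarded until
 * the next box (or forced break), and their width, stretch and shrink are
 * added to the running totals. Elem and Totals are illustrative types,
 * not FOP's classes.
 */
public class AfterBreakTotals {

    enum Kind { BOX, GLUE, PENALTY }

    static final class Elem {
        final Kind kind;
        final int width, stretch, shrink;
        final boolean forcedBreak; // penalty with value -infinity
        Elem(Kind kind, int width, int stretch, int shrink, boolean forcedBreak) {
            this.kind = kind; this.width = width; this.stretch = stretch;
            this.shrink = shrink; this.forcedBreak = forcedBreak;
        }
    }

    static final class Totals {
        final int width, stretch, shrink;
        Totals(int width, int stretch, int shrink) {
            this.width = width; this.stretch = stretch; this.shrink = shrink;
        }
    }

    /** Totals as seen by a line starting after the break at index b. */
    static Totals afterBreak(List<Elem> elems, int b, Totals sums) {
        int w = sums.width, y = sums.stretch, z = sums.shrink;
        for (int i = b; i < elems.size(); i++) {
            Elem e = elems.get(i);
            if (e.kind == Kind.BOX) {
                break;
            } else if (e.kind == Kind.GLUE) {
                w += e.width; y += e.stretch; z += e.shrink;
            } else if (e.forcedBreak && i > b) {
                break;
            }
        }
        return new Totals(w, y, z);
    }

    /** box, glue, penalty, glue, box */
    static List<Elem> example() {
        return Arrays.asList(
                new Elem(Kind.BOX, 100, 0, 0, false),
                new Elem(Kind.GLUE, 30, 10, 5, false),
                new Elem(Kind.PENALTY, 0, 0, 0, false),
                new Elem(Kind.GLUE, 20, 5, 2, false),
                new Elem(Kind.BOX, 50, 0, 0, false));
    }

    public static void main(String[] args) {
        // break at the first glue; the sums cover the elements before it
        Totals t = afterBreak(example(), 1, new Totals(100, 0, 0));
        System.out.println(t.width + " " + t.stretch + " " + t.shrink); // 150 15 7
    }
}
```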

Regards
Luca


Re: Maintainability

2005-09-30 Thread Luca Furini

Peter B. West wrote:

Thanks to Luca for his (perhaps entirely co-incidental) posting to the 
Wiki.


Well, not entirely co-incidental! :-)

I started writing the page some time ago, but never found the time to 
finish it: your message made me think I really couldn't put this off any 
longer, so I added a few things and posted what was ready. I'll try to add 
the missing parts as soon as possible.


Comments, questions, suggestions, and especially additions :-) are most 
welcome!


Regards
Luca




Re: Maintainability

2005-09-30 Thread Luca Furini

Peter B. West wrote:

There seem to be some misapprehensions about what you are attempting; 
perhaps they are mine, so please clarify this.  As I understand it, the 
mature, well-documented technology is the line-breaking, as in Breaking 
Lines Into Paragraphs.  Using this model for page-breaking is something 
that has been speculated about, in particular by Plass.  However, in 
implementing this, you and the others are breaking new ground.


If this is the case, then it is quite inaccurate to describe the 
page-breaking as mature, well-understood, well-documented and 
well-behaved technology.


Is that fair?


As Manuel has quickly answered, the box-penalty-glue model can be applied 
to both line breaking and page breaking, and this is already clearly 
stated in the cited article.


In the TeXbook you can find that horizontal lists (representing the 
content of paragraphs) and vertical lists (representing content in the 
block progression direction) are made of the same elements: boxes, glues 
and penalties.


So, I think we can surely state that the model suits the page breaking 
problem too.


What TeX does not do is perform Knuth's breaking algorithm in order to 
produce page breaks: it uses a simpler algorithm instead. But this is 
due to the resource limits that existed when TeX was devised, 
and in the cited paper Knuth explicitly says so.


The page breaking problem has some additional difficulties concerning objects 
whose placement does not follow the main flow, for example floating 
figures. Here the difficulty is the other side of freedom (the position of 
these objects has few constraints): trying to place them in the best 
possible way could lead to high computational complexity. Should this be 
too costly, it would be enough to use a simpler strategy instead, for 
example placing each object on the first page where it fits, and the 
problem would be solved.


So, I think we can say that the algorithm can be applied without any 
concern to page breaking too.


Regards
Luca




Re: Another page-related question: page-position=last

2005-09-28 Thread Luca Furini

Jeremias Maerki wrote:


What is the expected output?


In this case it has to generate a blank page IMO.


Oh, right, I did not think of an empty page! :-)


The problem is with the page x of y hack that won't work like this if
the last empty block ends up on the second-to-last page. [...]

What about the following approach?

Run the breaker without special last-page handling, then inspect the
allocated BPD for the last part. If it fits into the last page, just
exchange the page-master (*) and paint it there. If it doesn't fit,
paint it using the non-last page-master and add a blank page with the
last page-master. If there's a box w=0 at the end of the element list,
force a new part and paint that on the last page to handle the page x
of y case.


I think this would work with my idea too: in this case, if the last empty 
block and the difference in page bpd (which cannot be split) do not fit in 
the non-last page under construction, they would be placed on a new page; 
so, a page-number-citation pointing to the empty block would return the 
last page-number. This would avoid the need to exchange page-masters, and 
to have special handling for a zero-width box at the end of the sequence.


Regards
Luca


Re: Another page-related question: page-position=last

2005-09-27 Thread Luca Furini

Jeremias Maerki wrote:

It's an interesting idea. However, I suspect this will probably not be 
necessary. We should be able to make the breaker clever enough to handle 
this particular case.


When the page bpd depends on the page-masters, things become very 
strange. Not only is it difficult to implement the page-master choice, it 
is even difficult to understand what the expected result should be! :-)


For example: let's suppose the breaker is working, and it has to place the 
last 25 lines of a page-sequence. The page-master for the last page has a 
bpd allowing no more than 20 lines, while the other page-masters can 
contain up to 30 lines.


What happens? If the breaker starts building a last page, it soon 
realizes that the page would not contain all the remaining content, so it 
would no longer be a last page. But if it starts building a non-last page, 
it reaches the end of the content and has to turn it into a last page, 
which is impossible.


What is the expected output? The only way I see to satisfy the property is 
to create two more pages: one non-last page, partially empty, with less 
than 25 lines (24 or fewer, if there are keeps, widows or orphans) and a 
last page with the remaining lines.


This sort of problem happens only if the last page is smaller than the 
previous ones: otherwise, the breaker can always try to build a non-last 
page, eventually moving all its content into a last page.


Now that I think of it ... an idea that could work, at least when the 
non-last pages have the same bpd and the last page a smaller one, could be 
to slightly modify the elements appended at the end of the sequence, so 
that they have a width equal to the difference (nonLastBPD - lastBPD). 
This way, the last page created by the breaker will have an apparent bpd 
of nonLastBPD, but the content placed inside it will have an overall bpd 
equal to

nonLastBPD - (nonLastBPD - lastBPD)
  = lastBPD
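The arithmetic can be tried out with a toy breaker. This is only a sketch under stated assumptions: it uses greedy first-fit breaking instead of the Knuth total-fit algorithm, treats every line as an indivisible unit, and assumes the filler element is glued to the last line (no break allowed between them). `LastPageFiller` and `breakLines` are hypothetical names, not FOP classes.

```java
import java.util.ArrayList;
import java.util.List;

class LastPageFiller {
    /**
     * Splits lines into pages of capacity nonLastBPD (greedy first-fit).
     * A filler of height (nonLastBPD - lastBPD) is glued to the last line,
     * so the last page leaves at most lastBPD for real content.
     * Returns the number of lines placed on each page.
     */
    static List<Integer> breakLines(int[] lineHeights, int nonLastBPD, int lastBPD) {
        int filler = nonLastBPD - lastBPD; // assumes lastBPD <= nonLastBPD
        List<Integer> pages = new ArrayList<>();
        int used = 0;
        int count = 0;
        for (int i = 0; i < lineHeights.length; i++) {
            boolean isLast = (i == lineHeights.length - 1);
            // the last line cannot be separated from the filler
            int h = lineHeights[i] + (isLast ? filler : 0);
            if (count > 0 && used + h > nonLastBPD) {
                pages.add(count); // close the current page
                used = 0;
                count = 0;
            }
            used += h;
            count++;
        }
        if (count > 0) {
            pages.add(count);
        }
        return pages;
    }
}
```

With the figures of the example above (25 lines of height 1, nonLastBPD = 30, lastBPD = 20) it returns [24, 1]: a partially empty non-last page followed by a last page with the remaining line. With 15 lines the filler still fits and a single page results.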

What do you think?

Regards
Luca




Re: Indefinite page-width / page-height

2005-09-26 Thread Luca Furini

Andreas L Delmelle wrote:

Currently, I have solved this locally by creating the pageVP with the 
indefinite dimension set to Integer.MAX_VALUE.

The only things I'm still looking for are ways to:
a) retrieve the accumulated content-height/-width (or: the difference 
between the initial page-height/-width and the content-height/-width up 
to that point)


The difference is stored into PageBreakPosition.difference

I'm guessing the place where all this should happen is 
PageSeqLM.finishPage()


Maybe it's easier to put this in PSLM.PageBreaker.finishPart(): it already 
has a PageBreakPosition parameter, so it should be enough to add something 
like getCurrentPV().setBPD(Integer.MAX_VALUE - pbp.difference).


HTH

Regards
Luca



Re: undefined page length

2005-09-20 Thread Luca Furini

Andreas L Delmelle wrote:

BTW: Is it a correct assessment that implementing this should turn out 
to be far simpler than fixed page-sizes? IIC, theoretically, the whole 
page-breaking algorithm can be ignored for indefinite page-heights. 
getAvailableBPD() would always return, say, Integer.MAX_VALUE?


I don't think the breaking algorithm can be completely ignored, but it's a 
good idea to have getAvailableBPD() return an almost infinite value.


Once the PageBreakingAlgorithm has created the single PageBreakPosition, 
it would be possible to use the stored difference in order to set the 
correct page height (otherwise the page would have height = 
Integer.MAX_VALUE even if it contains just a few lines).
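A sketch of the two steps, under assumptions: the breaker is simulated by summing line heights onto a single page whose available BPD is Integer.MAX_VALUE, and `BreakPosition` is a hypothetical stand-in that keeps only the `difference` field of PageBreakPosition. The real page height is then recovered as Integer.MAX_VALUE minus the stored difference.

```java
class IndefinitePageHeight {
    /** "Almost infinite" value returned by getAvailableBPD() for an indefinite page. */
    static final int INDEFINITE_BPD = Integer.MAX_VALUE;

    /** Hypothetical stand-in for PageBreakPosition: only the unused BPD. */
    static final class BreakPosition {
        final int difference;

        BreakPosition(int difference) {
            this.difference = difference;
        }
    }

    /** Simulated breaking: everything goes on one page; difference = unused space. */
    static BreakPosition breakSinglePage(int[] lineHeights) {
        long used = 0;
        for (int h : lineHeights) {
            used += h;
        }
        if (used > INDEFINITE_BPD) {
            throw new IllegalArgumentException("content too tall");
        }
        return new BreakPosition((int) (INDEFINITE_BPD - used));
    }

    /** Real page BPD, i.e. Integer.MAX_VALUE - pbp.difference. */
    static int actualBPD(BreakPosition pbp) {
        return INDEFINITE_BPD - pbp.difference;
    }
}
```

Three lines of 14400 mpt yield a page of 43200 mpt instead of one Integer.MAX_VALUE tall.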


Regards
Luca




Build error?

2005-09-19 Thread Luca Furini

Hi all.

I'm noticing a strange problem: fop builds correctly, but then it seems it 
is not working at all.


I'm using it from the command line under win xp, and even if I don't get 
any run time exception no output file is created. Launching fop with no 
parameters, or with wrong parameters (missing files ...) does not create 
any error: simply, nothing happens.


I have compiled fop on two different computers, so I don't think this 
is a local configuration problem.


Hasn't anyone else noticed this?

Regards
Luca


Re: wrap-option property

2005-09-16 Thread Luca Furini

Jeremias Maerki wrote:

wrap-option is one of those few properties which work in 0.20.5 but are 
not yet available in FOP Trunk. Luca, what do you think how difficult it 
would be to implement it at least for, let's say, fo:block? I imagine it 
would suffice to trick the breaker into not choosing any break 
possibilities except at the end of the sequence.


Yes, it seems a very good idea: just an additional boolean parameter for 
findBreakingPoints(), similar to hyphenationAllowed. Or we could use just 
a single int instead of two booleans: a parameter whose value could be set 
using three constants, for example ALL_BREAKS, NO_HYPHENATION, NO_WRAP.
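A minimal sketch of the three-constant variant: the constant names come from the paragraph above, while `BreakOptions`, `BreakKind` and `isBreakAllowed` are hypothetical helpers, not FOP API. With NO_WRAP only the final break survives, so the algorithm can produce just one line.

```java
class BreakOptions {
    // Hypothetical values for the single int parameter of findBreakingPoints()
    static final int ALL_BREAKS = 0;
    static final int NO_HYPHENATION = 1;
    static final int NO_WRAP = 2;

    /** Kinds of feasible break, simplified for this sketch. */
    enum BreakKind { SPACE, HYPHENATION_POINT, END_OF_PARAGRAPH }

    /** Whether the algorithm may consider a feasible break of this kind. */
    static boolean isBreakAllowed(BreakKind kind, int mode) {
        switch (kind) {
            case END_OF_PARAGRAPH:
                return true; // the final forced break is always kept
            case HYPHENATION_POINT:
                return mode == ALL_BREAKS;
            case SPACE:
                return mode != NO_WRAP;
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}
```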


Maybe it could be even easier: a LineBreakPosition could be created 
without even running the line breaking algorithm, as we already know we 
will create just one line, and which will be the indexes of the first and 
last elements. But maybe this would prevent us from getting useful 
information computed by the algorithm (difference, indent, ...).


I'm going to work on this immediately.

I think we will need something similar in the StaticContentLM and the 
BlockContainerLM so overflow can be handled better. At the moment, only 
the first part until the first break point found by the breaker is 
properly painted. Afterwards, the BCLM simply adds the additional parts 
but this can lead to unexpected results as I have seen in one document 
already.


Sorry, I don't quite get what you mean ... what are these unexpected results?

Regards
Luca





Re: baseline-shift and KnuthInlineBoxes

2005-09-16 Thread Luca Furini

Manuel Mall wrote:


if we have a baseline-shift, eg.

some X<fo:inline font-size="smaller" baseline-shift="super">2</fo:inline> ...


how is that intended to be modelled with respect to the lead,height, and 
middle values to be stored in the created KnuthInlineBoxes for the 
fo:inline?


I think that more (or different) information needs to be stored in the 
KnuthInlineBoxes in order to fully implement the properties concerning the 
vertical positioning of objects.


Lead, total and middle are only enough to handle vertical-align = top, 
bottom or middle; anyway, maybe three attributes could be enough: one 
identifying the alignment baseline (alphabetic, ideographic, 
text-before-edge, ...) and two specifying the box height above and below 
this baseline.


The LineLM should look at these values when creating the lines: each box 
height will be interpreted differently according to its baseline: I think 
this will be the tricky part of this work!
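A rough sketch of that idea, assuming each box stores its alignment baseline plus its extent above and below it, and that the LineLM knows where each baseline sits relative to the alphabetic one (the offsets are plain inputs here; in practice they would come from font metrics). All names are hypothetical; a baseline-shift could presumably be folded into the same `shift` value.

```java
import java.util.List;
import java.util.Map;

class LineExtent {
    enum Baseline { ALPHABETIC, IDEOGRAPHIC, TEXT_BEFORE_EDGE }

    /** Hypothetical inline box: height above and below its own baseline. */
    static final class Box {
        final Baseline baseline;
        final int above;
        final int below;

        Box(Baseline baseline, int above, int below) {
            this.baseline = baseline;
            this.above = above;
            this.below = below;
        }
    }

    /**
     * Returns {ascent, descent} of the line, measured from the alphabetic
     * baseline. offsets maps each baseline to its distance above the
     * alphabetic baseline (negative = below).
     */
    static int[] measure(List<Box> boxes, Map<Baseline, Integer> offsets) {
        int ascent = 0;
        int descent = 0;
        for (Box b : boxes) {
            int shift = offsets.get(b.baseline);
            ascent = Math.max(ascent, shift + b.above);
            descent = Math.max(descent, b.below - shift);
        }
        return new int[] {ascent, descent};
    }
}
```

For instance, a box hanging from a text-before-edge baseline 8000 mpt above the alphabetic one, with 12000 mpt below it, raises the line's descent to 4000 mpt while adding nothing above.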


HTH, even if it's not much :-)

Regards
Luca




Re: svn commit: r280854 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java

2005-09-14 Thread Luca Furini

I wrote:

Correct handling of the combination of hyphenation and text-align = 
center, left or right.


At the end, I found out that this was not the same problem as bug 36533, 
but another bug specifically concerning the elements created to represent 
hyphens.


I think that Manuel has been the first person ever testing hyphenation 
together with non-justified text, thus awaking this sleeping bug! :-)


There is still a detail that has not yet been fixed: the correct handling 
of characters that can be used as break points (for example a / 
character in the middle of a long url, that could be used as an emergency 
break). I'm going to fix that too, I just wanted to commit this 
correction as soon as possible in order to avoid run time exceptions.


Regards
Luca


Re: svn commit: r280520 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java

2005-09-13 Thread Luca Furini

I wrote:

Factorized the creation of the elements in TextLM: now both getNextKE() 
and getChangedKE() call the same methods createElementsForASpace() and 
createElementsForAWordFragment().


This should definitively solve bug 36533.


Besides removing duplicated lines and inconsistencies, I hope this helps 
make this part of the code a little more readable and easier to 
understand.


I'm going to see if these methods can be moved up to the LeafNodeLM, thus 
being available for all subclasses.


Manuel, I hope you don't have to spend a lot of time merging these changes 
with the work you are doing; I think you could add further parameters to 
createElementsForASpace(), to pass the variables you need for borders and 
padding.


Regards
Luca




Re: fo:inline bpd

2005-09-12 Thread Luca Furini

Manuel Mall wrote:

yes, that is an option. What I am unsure about here is that the 
children, typically text areas, do not take the line spacing into 
account when reporting their bpd, that is the usually 10% space above 
and below the character. So what is the correct bpd for an fo:inline 
which has text area children: is it just the max bpd of its children or 
is it max bpd plus any line spacing settings from its parent?


Oh, yes, the half-leading trait ...

If I understand the spec correctly (4.5 Line-areas), this line spacing 
must be added to the bpd of each inline area too. As it is the same for 
all inline areas, it could be stored into the LayoutContext by the LineLM.
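As a sketch, assuming the usual definition in which half-leading is half of the difference between line-height and text-altitude + text-depth (the line-height property description in the spec is the authoritative source), the value the LineLM might store in the LayoutContext and the resulting inline bpd could be computed as below; the class and method names are hypothetical.

```java
class HalfLeading {
    /** Half of what line-height leaves over the glyph extent (may be negative). */
    static int halfLeading(int lineHeight, int textAltitude, int textDepth) {
        return (lineHeight - (textAltitude + textDepth)) / 2;
    }

    /** Inline-area bpd with half-leading added above and below. */
    static int inlineBPD(int lineHeight, int textAltitude, int textDepth) {
        int hl = halfLeading(lineHeight, textAltitude, textDepth);
        return textAltitude + textDepth + 2 * hl;
    }
}
```

With altitude and depth adding up to 10000 mpt and line-height = 12000 mpt, the half-leading is 1000 mpt, the 10% above and below mentioned in the quoted message. Because of the integer division, an odd difference loses one millipoint.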


Regards
Luca


Re: Space-resolution doesn't work

2005-09-09 Thread Luca Furini

Jeremias Maerki wrote:

I'll start from scratch to come up with a better strategy of 
implementing these rules. I'll probably start by documenting a few cases 
in the Wiki and try to develop the right element list for them. After 
that I'll try to find out how exactly to implement everything. Help is 
welcome.


I think spaces and keeps are quite similar and very connected: in both 
cases, the constraints can involve formatting objects that are not at the 
same depth in the tree.


So, my idea for handling space resolution is to have a LM ask its 
children about their spaces, and create the necessary elements (while at 
the moment each LM creates elements for its own spaces).


For example, if we have this LM tree

                 Outer BlockLM
                       |
      +----------------+----------------+
      |                |                |
  BlockLM 1        BlockLM 2        BlockLM 3
                       |
                +------+------+
                |             |
            BlockLM A     BlockLM B

BlockLM1.getNextKnuthElements() would return to the outer BlockLM only the 
elements representing its block content, without any space.


In order to decide which elements it has to create, the outer BlockLM 
could have some lines like:


// currentChild = BlockLM 1, nextChild = BlockLM 2

space1 = currentChild.getSpaceAfter();
space2 = nextChild.getSpaceBefore();
if (this.mustKeepTogether()
        || (currentChild.mustKeepWithNext() && !nextChild.hasBreakBefore())
        || (!currentChild.hasBreakAfter() && nextChild.mustKeepWithPrevious())) {
    // there cannot be a break between the two children
    createElementsForSpace(resolve(space1, space2, false, false));
} else {
    // there can be a break between the children
    createElementsForSpace(resolve(space1, null, false, true),
                           resolve(null, space2, true, false),
                           resolve(space1, space2, false, false));
}

where:

- the method createElementsForSpace() can have a single space parameter
  (returning a sequence that has no feasible breaks [1]) or three
  different space parameters (returning a sequence with a feasible break
  [2]);
- resolve takes two spaces and two booleans, signalling if the space will
  be at the beginning / end of a page (as this affects the resolved space)
- getSpaceAfter() would be something like
   return resolve(this.spaceAfter, lastChild.getSpaceAfter(), false, false);
  vice-versa, getSpaceBefore would be
   return resolve(this.spaceBefore, firstChild.getSpaceBefore(), false, false);
  (a similar mechanism could be used for keeps)

but I'm not sure that adding two spaces at a time would always give the 
same result.


Otherwise, we could follow the implementation of keeps, using the 
LayoutContext to keep track of the spaces met and not yet converted into 
elements.


Regards
Luca

[1] this would be a simple glue element, preceded by a penalty with value
= inf

[2] maybe a sequence glue - penalty - glue - box - PENALTY - glue,
where
 glue #1 is the resolved space after block 1 if a break occurs
 glue #3 is the resolved space before block 2 if a break occurs
 penalty is a feasible break
 PENALTY forbids a break
 glue #2 is the difference between the resolved space if there is no
 break and glue #1 + glue #3
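The sequence in [2] can be checked with plain numbers. A minimal sketch, assuming a much simplified resolve() (conditional spaces vanish at a page edge, otherwise the larger optimum wins; the real XSL rules also involve precedence and min/max). `Space`, `resolve` and `elementWidths` are hypothetical names; the two booleans mean "at page start" and "at page end", as in the call sites in the message.

```java
class SpaceBetweenBlocks {
    /** Hypothetical space specifier: optimum value and conditionality only. */
    static final class Space {
        final int opt;
        final boolean conditional;

        Space(int opt, boolean conditional) {
            this.opt = opt;
            this.conditional = conditional;
        }
    }

    /**
     * Simplified resolve(): conditional spaces at a page edge are dropped
     * and the largest remaining optimum wins.
     */
    static int resolve(Space s1, Space s2, boolean atPageStart, boolean atPageEnd) {
        int result = 0;
        for (Space s : new Space[] {s1, s2}) {
            if (s == null || (s.conditional && (atPageStart || atPageEnd))) {
                continue;
            }
            result = Math.max(result, s.opt);
        }
        return result;
    }

    /** Widths of glue #1, penalty, glue #2, box, PENALTY, glue #3. */
    static int[] elementWidths(Space after, Space before) {
        int glue1 = resolve(after, null, false, true);   // end of page, if broken
        int glue3 = resolve(null, before, true, false);  // start of next page
        int noBreak = resolve(after, before, false, false);
        return new int[] {glue1, 0, noBreak - (glue1 + glue3), 0, 0, glue3};
    }
}
```

Without a break all three glues are kept and sum to the resolved space; breaking at the penalty leaves glue #1 at the bottom of the page, discards glue #2 (it follows the break) and keeps glue #3, protected by the box. With space-after = 6000 and a conditional space-before = 12000 the widths are {6000, 0, 6000, 0, 0, 0}.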



Re: svn commit: r279551 - in /xmlgraphics/fop/trunk: src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java test/layoutengine/testcases/wrapper_text-transform_1.xml

2005-09-08 Thread Luca Furini

Manuel Mall wrote:

this is my code after integrating your patch to add the knuth elements 
for line end / start border/padding for the common justify=start or 
end case. What I am getting now is a space at the beginning of each 
line break!:


if (lineStartBAP != 0 || lineEndBAP != 0) {
    sequence.add
        (new KnuthGlue(lineEndBAP, 0, 0,
                       new LeafPosition(this, -1), true));
    sequence.add
        (new KnuthPenalty(0, 0, false,
                          new LeafPosition(this, -1), true));
    sequence.add
        (new KnuthGlue(wordSpaceIPD.opt - (lineStartBAP + lineEndBAP),
                       wordSpaceIPD.max - wordSpaceIPD.opt,
                       wordSpaceIPD.opt - wordSpaceIPD.min,
                       new LeafPosition(this, -1), true));
    sequence.add
        (new KnuthInlineBox(0, 0, 0, 0,
                            notifyPos(new LeafPosition(this, -1)), true));
    sequence.add
        (new KnuthPenalty(0, KnuthElement.INFINITE, false,
                          new LeafPosition(this, -1), true));
    sequence.add
        (new KnuthGlue(lineStartBAP, 0, 0,
                       new LeafPosition(this, vecAreaInfo.size() - 1), false));
} else {
    ...
}


The LeafPosition(this, vecAreaInfo.size() - 1) (the Position containing 
the index of the AreaInfo objects storing information about the space) 
should be the one that is discarded if a line break happens: i.e. it should 
be on the second glue instead of the third.


With this change, this sequence should be correct for a space in justified 
text.


With left- / right-aligned text the overall stretch and shrink of the sequence 
should not be changed, so the sequence should be:
  sequence.add
      (new KnuthGlue(lineEndBAP, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                     new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthPenalty(0, 0, false,
                        new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthGlue(wordSpaceIPD.opt - (lineStartBAP + lineEndBAP),
                     - 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH,
                     0,
                     new LeafPosition(***, false));
  sequence.add
      (new KnuthInlineBox(0, 0, 0, 0,
                          new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthPenalty(0, KnuthElement.INFINITE, false,
                        new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthGlue(lineStartBAP, 0, 0,
                     new LeafPosition(this, -1), true));

With centered text the combined sequence should be:
  sequence.add
      (new KnuthGlue(lineEndBAP, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                     new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthPenalty(0, 0, false,
                        new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthGlue(wordSpaceIPD.opt - (lineStartBAP + lineEndBAP),
                     - 6 * LineLayoutManager.DEFAULT_SPACE_WIDTH,
                     0,
                     new LeafPosition(***, false));
  sequence.add
      (new KnuthInlineBox(0, 0, 0, 0,
                          new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthPenalty(0, KnuthElement.INFINITE, false,
                        new LeafPosition(this, -1), true));
  sequence.add
      (new KnuthGlue(lineStartBAP,
                     3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                     new LeafPosition(this, -1), true));

The Position marked *** should be a LeafPosition(this, vecAreaInfo.size() 
- 1); as this is the element most closely connected with the real space 
(if this element is discarded, the space is too), maybe it is the one that 
must be notified.
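A quick numeric check of the two sequences above, as a sketch: only the width and stretch of the three glues are modelled, DSW is an arbitrary stand-in for LineLayoutManager.DEFAULT_SPACE_WIDTH, and the class is hypothetical.

```java
class AlignedSpaceSequence {
    /** Hypothetical value standing in for LineLayoutManager.DEFAULT_SPACE_WIDTH. */
    static final int DSW = 3000;

    /**
     * {width, stretch} of glue 1, glue 2 and glue 3 in the sequences above
     * (the two penalties and the box have zero width and stretch).
     */
    static int[][] glues(boolean centered, int spaceOpt, int lineStartBAP, int lineEndBAP) {
        return new int[][] {
            {lineEndBAP, 3 * DSW},
            {spaceOpt - (lineStartBAP + lineEndBAP), (centered ? -6 : -3) * DSW},
            {lineStartBAP, centered ? 3 * DSW : 0}
        };
    }

    /** Sums {width, stretch} over the given glues. */
    static int[] sum(int[]... glues) {
        int width = 0;
        int stretch = 0;
        for (int[] g : glues) {
            width += g[0];
            stretch += g[1];
        }
        return new int[] {width, stretch};
    }
}
```

Summing all three glues gives {spaceOpt, 0} for both alignments, so the overall stretch is unchanged when no break occurs. If the penalty is chosen, the ending line gets glue 1 (lineEndBAP plus 3 * DSW of stretch) and the next line starts with glue 3, which carries stretch only for centered text.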


[from your other message]
I am also unsure what the correct knuth element sequences are in the 
case of the forced line break and for hyphenation.


A forced line break should not be very different from the real end of 
the inline, so I think it should be enough to add a box/glue element 
(according to the conditionality [1]) whose width is lineEndBAP before the 
penalty. In this case, the next returned sequence should start with the 
elements for the initial border and padding.


As for hyphenation, I think we could use the same sequence created for 
a space (according to the alignment), but with the first penalty (the 
second element) having the width of the hyphen character.


While answering your message I noticed that there are some inconsistencies 
in the TextLM: for example, LineLM.DEFAULT_SPACE_WIDTH is not used 
everywhere it should be ... I'll try and find some time to fix them.


I hope I did not answer you too late, otherwise ... tomorrow is another 
day :-)

The time difference between Italy and Australia can hinder communication!

Regards
Luca

[1] in effects, as a preserved 

Re: e-g with padding and borders

2005-09-06 Thread Luca Furini

Manuel Mall wrote:

Next problem: border conditionality - how do I model that with the Knuth 
approach? At the time I add the Border/Padding start/end boxes we don't 
have line breaks so they really only cover the .conditionality=discard 
case. How do I tell the algorithm to leave enough space at the end of 
each line (and the beginning of the next line) for the borders (in the 
case of .conditionality=retain)?


The sequence of elements representing the inline content starts and ends 
with a box [1].


Adding another box at the beginning and at the end of the sequence 
implements retain, as a line break is never allowed to separate two 
adjacent boxes: so, the left border and padding will always be in the same 
line as the first piece of content, and the breaking algorithm will always 
reserve enough space.


In order to implement discard, glue elements must be used instead: these 
elements are discarded if they are chosen as a line break or they are 
adjacent to a line break, and in this case borders and padding will not be 
painted.


I think that a single box or glue element could be created, representing 
both border and padding, unless the conditionalities of these properties 
can be different: for example, if it were possible to have 
border-start.conditionality = discard and padding-start.conditionality = 
retain, two distinct elements would necessarily have to be created.


Regards
Luca

[1] Or, better, everything should work well if the first and last elements 
are boxes. Should there be spaces at the beginning and at the end of the 
inline having borders, they should be handled as non-breaking spaces, in 
order to avoid a break between the start border and the first word, or 
between the last word and the end border.





Re: Line LM, Inline LM and LAST_AREA

2005-09-06 Thread Luca Furini

Manuel Mall wrote:

But if we have a long fo:inline stretching multiple lines this seem to 
give the wrong results from the Inline LM perspective. For example if 
the fo:inline finishes in the middle of a line followed by more text the 
Line LM will not set the LAST_AREA flag when calling addAreas on the 
Inline LM as there are more areas on the line. Therefore the Inline LM 
thinks its not done with yet although it is and the reverse is true on 
the first line of a multi-line inline.


The LineLM.addAreas() method creates a line at a time (a line for each 
LineBreakPosition), and asks its children to add their inline areas for 
the line area being created.


It sets the LAST_AREA flag if the child LM is the one that created the 
last element placed in this line: for each line, there is one and only one 
child LM that receives a LayoutContext with this flag set, unless there 
are bugs :-)


If the content of an inline is divided among several lines, the method 
InlineLM.addAreas() will be called once per line, and every time except 
the last it will have the LAST_AREA flag set.


Some time ago there was a thread about a similar subject [1]: the problem, 
then, was the opposite, i.e. to find out which is the last area generated 
by a LM, regardless of line breaks.


I think there is a bit of ambiguity in the names: at the moment, the 
LAST_AREA flag signals to a LM that it is adding the last inline area in a 
line, or the last block area in a page, but this can cause confusion with 
the is-last area trait described by the spec (4.2.2 Common traits). Maybe 
we can find a more meaningful and unambiguous name.


Regards
Luca

[1] Markers: Determining the last generated area for a LM, 
http://nagoya.apache.org/eyebrowse/ReadMsg?listId=63&msgNo=11296





Re: e-g with padding and borders

2005-09-06 Thread Luca Furini

Manuel Mall wrote:


These two paragraphs confuse me - sorry. My understanding was:

discard = start/end borders/padding only at the start and end of the 
whole fo:inline


retain = as discard plus start/end borders/padding on the start and end 
of every line the fo:inline spans.


Sorry, you are completely right, I did not understand you were referring 
to the extra borders needed around a line break.


What we need is one or more elements whose overall behaviour is this:
- they represent a space (or another legal break point)
- if they are not used as a break, they behave like a normal space (or
  like a not-used hyphenation point)
- if they are chosen as a break, they must add something both at the end
  of the line they end, and at the beginning of the next line

This is quite similar to the behaviour of the sequence of elements 
representing a space in a centered text (in the 
TextLM.getNextKnuthElements() method); so, in this case we could use:


1  glue     width = border/padding at the end of the line = A
2  penalty  width = 0, value = 0
3  glue     width = space.opt - (A + B),
            stretch = space.max - space.opt,
            shrink = space.opt - space.min
4  box      width = 0
5  penalty  width = 0, value = infinity
6  glue     width = border/padding at the beginning of the line = B
so:
- element 1 is a legal break point, but it is never chosen as 2 is better
- element 2 is a legal break point: if it is chosen, the ending line will
  reserve a width of A for border and padding, and the next line will
  reserve a width of B (the glue 3 is discarded)
- element 3 is NOT a legal break because of the preceding penalty
- element 5 is NOT a legal break because of its value
- element 6 is NOT a legal break because of the preceding penalty
- if there is no break, the overall width is A + (space.opt - (A + B)) + B
  = space.opt
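The width bookkeeping of the six elements can be checked with a small model. This is a sketch under assumptions: stretch and shrink are left out, `Elem` is a hypothetical stand-in for the Knuth element classes, and the rule applied after a break (glue and penalties are discarded up to the first box) is the usual Knuth convention.

```java
import java.util.ArrayList;
import java.util.List;

class BorderSpaceSequence {
    enum Type { BOX, GLUE, PENALTY }

    /** Minimal stand-in for a Knuth element: type and width only. */
    static final class Elem {
        final Type type;
        final int width;

        Elem(Type type, int width) {
            this.type = type;
            this.width = width;
        }
    }

    /** The six elements listed above, for end border A and start border B. */
    static List<Elem> sequence(int a, int b, int spaceOpt) {
        List<Elem> seq = new ArrayList<>();
        seq.add(new Elem(Type.GLUE, a));                   // 1: end border/padding
        seq.add(new Elem(Type.PENALTY, 0));                // 2: the real break point
        seq.add(new Elem(Type.GLUE, spaceOpt - (a + b)));  // 3: rest of the space
        seq.add(new Elem(Type.BOX, 0));                    // 4: protects element 6
        seq.add(new Elem(Type.PENALTY, 0));                // 5: value = infinity
        seq.add(new Elem(Type.GLUE, b));                   // 6: start border/padding
        return seq;
    }

    /** Sum of widths in [from, to); penalties have width 0 here. */
    static int width(List<Elem> seq, int from, int to) {
        int w = 0;
        for (int i = from; i < to; i++) {
            w += seq.get(i).width;
        }
        return w;
    }

    /** Width carried to the next line after breaking at index br. */
    static int widthAfterBreak(List<Elem> seq, int br) {
        int i = br + 1;
        while (i < seq.size() && seq.get(i).type != Type.BOX) {
            i++; // glue and penalties after a break are discarded
        }
        return width(seq, i, seq.size());
    }
}
```

With A = 1000, B = 1500 and space.opt = 4000, breaking at element 2 (index 1) leaves 1000 at the end of the line and carries 1500 to the next one, while the unbroken sequence is exactly 4000 wide.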

In order to make all this work, the TextLM should
- know that it is working on text with non-conditional borders
- combine this sequence with the one it would create in a normal
  situation

Regards
Luca




Re: SVG Image cropping/positioning

2005-09-05 Thread Luca Furini

Richard W. wrote:


I'm starting now. I've had to rename inline_block_nested_\#36248.xml
to inline_block_nested_bug36248.xml to get the junit task to build.


I had to rename that file too; I have win xp.

Regards
Luca




Re: [Xmlgraphics-fop Wiki] Update of ExtensionPoints by JeremiasMaerki

2005-09-02 Thread Luca Furini
Speaking of extensions, I'd like to resurrect the layout extensions that 
were part of the code used to start the Knuth branch, but I want to be 
sure I'm allowed to do it.


The set of extensions (a couple of new properties, and some new values for 
an existing one) is aimed at giving the user more control over page 
breaking: in particular, via these extensions it is possible to give the 
application a list of properties that can be adjusted in order to fill all 
the available bpd of a region (in addition / substitution to the spaces 
between blocks [1]).


I started writing a wiki page about these extensions on the wiki at 
http://wiki.apache.org/xmlgraphics-fop/LayoutExtensions (I really should 
take some time to finish it!).


My highest-priority, short-term task is still to fix the behaviour of 
page-number and page-number-citation, as I think these formatting object 
must work in the next release: I have almost finished, I just have to 
handle the case of justified text. After that, obviously if there are no 
objections against this, I'd like to spend some time on the extensions, 
that I'm sure could come in handy for fop-users producing book-style (or 
report-style) documents.


For example, here is a link to a message in the xsl-editors mailing list 
requesting a feature which is completely equivalent to one of the layout 
extensions: 
http://lists.w3.org/Archives/Public/xsl-editors/2005JulSep/0007.html (many 
thanks to Jeremias for pointing it out to me!). Should I be allowed to 
keep working on this subject, I could answer him that fop will soon be 
able to cope with his request.


Regards
Luca

[1] ... which makes me think that I should work on space resolution rules 
too ... my to-do list keeps growing longer and longer! :-(





Re: FOP Visuals

2005-09-02 Thread Luca Furini

Jeremias Maerki wrote:


For those who don't want to run BatchDiffer themselves, I've uploaded a
ZIP full of PNGs, one per layout engine test case combined from output
from the PDF, PS and Java2D renderers.


Just an idea ... what about an option to have the output from two 
renderers and the XOR between the two? It could help noticing small 
differences, in the order of a few points, that could otherwise pass 
unnoticed.
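A minimal sketch of the XOR idea using the standard java.awt.image.BufferedImage API; the class and method names are hypothetical, and it assumes both renderings have already been rasterized to images of the same size (for example loaded with javax.imageio.ImageIO.read). Identical pixels come out black, so any non-black pixel marks a difference.

```java
import java.awt.image.BufferedImage;

class ImageXor {
    /** Pixel-wise XOR of the RGB channels of two same-sized images. */
    static BufferedImage xor(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images differ in size");
        }
        BufferedImage out = new BufferedImage(a.getWidth(), a.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                // getRGB returns ARGB; mask off alpha so only color differences remain
                out.setRGB(x, y, (a.getRGB(x, y) ^ b.getRGB(x, y)) & 0xFFFFFF);
            }
        }
        return out;
    }

    /** Counts the non-black pixels of an XOR image, i.e. the differing pixels. */
    static int countDifferences(BufferedImage diff) {
        int n = 0;
        for (int y = 0; y < diff.getHeight(); y++) {
            for (int x = 0; x < diff.getWidth(); x++) {
                if ((diff.getRGB(x, y) & 0xFFFFFF) != 0) {
                    n++;
                }
            }
        }
        return n;
    }
}
```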


Regards
Luca


Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch

2005-08-31 Thread Luca Furini

Chris Bowditch wrote:


+  * Conditional space support, i.e. space-before.conditionality=retain


Chris, doesn't this work already?

As far as I can remember the correct space resolution is still missing, so 
for example the space-after of a block is not combined with the space-before 
of the following block (they are just appended, and this has some side 
effects on keeps), but the conditionality should be handled correctly.


I have just tested the simplest example possible (just a block with text 
and a space-before with conditionality = retain) and it seems ok.


Regards
Luca




Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch

2005-08-31 Thread Luca Furini

Chris Bowditch wrote:

I just knocked up a small test case and although retain is honoured, 
discard is ignored. I knew it wasn't quite yet working but didn't 
realise retain was working :) I'll update the Wiki.


Could you please also attach your file?

I have tested a simple sequence of blocks with conditional spaces and the 
output seems ok; the output of the testcase space-block2.xml seems correct 
too (I'm going to add checks).


Maybe I forgot to fix some LM.

Regards
Luca




Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch

2005-08-31 Thread Luca Furini

Chris Bowditch wrote:


Here is the sample:


Thanks!

I have tested a simple sequence of blocks with conditional spaces and 
the output seems ok; the output of the testcase space-block2.xml seems 
correct too (I'm going to add checks).


Not true, space-block2.xml does not work. On the second page, there 
should not be any space between the two paragraphs.


I'm no longer sure I follow you ... :-)

In your example the second block has a conditional space before, but it is 
not the first child of a reference area (not the first in the page), so I 
would expect it not to be suppressed. Should all conditional spaces 
always be suppressed, regardless of their position, what would be the 
point in using them? :-)


As for the testcase space-block2.xml, I similarly think there should be a 
space between the first and second block on page 2; anyway, in this case 
the actual behaviour is probably wrong, as the space resolution rules (if I 
understand them correctly) seem to imply that it should be only 10 
points.


Regards
Luca




Re: page-number and page-number-citation problem

2005-08-30 Thread Luca Furini

J.Pietschmann wrote:

Maybe I'm wrong in trying to do so, but I'd like to handle both 
formatting objects in the same way.


If page numbers can be resolved to strings early, it should be
done. All the hassle for space readjusting, and perhaps reflowing
content, should be reserved for forward references, if only for
performance reasons.


Sorry, my last message was not very clear (and / or I misunderstood your 
comments).


The point is that the real page numbers are not known until the 
addAreas() phase, when pages are actually created.


The Knuth-style page breaking algorithm gets a representation of a whole 
page-sequence (or part of it, if there are break conditions) and then 
computes all the page breaks at once: so, the fo:page-numbers comprised in 
that page-sequence cannot know in which page they will be placed, and the 
line breaking is necessarily performed using elements whose width could be 
just a guess.


What I meant when I said that both page-number and page-number-citation 
should be handled in the same way was this: during the line breaking their 
real value is equally unknown.


Well, to be more precise the value of a page-number is *always* unknown 
during line breaking, while a page-number-citation could refer to an 
object in a previous page-sequence, so it could be known: in this case the 
method PNCLM.get() already returns a TextArea with the real value and 
its ipd (maybe you were referring to this? this won't be changed at all).


[from the other message]
>> - sometimes, when a particularly elegant output is needed, it would
>> really be desirable to have a two-step algorithm, with line-breaking
>> performed again once the actual width of each object is known.


> Well, it's not for particularly elegant output, it's for the
> case of having multiple page number citations which point
> to five-digit page numbers in the same line. Real-life examples
> include references to page numbers in roman number format, which
> easily get into the six-character range, and enumerating
> references in book indices, where the problem may be amplified
> as an index is usually set in several narrow columns.


Great examples; I had not thought of them!

I imagine that, should the index be in a page-sequence preceding the ones
with the content, its line breaking could be really ugly, due to the
provisional width of the references.
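A quick way to check the roman-numeral case is to count characters. The sketch below is plain Python, not FOP code: `to_roman` and the comparison with the three-character MMM placeholder are illustrative assumptions. It lists the page numbers up to 100 whose lowercase roman form is longer than the placeholder. Character count is only a rough proxy, since the reserved width really depends on the font and M is a wide glyph.

```python
# Illustrative only: a standard subtractive roman-numeral conversion,
# used to compare page-number lengths with the "MMM" placeholder.
ROMAN_VALUES = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]

PLACEHOLDER = "MMM"  # provisional string reserved for an unresolved citation


def to_roman(n: int) -> str:
    """Return the lowercase roman numeral for a positive integer."""
    if n <= 0:
        raise ValueError("roman numerals need a positive integer")
    parts = []
    for value, symbol in ROMAN_VALUES:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


# Page numbers up to 100 whose roman form has more characters than the placeholder.
longer = [p for p in range(1, 101) if len(to_roman(p)) > len(PLACEHOLDER)]

print(to_roman(28), to_roman(88))  # xxviii lxxxviii
print(len(longer), "of the first 100 page numbers are longer than", PLACEHOLDER)
```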


This example is really interesting: in this case, re-flowing the index
pages could not achieve a better output if it were performed before the
page-sequence with the content has been broken into pages; and the re-flow
could be avoided altogether just by deferring the breaking of the index
page-sequence, so that its first breaking can already use the real values
of all page-number-citations.


If we see each page-sequence as a node, and a page-number-citation as a
directed edge from one node (the page-sequence containing the cited
object) to another one (the page-sequence containing the
page-number-citation), this is a well-known problem: the topological
sorting of a graph.
If the graph is acyclic, then there is an ordering of its nodes such that,
for each edge going from a node A to a node B, A precedes B; i.e., the
page-sequences could be ordered so that each one is flowed when the values
of all its page-number-citations are already known.
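The ordering idea can be sketched with Kahn's algorithm. This is a minimal Python illustration under stated assumptions, not a FOP implementation: page-sequences are plain names, `citations` is a hypothetical list of (cited sequence, citing sequence) pairs, and a citation inside the same page-sequence is ignored because it imposes no ordering between sequences (it still needs the provisional-width treatment).

```python
from collections import deque


def order_page_sequences(sequences, citations):
    """Topologically sort page-sequences with Kahn's algorithm.

    sequences: page-sequence names in document order.
    citations: (cited, citing) pairs; the edge cited -> citing means that
        `citing` contains a page-number-citation to an object in `cited`.
    Returns a layout order in which every page-sequence comes after all the
    page-sequences it cites, or None if the citations form a cycle.
    """
    successors = {s: [] for s in sequences}
    indegree = {s: 0 for s in sequences}
    for cited, citing in citations:
        if cited == citing:
            continue  # same page-sequence: no ordering constraint between sequences
        successors[cited].append(citing)
        indegree[citing] += 1

    # Start from the sequences that cite no other sequence, keeping document order.
    ready = deque(s for s in sequences if indegree[s] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If some sequence was never reached, the citations contain a cycle.
    return order if len(order) == len(sequences) else None


# An index that cites two chapters is laid out after both of them.
print(order_page_sequences(
    ["index", "chapter1", "chapter2"],
    [("chapter1", "index"), ("chapter2", "index")],
))  # ['chapter1', 'chapter2', 'index']
```

If None comes back, the citations form a cycle and provisional widths remain unavoidable for at least one page-sequence. Note also that this graph only models citations: with initial-page-number="auto", the numbers of a page-sequence also depend on how many pages precede it, which the sketch does not capture.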


Very interesting indeed ... as soon as I finish working on the 
line-adjusting I'll spend some more thought on this ...


(sorry for the long message!)

Regards
Luca




Re: page-number and page-number-citation problem

2005-08-29 Thread Luca Furini

J.Pietschmann wrote:

> In the maintenance branch, the formatted page number string was produced
> just as a new page was set up. I wonder whether the page sequence LM can
> put the current page number string into the layout context?


This could work for page-numbers but not for page-number-citations, as 
they could refer to an object in a different (and not yet paginated) 
page-sequence.


Maybe I'm wrong in trying to do so, but I'd like to handle both formatting 
objects in the same way.


Regards
Luca



Re: page-number and page-number-citation problem

2005-08-29 Thread Luca Furini

Firstly, thank you all for your suggestions.

All your interesting replies led me to this conclusion:

- in most cases, it is enough to make some local adjustments in each line 
containing page-numbers or page-number-citations;


- sometimes, when a particularly elegant output is needed, it would really
be desirable to have a two-step algorithm, with line-breaking performed
again once the actual width of each object is known.
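The two-step idea can be illustrated on a toy model. In this sketch (assumptions: widths are character counts, a greedy first-fit breaker stands in for the Knuth-style line breaker, and `@a`/`@b` are hypothetical citation markers), the first pass breaks lines with the MMM placeholder and the second pass breaks them again with the resolved numbers. The two results can differ in where the breaks fall, which local adjustment of each line cannot fix.

```python
PLACEHOLDER = "MMM"  # provisional text for an unresolved page number


def break_lines(words, line_width, space=1):
    """Greedy first-fit breaking on character counts (a stand-in for the
    Knuth-style line breaker); returns the text of each line."""
    lines, current, used = [], [], 0
    for word in words:
        needed = len(word) if not current else used + space + len(word)
        if current and needed > line_width:
            lines.append(" ".join(current))
            current, used = [word], len(word)
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


def two_step(words, citations, resolved, line_width):
    """First pass: every citation uses the placeholder.
    Second pass: citations are replaced by their resolved page numbers
    and the paragraph is broken again from scratch."""
    provisional = [PLACEHOLDER if w in citations else w for w in words]
    final = [resolved[w] if w in citations else w for w in words]
    return break_lines(provisional, line_width), break_lines(final, line_width)


first, second = two_step(
    ["see", "page", "@a", "and", "page", "@b"],
    citations={"@a", "@b"},
    resolved={"@a": "12345", "@b": "12346"},
    line_width=13,
)
print(first)   # ['see page MMM', 'and page MMM']
print(second)  # ['see page', '12345 and', 'page 12346']
```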


So, I'll start implementing the general-purpose solution, storing the
needed information inside an object (rather than directly as new
attributes of areas) so as to reduce memory usage.


Regards
Luca




page-number and page-number-citation problem

2005-08-26 Thread Luca Furini
There is a layout problem with fo:page-number and fo:page-number-citation, 
already pointed out but still unresolved.


I think these formatting objects are very similar, even if their actual
handling is quite different: both must be replaced by a piece of
information (a page number) that is (or may be) unavailable during line
breaking, so a provisional width is used instead of the real one when the
elements are created.


The method PageNumberCitationLM.get() allocates the width of the string
"MMM" if the id is not already known; PageNumberLM.get() calls
getCurrentPV().getPageNumberString(), but, as pagination is performed
later, it always gets the initial page number of the page-sequence (I am
going to add a testcase showing a situation in which this makes some text
overlap).
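To make the two behaviours concrete, here is a toy model in Python (not FOP's Java code; `PageSequenceState` and its methods are hypothetical names) of what the layout managers can see while line breaking a page-sequence that has not been paginated yet.

```python
from dataclasses import dataclass, field

PLACEHOLDER = "MMM"  # string whose width is reserved for an unknown citation


@dataclass
class PageSequenceState:
    """Hypothetical view of what is known while line breaking one
    page-sequence, before any of its pages has been created."""
    initial_page_number: int
    resolved_ids: dict = field(default_factory=dict)  # id -> page number string

    def citation_text(self, ref_id):
        """Like PageNumberCitationLM.get() as described: the real value if the
        id is already resolved (e.g. in an earlier page-sequence), else the
        placeholder. Returns (text, is_final)."""
        if ref_id in self.resolved_ids:
            return self.resolved_ids[ref_id], True
        return PLACEHOLDER, False

    def page_number_text(self):
        """Like PageNumberLM.get() as described: pagination has not happened
        yet, so the only number available is the page-sequence's first one."""
        return str(self.initial_page_number)


state = PageSequenceState(initial_page_number=5, resolved_ids={"intro": "2"})
print(state.citation_text("intro"))     # ('2', True)
print(state.citation_text("appendix"))  # ('MMM', False)
print(state.page_number_text())         # 5, even if the fo:page-number ends up on page 7
```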


The real number becomes known only once pagination of the current
page-sequence is done (for an fo:page-number), or even later (for an
fo:page-number-citation whose referenced object is in a page-sequence
following the current one). In both cases, if there is a difference
between the allocated width and the real one, indents and/or adjustment
ratios should be recomputed.


The computation, in itself, is easy, as the LineLM already has all the 
necessary information: line width, unadjusted width, available stretch and 
shrink.
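For reference, the recomputation might look like the sketch below. It assumes the usual Knuth-Plass definition of the adjustment ratio (difference between line width and natural width, divided by the available stretch or shrink) and a simple start/center/end indent for non-justified lines; the function and parameter names are hypothetical, and in FOP the inputs would come from the LineBreakPositions.

```python
def adjustment_ratio(line_width, natural_width, stretch, shrink):
    """Knuth-Plass style ratio: > 0 stretches the glue, < 0 shrinks it.
    A ratio below -1 means the line cannot be shrunk enough."""
    difference = line_width - natural_width
    if difference > 0:
        return difference / stretch if stretch > 0 else float("inf")
    if difference < 0:
        return difference / shrink if shrink > 0 else float("-inf")
    return 0.0


def readjust(line_width, natural_width, stretch, shrink,
             provisional_width, real_width):
    """Once a page number is resolved, only the natural width of the line
    changes, by the difference between the real and the provisional width."""
    new_natural = natural_width + (real_width - provisional_width)
    return adjustment_ratio(line_width, new_natural, stretch, shrink)


def start_indent(text_align, line_width, natural_width):
    """Indent of a non-justified line (text-indent ignored for simplicity)."""
    slack = line_width - natural_width
    if text_align == "end":
        return slack
    if text_align == "center":
        return slack / 2
    return 0  # "start": content is flush with the start edge


# A justified line: 100 units wide, 90 units of content, 20 stretch, 5 shrink.
print(adjustment_ratio(100, 90, 20, 5))  # 0.5
print(readjust(100, 90, 20, 5, 15, 10))  # 0.75 (real number narrower than "MMM")
print(readjust(100, 90, 20, 5, 15, 30))  # -1.0 (real number wider: uses all shrink)
```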


The point is that this information is stored in the LineBreakPositions, 
while the actual value (and the actual width) is set directly into the 
area tree.


In order to adjust the inline content of a line when the page number is 
resolved, I see two alternative strategies:


1) the LineLM has to handle this: this requires each LineArea to hold a
reference to the LineLM that created it, which knows all the needed
information;


2) the LineArea has to handle this: this means that the LineArea (and the
InlineAreas too) must be given the information about the MinOptMax ipd and
the provisional adjust ratio.

I don't like option 1 very much, because I think the creating LM is not a
significant attribute of an area, but option 2 involves adding many
attributes too (and maybe even less significant ones!)...
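A rough sketch of strategy 2, combined with the idea of keeping the data in a separate object rather than in area attributes: each line keeps a small record with its MinOptMax ipd and adjust ratio, and a registry re-adjusts the waiting lines when a page number is resolved. Everything here (`LineAdjustInfo`, `PageNumberResolver`, the Python `MinOptMax` stand-in) is hypothetical and only meant to show the bookkeeping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MinOptMax:
    """Python stand-in for a minimum / optimum / maximum width triple."""
    min: float
    opt: float
    max: float


@dataclass
class LineAdjustInfo:
    """Hypothetical per-line record kept outside the area tree."""
    ipd: MinOptMax      # content width of the line, with its flexibility
    line_width: float   # available width
    adjust_ratio: float = 0.0

    def recompute(self):
        diff = self.line_width - self.ipd.opt
        if diff > 0:
            stretch = self.ipd.max - self.ipd.opt
            self.adjust_ratio = diff / stretch if stretch > 0 else float("inf")
        elif diff < 0:
            shrink = self.ipd.opt - self.ipd.min
            self.adjust_ratio = diff / shrink if shrink > 0 else float("-inf")
        else:
            self.adjust_ratio = 0.0

    def apply_width_change(self, delta):
        # A resolved page number changes the text width, not the glue,
        # so min, opt and max all move by the same amount.
        self.ipd = MinOptMax(self.ipd.min + delta, self.ipd.opt + delta,
                             self.ipd.max + delta)
        self.recompute()


class PageNumberResolver:
    """Hypothetical registry: lines wait on an id until its page is known."""

    def __init__(self):
        self.waiting = defaultdict(list)  # id -> [(LineAdjustInfo, provisional width)]

    def register(self, ref_id, info, provisional_width):
        self.waiting[ref_id].append((info, provisional_width))

    def resolve(self, ref_id, real_width):
        for info, provisional_width in self.waiting.pop(ref_id, []):
            info.apply_width_change(real_width - provisional_width)


info = LineAdjustInfo(MinOptMax(80, 90, 110), line_width=100)
info.recompute()
resolver = PageNumberResolver()
resolver.register("fig1", info, provisional_width=15)
resolver.resolve("fig1", real_width=10)
print(info.ipd, info.adjust_ratio)  # MinOptMax(min=75, opt=85, max=105) 0.75
```

Only lines that actually contain unresolved page numbers are registered, so lines without them carry no extra data.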


What do you think? Does anyone see a different strategy?

Regards
Luca

