DO NOT REPLY [Bug 37329] New: - More readable error messages

2005-11-02 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=37329.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329

   Summary: More readable error messages
   Product: Fop
   Version: 1.0dev
  Platform: PC
OS/Version: Windows 2000
Status: NEW
  Severity: normal
  Priority: P2
 Component: fo tree
AssignedTo: fop-dev@xmlgraphics.apache.org
ReportedBy: [EMAIL PROTECTED]


I downloaded the latest FOP trunk of 1.0, built the fop.jar with the supplied 
Ant script, and tested generating PDF files from XML/XSL sources (in comparison 
to FOP 0.20.5). After fixing the causes of some simpler error messages (like 
fo:region-before must be declared before fo:region-after), none of our 
documents run through because of this error:

javax.xml.transform.TransformerException: 
org.apache.fop.fo.ValidationException: Error(Unknown location): fo:table-cell 
is missing child elements. 
Required Content Model: marker* (%block;)+

Well, I don't really understand what this error means. It would also be helpful 
to have the name of the source file included. As each of our XML/XSL documents 
references further XML/XSL documents (xsl:include ...), and all have many 
tables, it would be helpful to have the filename and the line number. Is this 
possible?

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.


DO NOT REPLY [Bug 37318] - fop.bat: NoClassDefFoundError: org/apache/fop/cli/Main

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37318





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 11:45 ---
Okay, is there a forum for this? Mailing lists are evil: just to ask one 
question I have to subscribe to such a list and then receive any number of 
emails every day that don't interest me. A forum is much more convenient. Is 
there one?



DO NOT REPLY [Bug 37329] - More readable error messages

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 12:03 ---
xsl:include has nothing to do with XSL-FO or FOP; it is specific to XSLT. 
Generating a printed document (PDF or PS) from XML is typically a two-step 
process: XML + XSL -> FO -> PDF

FOP is responsible for generating a PDF from FO. For the user's convenience 
the interface to FOP will take XML + XSL files, which FOP passes off to an XSLT 
processor, e.g. Xalan, which generates the FO and passes it via SAX to 
FOP, which generates the PDF.

So in order to track down your problem when you have multiple XSL files you 
will need to first run the XSL Transform part of this process separately using 
Xalan or similar to generate a large FO file which will have all your 
xsl:includes pulled in. It should then be easier to find the problem by 
looking at the FO file.
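
Once the intermediate FO exists as a file, a small script can point at the
offending cells. The following is only a minimal sketch, not part of FOP: the
names EmptyCellFinder, find_empty_cells and SAMPLE_FO are made up for
illustration, and the check is a rough approximation of the content model in
the error message (it flags any fo:table-cell that has no child element other
than fo:marker, so a cell containing only text is flagged as well).

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

FO_NS = "http://www.w3.org/1999/XSL/Format"


class EmptyCellFinder(ContentHandler):
    """Collects line numbers of fo:table-cell elements without non-marker children."""

    def __init__(self):
        super().__init__()
        self.locator = None
        # One [cell_line_or_None, has_content] entry per currently open element.
        self.open_elements = []
        self.empty_cell_lines = []

    def setDocumentLocator(self, locator):
        self.locator = locator

    def startElementNS(self, name, qname, attrs):
        # Any child element except fo:marker counts as content of its parent.
        if self.open_elements and name != (FO_NS, "marker"):
            self.open_elements[-1][1] = True
        is_cell = name == (FO_NS, "table-cell")
        line = self.locator.getLineNumber() if is_cell else None
        self.open_elements.append([line, False])

    def endElementNS(self, name, qname):
        line, has_content = self.open_elements.pop()
        if line is not None and not has_content:
            self.empty_cell_lines.append(line)


def find_empty_cells(fo_text):
    """Return the line numbers of all 'empty' fo:table-cell start tags."""
    handler = EmptyCellFinder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(fo_text.encode("utf-8")))
    return handler.empty_cell_lines


# Hypothetical, heavily shortened FO fragment used only for demonstration.
SAMPLE_FO = "\n".join([
    '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">',
    '<fo:table-row>',
    '<fo:table-cell><fo:block>x</fo:block></fo:table-cell>',
    '<fo:table-cell></fo:table-cell>',
    '</fo:table-row>',
    '</fo:root>',
])

print(find_empty_cells(SAMPLE_FO))
```

For a real document, read the generated FO file into a string and pass it to
find_empty_cells; the reported numbers are lines of the FO file, not of the
original XML/XSL sources.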

A polite request: problems/questions should be directed to the fop-user 
mailing list before they are raised as bugs. Thanks



DO NOT REPLY [Bug 37330] - [PATCH] FOP Bridges not properly registered with Batik

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37330





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 12:11 ---
Created an attachment (id=16851)
 -- (http://issues.apache.org/bugzilla/attachment.cgi?id=16851&action=view)
Fix bridge registration

This patch fixes the bridge registration by 
creating a subclass of BridgeContext and
overriding registerBridges to replace the
standard SVG bridges after the base class
has done its registration.




DO NOT REPLY [Bug 37329] - More readable error messages

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 12:16 ---
Have you seen http://xmlgraphics.apache.org/fop/trunk/upgrading.html ? The 
upgrade issue you describe is explained in a fairly prominent position on that 
page.



DO NOT REPLY [Bug 37318] - fop.bat: NoClassDefFoundError: org/apache/fop/cli/Main

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37318





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 12:19 ---
No, there is no Apache-hosted forum for FOP. The fop-user list is there for that 
purpose. It is not a very high volume list as a quick check of the archives 
would reveal. And you can always unsubscribe once your issues are resolved.



DO NOT REPLY [Bug 37236] - [PATCH] Fix gradients and patterns

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37236


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #16837|0                           |1
        is obsolete|                            |




--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 12:56 ---
Created an attachment (id=16853)
 -- (http://issues.apache.org/bugzilla/attachment.cgi?id=16853&action=view)
Update to gradient repeat, fixed createGraphics problem.

This patch is an update to 16837.  It indirects the
access to jpegCount so all the PDFGraphics2D share
a common count.  This prevents inadvertent reuse of
the wrong image.



DO NOT REPLY [Bug 37329] - More readable error messages

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 13:20 ---
If you mean this:

While FOP 0.20.5 allowed you to have empty fo:table-cell elements, the new code 
will complain about that (unless relaxed validation is enabled) because the 
specification demands at least one block-level element ((%block;)+, see XSL-FO 
1.0, 6.7.10) inside an fo:table-cell element. 

It still appears after adding -r in the fop.bat call (relaxed validation).



Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Luca Furini

Manuel Mall wrote:

So we end up with only two cases to consider: preserve white space and 
remove white space around a line break created by the Knuth algorithm.


1. Preserve white space: IMO in this case the space itself is actually 
not a break opportunity but there are now two break opportunities: one 
before the space and one after the space. That is a sequence like 
'abc&#x20;def' is more like 'abc&#x200b;&#xa0;&#x200b;def' or in a more 
readable notation 'abc<zwsp><nbsp><zwsp>def'. That is our normal space 
becomes a non-breakable space flanked by zero-width spaces which 
represent the break opportunities. If this is correct the Knuth 
elements would look like:

 glue w=0
 box w=0
 pen +INFINITE
 glue w=space
 pen
 glue w=0
Is this sequence correct? The first and last glue represent the zwsp 
and are break opportunities. The box prevents the removal of the space 
if a break is created before the space. The penalty prevents the space 
from being considered as a break opportunity.
Of course as usual these sequences are further complicated in the 
absence of justification and in the presence of border/padding.


I like your idea of expanding a preserved space into zwsps and nbsp; 
this allows us to forget alignments and borders / padding as we just have 
to insert the appropriate elements for the non breaking space.


The sequence is very good, as it has a couple of interesting properties:

- it interacts with the surrounding elements just as a single glue element

- if there are two (or more) consecutive, non-collapsed spaces the 
sequence has just 3 feasible breaks, not 4


However, I have a doubt: reading the Unicode document about line breaking, 
it seems to me that, regardless of the quantity of consecutive spaces, 
there is only *one* feasible break, after the last one (Unicode Standard 
Annex #14, section 2 Definitions, in particular the definition of 
direct break and indirect break)


--- begin quoted text ---

Direct Break - a line break opportunity exists between two adjacent 
characters of the given line breaking classes. This is indicated in the 
rules below as B ÷ A, where B is the character class of the character 
before and A is the character class of the character after the break. If 
they are separated by one or more space characters, a break opportunity 
also exists after the last space. In the pair table, the optional space 
characters are not shown.


Indirect Break - a line break opportunity exists between two characters of 
the given line breaking classes only if they are separated by one or more 
spaces. In this case, a break opportunity exists after the last space. No 
break opportunity exists if the characters are immediately adjacent. This 
is indicated in the pair table below as B % A, where B is the character 
class of the character before and A is the character class of the 
character after the break. Even though space characters are not shown in 
the pair table, an indirect break can only occur if one or more spaces 
follow B. In the notation of the rules in Section 6, Line Breaking 
Algorithm this would be represented as two rules: B × A and B SP+ ÷ A.


--- end quoted text ---
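
The point about runs of spaces can be made concrete with a toy example. This
is a deliberately simplified sketch and not an implementation of UAX #14: it
ignores line breaking classes entirely and only shows the one property quoted
above, namely that a run of spaces yields a single break opportunity, located
after the last space. The function name break_opportunities is made up.

```python
import re


def break_opportunities(text):
    """Return the indices at which a new line may start.

    Each run of one or more spaces contributes exactly one index: the
    position right after the last space of the run. Runs at the very end
    of the text are ignored because nothing follows them.
    """
    return [m.end() for m in re.finditer(r" +(?=[^ ])", text)]


# "abc" + two spaces + "def" + one space + "g"
print(break_opportunities("abc  def g"))
```

With two spaces between 'abc' and 'def' there is still only one opportunity
for that gap, at the index where 'def' starts.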

I still have not read the document from top to bottom, and I could have 
misunderstood even the sections I read :-), but I think this point must be 
clarified before we continue.


Regards
Luca



DO NOT REPLY [Bug 37329] - More readable error messages

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |




--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 13:30 ---
A polite request: problems/questions should be directed to the fop-user 
mailing list before they are raised as bugs. Thanks

This is not a question, but a request for enhancement.  The error message I 
posted above is an example. And it is an error message from FOP (as a FOP user, 
it doesn't matter to me whether it uses Xalan or similar internally), so I ask 
if it would be possible to give some more information in the error message to 
identify which source files are concerned.



Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Manuel Mall
On Wed, 2 Nov 2005 01:59 pm, Manuel Mall wrote:
 On Wed, 2 Nov 2005 04:18 am, Simon Pepping wrote:
  On Tue, Nov 01, 2005 at 11:40:42PM +0800, Manuel Mall wrote:
   This is probably a question for Luca or Simon.

 <snip/>

  Glue and penalty items are removed at the start of a line. This is
  part of the Knuth algorithm. It does not touch the matter of
  white-space-collapse. If there is whitespace that may not be
  removed/collapsed at the start of the line, it must be protected by
  a preceding zero-width box. I.o.w., the value of
  white-space-collapse needs to be taken into account at the phase of
  getNextKnuthElements.

 Fair enough - I need some help with the Knuth elements then.

 During getNextKnuth we need to only consider white-space-treatment as
 white-space-collapse can be handled completely during refinement,
 that is consecutive sequences of white space are either collapsed or
 not during refinement.

 We also can limit white-space-treatment during getNextKnuth to any
 line breaks generated by the line breaking algorithm (Knuth
 algorithm). white-space-treatment around hard line breaks (linefeeds,
 start/end of a block) are handled during refinement.

 We can also limit white-space-treatment during getNextKnuth to the
 values preserve vs ignore-if-... Other values are handled during
 refinement. We also can treat the three different ignore-if...
 values, that is the values: ignore-if-before-linefeed,
 ignore-if-after-linefeed, ignore-if-surrounding-linefeed, as just one
 case: 'delete all white space around a formatter generated break'.

 So we end up with only two cases to consider: preserve white space
 and remove white space around a line break created by the Knuth
 algorithm.

 1. Preserve white space: IMO in this case the space itself is
 actually not a break opportunity but there are now two break
 opportunities: one before the space and one after the space. That is
 a sequence like 'abc&#x20;def' is more like
 'abc&#x200b;&#xa0;&#x200b;def' or in a more readable notation
 'abc<zwsp><nbsp><zwsp>def'. That is our normal space becomes a
 non-breakable space flanked by zero-width spaces which represent the
 break opportunities. If this is correct the Knuth elements would look
 like:
 glue w=0
 box w=0
 pen +INFINITE
 glue w=space
 pen
 glue w=0
 Is this sequence correct? The first and last glue represent the
 zwsp and are break opportunities. The box prevents the removal of
 the space if a break is created before the space. The penalty
 prevents the space from being considered as a break opportunity.
 Of course as usual these sequences are further complicated in the
 absence of justification and in the presence of border/padding.

 2. Removal of white space: This is the current behaviour but it works
 only for a single space and not for a sequence of spaces. Actually
 because the algorithm removes leading glues/penalties it is mainly a
 problem for trailing white space. I am not sure how to best tackle
 this. What comes to mind is:

 a) Do the same as for leading glues/penalties at the end of the line.
 However I am not sure how tricky it would be to determine the
 boundary because any 'blocking boxes' (see 1. above) are only placed
 before but not after elements. This option suffers from the problem
 that it will not remove leading/trailing white space across inline
 boundaries with border/padding as these generate zero width boxes to
 block removal of the glue elements for the border/padding.

 b) Do not generate individual Knuth sequences for each white space
 character but instead collect all consecutive white space and create
 one glue-penalty sequence for it. Again I am uncertain of the
 consequences of doing that. To do that correctly we would need to
 collect white space across inline boundaries. This firstly breaks the
 current getNextKnuth approach which assumes each LM can generate its
 sequences without knowledge of its neighbours. It would also break
 the current area info structures as a single Knuth element could now
 refer to text snippets from different LMs.

 Comments please.

  Simon

 Thanks

Luca wrote a longer response to this but my mail reader doesn't like the 
character set (is that topical or what?). Anyway, at the end Luca asks 
the question about the UAX#14 line breaking algorithm and its handling 
of spaces. My answer to that is:
a) Yes, UAX#14 always breaks at the end of a sequence of spaces.
b) But it also says that it assumes any trailing spaces in a line are 
being removed.
This conflicts with XSL-FO, which can force spaces to be retained; 
therefore adjustments to the algorithm are necessary to cater for that. 
One possible adjustment is simply changing what is given to the 
algorithm as indicated above, i.e. <sp> becomes <zwsp><nbsp><zwsp>.
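
The proposed adjustment, rewriting the input before it reaches the line
breaker, can be sketched in a few lines. This is only an illustration on plain
strings under the assumption that every U+0020 in the text is a preserved
space; the function name is made up and nothing here reflects FOP's actual
text handling.

```python
ZWSP = "\u200b"   # zero-width space: a break opportunity with no width
NBSP = "\u00a0"   # no-break space: has width, but is not a break opportunity


def expand_preserved_spaces(text):
    """Replace each ordinary space by ZWSP + NBSP + ZWSP.

    The visible width of the space is kept by the NBSP, while the two
    zero-width spaces mark the break opportunities before and after it.
    """
    return text.replace(" ", ZWSP + NBSP + ZWSP)


expanded = expand_preserved_spaces("abc def")
# Show the result as code points so the invisible characters can be seen.
print([hex(ord(ch)) for ch in expanded[3:6]])
```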

Manuel

 Manuel

In case other people have the same problem with Luca's post here is the 
content:
 Start Luca's e-mail +
I like your idea of expanding a preserved space into zwsps and nbsp;
this allows us to forget alignments and 

DO NOT REPLY [Bug 37136] - external-graphic dimensions and rendering

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37136





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 14:45 ---
I noticed that we also have svg images imported and that code is causing this 
(same) error message:


<fo:instream-foreign-object>
<svg:svg width="120mm" height="130mm" 
xmlns:xlink="http://www.w3.org/2000/svg">
<svg:image width="120mm" height="130mm" 
xlink:href="{$chart_C_AssetSectorStructure2_H}"/>
</svg:svg>
</fo:instream-foreign-object>


leads to:

javax.xml.transform.TransformerException: java.lang.RuntimeException: Some 
content could not fit into a line/page after 50 attempts. Giving up to avoid an 
endless loop.

I checked it: when I removed just this section of code, the message disappeared 
(and also the SVG ;-). Pasting it back reproduces the error message. 
Unfortunately, content-width="scale-to-fit" doesn't work with <svg:svg> 
and <svg:image>.




DO NOT REPLY [Bug 37136] - external-graphic dimensions and rendering

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37136





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 15:11 ---
Please see response #4 for the SVG example.



DO NOT REPLY [Bug 37136] - external-graphic dimensions and rendering

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37136





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 15:15 ---
Comment #4 is not a working example showing the problem. It is just a snippet 
from your XML file. For us to investigate this we need the XSL-FO file (as 
small as possible just enough to show the problem) together with any related 
files (svg, jpeg, ...)



Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Luca Furini

Manuel Mall wrote:

Luca wrote a longer response to this but my mail reader doesn't like the 
character set (is that topical or what?).


Sorry, it looks really horrible ... still don't know what went wrong, but 
I won't do it again! :-)


Anyway, at the end Luca asks the question about the UAX#14 line breaking 
algorithm and its handling of spaces. My answer to that is:

a) Yes, UAX#14 always breaks at the end of a sequence of spaces.
b) But it also says that it assumes any trailing spaces in a line are 
being removed.
This conflicts with XSL-FO, which can force spaces to be retained; 
therefore adjustments to the algorithm are necessary to cater for that. 
One possible adjustment is simply changing what is given to the 
algorithm as indicated above, i.e. <sp> becomes <zwsp><nbsp><zwsp>.


Ok, so back to your previous message:


2. Removal of white space: This is the current behaviour but it works
only for a single space and not for a sequence of spaces. Actually
because the algorithm removes leading glues/penalties it is mainly a
problem for trailing white space. I am not sure how to best tackle
this. What comes to mind is:

a) Do the same as for leading glues/penalties at the end of the line.
However I am not sure how tricky it would be to determine the boundary
because any 'blocking boxes' (see 1. above) are only placed before but
not after elements. This option suffers from the problem that it will
not remove leading/trailing white space across inline boundaries with
border/padding as these generate zero width boxes to block removal of
the glue elements for the border/padding.



b) Do not generate individual Knuth sequences for each white space
character but instead collect all consecutive white space and create
one glue-penalty sequence for it. Again I am uncertain of the
consequences of doing that. To do that correctly we would need to
collect white space across inline boundaries. This firstly breaks the
current getNextKnuth approach which assumes each LM can generate its
sequences without knowledge of its neighbours. It would also break the
current area info structures as a single Knuth element could now refer
to text snippets from different LMs.


I'm not sure I follow you in all the details of white space handling and 
here we have borders too ... :-)


I like b) most: after all, this is somewhat similar to the space 
resolution, as we have interactions between spaces coming from different 
nodes, and it's difficult to have each LM decide on its own. And I think 
we could find a way to keep the 1-1 relationship between AreaInfo objects 
and Positions.


I have tried to play with the elements, and here are a few results: I hope 
they can help!


At the moment, the sequence for a single space with borders and padding 
is:


1  glue w=endBP
2  penalty w=0
3  glue w=(spaceIPD - endBP - startBP)
4  box w=0
5  infinite penalty
6  glue w=startBP

total width = spaceIPD
if break at #2 = endBP / startBP

If we have two (or more) spaces, we could use the sequence:

1  glue w=endBP
2  penalty w=0
3  glue w=(- endBP - startBP)
4  glue w=spaceIPD1
5  glue w=spaceIPD2
6  box w=0
7  infinite penalty
8  glue w=startBP

total width = spaceIPD1 + spaceIPD2
if break at #2 = endBP / startBP

Glues #4 and #5 have a Position pointing to different AreaInfo objects 
(from different LMs). This should solve (?) the case of 
ignore-if-surrounding.
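
The width bookkeeping for the two sequences above can be checked mechanically.
The sketch below is a toy model, not FOP code: Element, total_width,
widths_at_break and the two builder functions are made-up names, stretch and
shrink are ignored, and the only rule applied after a break is the one stated
earlier in this thread (glue and penalty items are removed at the start of a
line, up to the first box).

```python
from dataclasses import dataclass

INFINITE = float("inf")


@dataclass(frozen=True)
class Element:
    kind: str            # "box", "glue" or "penalty"
    width: int = 0
    penalty: float = 0   # only meaningful for kind == "penalty"


def single_space(space_ipd, start_bp, end_bp):
    """Sequence for one space between border/padding (elements 1..6)."""
    return [
        Element("glue", end_bp),
        Element("penalty"),
        Element("glue", space_ipd - end_bp - start_bp),
        Element("box"),
        Element("penalty", penalty=INFINITE),
        Element("glue", start_bp),
    ]


def two_spaces(space_ipd1, space_ipd2, start_bp, end_bp):
    """Sequence for two consecutive spaces (elements 1..8)."""
    return [
        Element("glue", end_bp),
        Element("penalty"),
        Element("glue", -end_bp - start_bp),
        Element("glue", space_ipd1),
        Element("glue", space_ipd2),
        Element("box"),
        Element("penalty", penalty=INFINITE),
        Element("glue", start_bp),
    ]


def total_width(seq):
    """Width contributed when no break is taken inside the sequence."""
    return sum(e.width for e in seq)


def widths_at_break(seq, number):
    """Return (width before, width after) for a break at element `number` (1-based)."""
    index = number - 1
    chosen = seq[index]
    if chosen.kind != "penalty" or chosen.penalty == INFINITE:
        raise ValueError("this sketch only breaks at finite penalties")
    before = sum(e.width for e in seq[:index]) + chosen.width
    rest = seq[index + 1:]
    # Leading glue and penalties on the new line are discarded up to the first box.
    while rest and rest[0].kind != "box":
        rest = rest[1:]
    after = sum(e.width for e in rest)
    return before, after


seq = single_space(space_ipd=5, start_bp=2, end_bp=3)
print(total_width(seq), widths_at_break(seq, 2))
```

With the example values spaceIPD=5, startBP=2 and endBP=3, both sequences give
endBP before and startBP after a break at #2, and the unbroken totals are
spaceIPD and spaceIPD1 + spaceIPD2 respectively, in line with the figures
listed above.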


If white-space-treatment is ignore-if-after, and we have two consecutive 
spaces we could use the sequence:


1  glue w=endBP
2  penalty w=0
3  glue w=(spaceIPD - endBP)
4  penalty w=0
5  glue w=(spaceIPD - startBP)
6  box w=0
7  infinite penalty
8  glue w=startBP

total width = 2 * spaceIPD
if break at #2 = endBP / startBP
if break at #4 = endBP + spaceIPD / startBP

With three or more consecutive spaces:
1  glue w=endBP
2  penalty w=0
3  glue w=(spaceIPD - endBP)
4  penalty w=0
5  glue w=spaceIPD
6  penalty w=0
7  glue w=(spaceIPD - startBP)
8  box w=0
9  infinite penalty
10 glue w=startBP

total width = 3 * spaceIPD
if break at #2 = endBP / startBP
if break at #4 = endBP + spaceIPD / startBP
if break at #6 = endBP + 2 * spaceIPD / startBP

I did not find a sequence for ignore-if-before yet ...

Regards
   Luca


Re: linefeed-treatment=preserve

2005-11-02 Thread Andreas L Delmelle

On Nov 2, 2005, at 01:39, Manuel Mall wrote:


On Wed, 2 Nov 2005 05:14 am, Simon Pepping wrote:


On Tue, Nov 01, 2005 at 06:54:08PM +0100, Andreas L Delmelle wrote:


On Nov 1, 2005, at 16:03, Manuel Mall wrote:


Is it a)
empty line
line 1
empty line



This one, IMO.



I agree, a)



Same as Simon, Joerg and Andreas I thought a) as well but am now
confused because both AntennaHouse and RenderX render:


FWIW: I'm sticking to my initial a).

Furthermore, I tried following the current whitespace removal  
algorithm (fo.flow.Block.handleWhiteSpace()) for this particular  
case, and IIC, it is handled pretty well --IOW: both linefeeds are  
properly presented to the layout-engine, but it's somewhere in layout  
that the decision is made to drop the trailing linefeed. Probably  
because the feasible break after 'line1' gets 'merged' with the  
forced line-break (or: what exactly does the algorithm do when it  
encounters a mere break-possibility immediately followed by a forced  
line-break?)



Cheers,

Andreas


DO NOT REPLY [Bug 37329] - Relaxed validation NYI for fo:table-cells

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
            Summary|More readable error messages|Relaxed validation NYI for
                   |                            |fo:table-cells




--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 18:24 ---
For the record:
1. I agree with Chris that FOP can't be expected to tell you precisely where, 
and in which source document, the error lies. FOP is ultimately presented with 
the result of the XSL transform, so all references to whatever source files 
you're using are lost at that point. FOP could add location info to the error 
message, but this makes little sense if the intermediate FO doesn't exist as a 
physical file.

2. The error message is, IMO, quite self-explanatory (at least for anyone who 
has had so much as a glance at the XSL-FO Recommendation). It simply means that 
an fo:table-cell should have 'zero-or-more markers' plus 'one-or-more 
block-level' descendants (according to the XSL-FO Rec.)

3. The relaxed validation doesn't work yet for this particular case... Either 
this is an oversight in the docs, or this is still TODO. I'll see if I can 
commit a fix for this ASAP.



DO NOT REPLY [Bug 37329] - Relaxed validation NYI for fo:table-cells

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 19:24 ---
OK it's fixed. 
Please do note that this will only lead to usable results if the cells are 
really supposed to be empty. Any text-content of the cell will be silently 
ignored/dropped.



DO NOT REPLY [Bug 37329] - Relaxed validation NYI for fo:table-cells

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37329





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 19:34 ---
On second thought: dropping it silently seemed outright stupid, so added a 
little warning message



DO NOT REPLY [Bug 37136] - external-graphic dimensions and rendering

2005-11-02 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=37136





--- Additional Comments From [EMAIL PROTECTED]  2005-11-02 19:40 ---
In reply to comment #4:

See recent post on fop-users: specify content-height and content-width on the 
fo:instream-foreign-object, not on the svg element.



Current FOText implementation + Refinement whitespace handling

2005-11-02 Thread Andreas L Delmelle

Hi all,
(Manuel, I guess this is mostly directed to you, as you may already  
have been browsing the same classes...)


Just wandering a bit through the FOText source code (follow-up on  
Manuel's recent thread on whitespace handling), and I stumbled upon  
the following suspicious little detail:
FOText has a static member 'lastFOTextProcessed', which doesn't seem  
to get cleared/flushed anywhere.


The intention is quite clear, but the possible effects of the current  
implementation may turn out rather nasty. IIC, this is what the  
warning is about in the FOText javadoc as well as the TODO for that  
member variable.
Rough guess: since the variable never gets cleared, it always holds a
reference to the last FOText instance processed. That instance in turn
carries a reference to a char array containing the last portion of
accumulated text, as well as one to the previous FOText, and so on.
This stays true even after the document has finished, and into the
next run if it happens within the same JVM (+ possible multi-thread
mayhem?).
The TODO hints at a solution involving the page-sequence. I somehow  
feel that moving it to the block level would be enough... Logically,  
whitespace handling --which is one of the prime reasons of existence  
of this static variable-- deals with line-breaks, and start-block/end- 
block are implicit after- or before-eol.
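
To make the concern concrete, here is a minimal sketch. It is not FOP code: TextNode, lastProcessed, BlockScope and ScopedText are hypothetical stand-ins for FOText, its static member, and a block-level holder. It only illustrates that a static reference survives the end of a run unless something clears it, while a reference owned by a per-block object goes away with that object.

```java
// Hypothetical stand-ins only; none of these classes exist in FOP.
class StaticStateSketch {

    /** Text node whose "last processed" link is a static field. */
    static class TextNode {
        static TextNode lastProcessed;      // shared by every run in the JVM
        final char[] chars;
        final TextNode previous;

        TextNode(String text) {
            this.chars = text.toCharArray();
            this.previous = lastProcessed;  // chains to whatever came before
            lastProcessed = this;
        }
    }

    /** Alternative: the link lives in a per-block object and dies with it. */
    static class BlockScope {
        private ScopedText last;

        ScopedText addText(String text) {
            last = new ScopedText(text, last);
            return last;
        }
    }

    static class ScopedText {
        final char[] chars;
        final ScopedText previous;

        ScopedText(String text, ScopedText previous) {
            this.chars = text.toCharArray();
            this.previous = previous;
        }
    }

    /** Simulates one document run with the static variant. */
    static TextNode runDocument(String... texts) {
        TextNode node = null;
        for (String text : texts) {
            node = new TextNode(text);
        }
        return node;
    }

    public static void main(String[] args) {
        runDocument("first", "document");
        TextNode second = runDocument("second");
        // The only text of run 2 is still linked to the last text of run 1.
        System.out.println(new String(second.previous.chars));

        // With a fresh scope per block, nothing from an earlier block leaks in.
        System.out.println(new BlockScope().addText("second").previous == null);
    }
}
```

Whether block scope is really sufficient depends on what else reads the static member, which this sketch does not model.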


To follow up on that last sentence, the current refinement whitespace  
handling works roughly as follows:


1. Add all text and inline children to the block, until the first non- 
inline child is encountered (or the block ends)
2. Recursively iterate over *all* text nodes anywhere in the block up  
to here, converting/removing any superfluous whitespace in the process


and (+/-) repeat the above for each uninterrupted sequence of text/ 
inline children in the block.


Seems to work nicely, for the most part.

Manuel already raised the issue of inappropriate inter-FO whitespace- 
collapsing, but I have another question. Given this algorithm, and  
knowing that the inlines do not do any whitespace-handling  
themselves, what happens in the following case:


fo:block
  fo:inline
    fo:block
      fo:inline
        fo:block
          ...
?

My current best guess is that the inner block's underlying character  
sequence will be 'recursively' iterated over three times (?) That  
would be two too many, since all whitespace will have been collapsed  
the first time around.
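
The guess can be checked against a toy model. The sketch below is not the FOP tree: Node, Text, Inline, Block and endAll are hypothetical names, and it assumes that the recursive walk started by a block descends through inlines into nested blocks, which is exactly the open question. Under that assumption it counts how often the innermost text is visited for the nesting above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the refinement pass described above; not FOP's actual classes.
class RecursivePassSketch {

    static abstract class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    static class Text extends Node {
        int visits;  // how often a whitespace pass touched this text
    }

    static class Inline extends Node { }

    static class Block extends Node {
        /** Assumed behaviour: walk *all* text anywhere below this block. */
        void handleWhiteSpace() {
            visitTexts(this);
        }
    }

    static void visitTexts(Node node) {
        if (node instanceof Text) {
            ((Text) node).visits++;
        }
        for (Node child : node.children) {
            visitTexts(child);
        }
    }

    /** Ends every node bottom-up, the order in which end tags arrive. */
    static void endAll(Node node) {
        for (Node child : node.children) {
            endAll(child);
        }
        if (node instanceof Block) {
            ((Block) node).handleWhiteSpace();
        }
    }

    /** Builds block > inline > block > inline > block > text and counts visits. */
    static int visitsForNesting() {
        Text innermostText = new Text();
        Node inner = new Block().add(innermostText);
        Node middle = new Block().add(new Inline().add(inner));
        Node outer = new Block().add(new Inline().add(middle));
        endAll(outer);
        return innermostText.visits;
    }

    public static void main(String[] args) {
        System.out.println(visitsForNesting());  // prints 3 in this model
    }
}
```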


I'm still chewing on some ideas to move part of this to InlineLevel,  
so that ultimately, we can do away with the recursion and let each  
level handle its own small part. The higher level then chains these  
small parts together with its own character content.


One way to make this happen would be to overload  
Block.handleWhiteSpace() to deal with an InlineLevel parameter. This  
has the advantage of the whitespace-related properties being easily  
available. The call to this overloaded method would be made from  
InlineLevel.endOfNode().


If you're still following, I'd use a CharIterator that iterates over  
regular characters, fo:characters (and possibly the first and last  
characters of any nested FO). This iterator can operate very easily  
on both inlines and blocks. I don't immediately see any need to  
iterate backwards, at least not during refinement. Big advantage here  
would precisely be that we can wait until Block.endOfNode() to deal  
with any white-space for the entire block (leading and trailing), the  
nested bits will already have performed their parts at that point, so  
it is done sooner and far more efficiently IIC (guaranteed only one  
pass per level, no matter how deep the nesting goes).
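
A rough sketch of the one-pass-per-level idea follows. All names are hypothetical (Level stands in for an inline-level FO, Block for fo:block), and it assumes the default property values, i.e. every whitespace character is treated as a space and runs of spaces collapse to one. Each level scans only its own characters; for an already ended child it looks at the boundary character and appends the rest in bulk.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Level stands in for an inline-level FO, Block for fo:block.
class PerLevelWhitespaceSketch {

    static class Level {
        private final List<Object> parts = new ArrayList<>(); // String or ended Level
        private String refined;                               // set by endOfNode()

        Level text(String s) {
            parts.add(s);
            return this;
        }

        /** Chains an already ended child; its characters are not rescanned. */
        Level child(Level ended) {
            if (ended.refined == null) {
                throw new IllegalStateException("child must be ended first");
            }
            parts.add(ended);
            return this;
        }

        /** Collapses whitespace in this level's own text, exactly once. */
        Level endOfNode() {
            StringBuilder out = new StringBuilder();
            for (Object part : parts) {
                if (part instanceof Level) {
                    appendRefined(out, ((Level) part).refined);
                } else {
                    appendCollapsed(out, (String) part);
                }
            }
            refined = finish(out);
            return this;
        }

        String finish(StringBuilder out) {
            return out.toString();
        }

        String refined() {
            return refined;
        }
    }

    static class Block extends Level {
        /** Only the block strips leading and trailing whitespace. */
        @Override
        String finish(StringBuilder out) {
            return out.toString().trim();
        }
    }

    private static boolean endsWithSpace(StringBuilder sb) {
        return sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ';
    }

    /** Scans raw text, turning each run of whitespace into a single space. */
    static void appendCollapsed(StringBuilder out, String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                if (!endsWithSpace(out)) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
    }

    /** Looks only at the boundary of refined child text, then appends in bulk. */
    static void appendRefined(StringBuilder out, String refined) {
        boolean duplicate = endsWithSpace(out) && refined.startsWith(" ");
        out.append(duplicate ? refined.substring(1) : refined);
    }

    public static void main(String[] args) {
        Level inline = new Level().text(" nested   text ").endOfNode();
        Level block = new Block().text("  Some  ").child(inline)
                .text("  here \n").endOfNode();
        System.out.println("[" + block.refined() + "]");
    }
}
```

The sketch sidesteps the inter-FO collapsing question raised earlier by simply collapsing across the boundary; a real implementation would have to decide that per the spec.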


Food for thought :-)

Cheers,

Andreas


Re: Unicode compliant Line Breaking

2005-11-02 Thread Simon Pepping
On Tue, Nov 01, 2005 at 11:17:08PM +0100, J.Pietschmann wrote:
 Simon Pepping wrote:
 Is our current hyphenation method a subset of Unicode's method?
 
 Umm. What's the relation between hyphenation and TR14 (except for
 handling soft hyphens)? I guess you confuse finding line breaks
 in general and line breaking due to hyphenation.

I mean, will our current method of finding possible line breaking
points using the hyphenation tables be part of a TR14 compliant system
to find line break opportunities?

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Simon Pepping
On Wed, Nov 02, 2005 at 04:58:09PM +0100, Luca Furini wrote:
 Manuel Mall wrote:
 
 Luca wrote a longer response to this but my mail reader doesn't like the 
 character set (is that topical or what?).
 
 Sorry, it looks really horrible ... still don't know what went wrong, but 
 I won't do it again! :-)

It is in the quoted-printable format, probably due to non-ascii
or non-latin-1 characters in it, the TR14 symbols. 

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Unicode compliant Line Breaking

2005-11-02 Thread J.Pietschmann

Simon Pepping wrote:

I mean, will our current method of finding possible line breaking
points using the hyphenation tables be part of a TR14 compliant system
to find line break opportunities?


In some sense yes, but I'm not sure what you really mean.

Currently, spaces and slashes (/) as well as hyphenation points
are considered break opportunities. TR14 doesn't care about hyphenation
but expands significantly on the other points. For example, in the
string foo-bar the position after the dash is a break opportunity,
as people usually expect, but in -1234 the position after the dash
isn't a break opportunity, also as people usually expect. The TR
encodes as much of such expectations as is possible with a limited
context.

A few places in TextLayoutManager which use BREAK_CHARS will have to
be changed, either keeping info from a previous scanning using a
BreakIterator or something, or looking up the line break Unicode
properties and looking up whether a break may occur in the
line-break matrix. Hyphenation points are generated elsewhere and
remain unaffected.
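
To illustrate what "limited context" means, here is a deliberately tiny sketch that decides on a break by looking only at the two neighbouring characters. It is not TR14: the real algorithm maps each character to a line break class and consults a pair table, whereas this hard-codes three rules covering just the examples above. The class and method names are made up for the illustration. The JDK facility for the real thing is java.text.BreakIterator.getLineInstance().

```java
import java.util.ArrayList;
import java.util.List;

// Three hard-coded pair rules, standing in for a full line-break pair table.
class PairContextBreakSketch {

    /** Returns every index i such that a line may break before text.charAt(i). */
    static List<Integer> breakOpportunities(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            if (mayBreakBetween(text.charAt(i - 1), text.charAt(i))) {
                result.add(i);
            }
        }
        return result;
    }

    static boolean mayBreakBetween(char before, char after) {
        if (after == ' ') {
            return false;                       // never break before a space
        }
        if (before == ' ') {
            return true;                        // break after a run of spaces
        }
        if (before == '-') {
            return !Character.isDigit(after);   // "foo-|bar" yes, "-1234" no
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(breakOpportunities("foo-bar"));  // [4]
        System.out.println(breakOpportunities("-1234"));    // []
        System.out.println(breakOpportunities("a  b"));     // [3]
    }
}
```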

J.Pietschmann


Re: zero width space

2005-11-02 Thread J.Pietschmann

Manuel Mall wrote:
That seems to be the consensus, that is consider ZWS for line breaking 
but then discard and don't give it to the renderers.



Renderers could deal with ZWS if the font had a glyph for
this character; unfortunately, that's not the case for the PDF
standard fonts  :-)  Some fonts *do* have glyphs for various Unicode
space characters, notably the fixed width spaces.

This leads to the question: Is a space a character? What *is* a
character? The Unicode people had endless discussions about this.
Spaces are exactly in the gray area between real characters
which leave marks and layout control.

Handling space characters in layout and discarding them before
rendering has the distinctive advantage that they work for
any font in any renderer (which can handle variable space areas
properly, of course). OTOH, renderers which output a format which
can handle the spaces itself, like a hypothetical HTML renderer,
would better get the original character.
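
A small sketch of the "handle in layout, discard before rendering" option for ZWS. The class, the Prepared holder and its field names are hypothetical; the sketch only shows the bookkeeping: each U+200B becomes a recorded break offset and is left out of the text handed on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns zero width spaces into break offsets.
class ZeroWidthSpaceSketch {

    static final char ZWSP = '\u200B';

    static class Prepared {
        final String renderText;           // what a renderer would receive
        final List<Integer> breakOffsets;  // offsets into renderText where a break is allowed

        Prepared(String renderText, List<Integer> breakOffsets) {
            this.renderText = renderText;
            this.breakOffsets = breakOffsets;
        }
    }

    static Prepared prepare(String source) {
        StringBuilder out = new StringBuilder();
        List<Integer> breaks = new ArrayList<>();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == ZWSP) {
                breaks.add(out.length());  // remember the position, drop the character
            } else {
                out.append(c);
            }
        }
        return new Prepared(out.toString(), breaks);
    }

    public static void main(String[] args) {
        Prepared p = prepare("super\u200Bcali\u200Bfragilistic");
        System.out.println(p.renderText);    // supercalifragilistic
        System.out.println(p.breakOffsets);  // [5, 9]
    }
}
```

A renderer for a format that understands the character itself would skip this step and receive the original string.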

Are there any other (unusual Unicode) characters which fall in the same 
category that is they influence layout decisions but should not be seen 
by the renderers?


* Unicode spaces
 + variable width spaces
   - ordinary space U+0020
   - ordinary non-breaking space U+00A0
 + fixed width spaces; potentially available in fonts and *may*
   be passed to renderers, *except* for U+200B
   - zero width space U+200B, may expand in justification (not
 implemented this way in FOP 0.20.5, which will haunt us)
   - zero width non breaking space, aka byte order mark U+FEFF,
 should now only be used as BOM (as the BOM is eaten by the
 XML parser, FOP could emit a deprecated warning)
   - en quad U+2000, according to my Unicode book *identical* to
 U+2002, *not* a 4en space (strange)
   - em quad U+2001, similar to U+2000
   - en space aka nut U+2002,
   - em space aka mutton U+2003
   - three-per-em space aka thick space (1/3 em width) U+2004
   - four-per-em space aka mid space (1/4 em width) U+2005
   - six-per-em space (generally 1/6 em width) U+2006
   - figure space (font dependent) U+2007
   - punctuation space (as wide as a dot or comma) U+2008
   - thin space (1/5..1/8 em width) U+2009
   - hair space (1/10..1/16 em width) U+200A
   - narrow no-break space (probably 1/6 em width) U+202F
   - mathematical space U+205F
   - non breaking word joiner U+2060 replaces U+FEFF in text
   - ideographic space U+3000
   - OGHAM SPACE MARK U+1680 (odd stuff)
   - Note: ETHIOPIC WORDSPACE U+1361 leaves marks and is therefore
 not a space. At least I hope so.
 + see also
http://en.wikipedia.org/wiki/Space_character
http://www.alistapart.com/stories/emen/

* Other characters
 + Character shaping hints; they do not cause line breaks.
   - zero width joiner U+200D
   - zero width non-joiner U+200C (may probably also hint at
 preventing ligatures)
   - see http://en.wikipedia.org/wiki/Zero-width_joiner et al.
 + Soft hyphen U+00AD. Must be hidden if no line break follows.
 + Formatting characters. I'd say these characters should not occur
   in XSLFO source, because there are FO which represent the same
   functionality.
   - line separator U+2028, FOP 0.20.5 creates an unconditional line
 break regardless of any FO properties
   - paragraph separator U+2029
   - bidi control characters 200E-200F, 202A-202E
   - deprecated controls 206A-206F


J.Pietschmann


Re: Leading/trailing space removal in LineLM

2005-11-02 Thread J.Pietschmann

Manuel Mall wrote:

a) Yes UAX#14 always breaks at the end of a sequence of spaces
b) But it also says that it assumes any trailing spaces in a line are 
being removed
This conflicts with XSL-FO which can force spaces being retained 
therefore adjustments to the algorithm are necessary to cater for that. 


Computing line breaking opportunities and discarding whitespace at the
end (or beginning) of a line are different matters. If whitespace has
to be retained, trailing spaces after a non-space string may simply mean
the previous line breaking opportunity has to be used, because otherwise
the string including the trailing spaces will overflow the line area.
The trailing whitespace may also influence text justification.
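
The point can be shown with a small sketch that keeps the two concerns apart. Break opportunities are computed the same way in both cases (after each run of spaces); only the width measurement changes with the retain flag. Everything here is a simplification: the names are invented, and width is counted in characters as if the font were monospaced.

```java
// Sketch: same break opportunities, different fit test when spaces are retained.
class TrailingSpaceSketch {

    /**
     * Returns the end index (exclusive) of the first line: the last break
     * opportunity whose line still fits into maxChars, or -1 if none fits.
     */
    static int chooseBreak(String text, int maxChars, boolean retainSpaces) {
        int best = -1;
        for (int i = 1; i <= text.length(); i++) {
            boolean opportunity = i == text.length()
                    || (text.charAt(i - 1) == ' ' && text.charAt(i) != ' ');
            if (!opportunity) {
                continue;
            }
            // Discarded trailing spaces do not count towards the line width.
            int width = retainSpaces ? i : stripTrailing(text, i);
            if (width <= maxChars) {
                best = i;
            }
        }
        return best;
    }

    /** Index just past the last non-space character before end. */
    static int stripTrailing(String text, int end) {
        while (end > 0 && text.charAt(end - 1) == ' ') {
            end--;
        }
        return end;
    }

    public static void main(String[] args) {
        String text = "aaa bbb   ccc";
        System.out.println(chooseBreak(text, 8, false));  // 10: "aaa bbb" fits
        System.out.println(chooseBreak(text, 8, true));   // 4: "aaa bbb   " is too wide
    }
}
```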

J.Pietschmann



Re: Leading/trailing space removal in LineLM

2005-11-02 Thread Manuel Mall
On Thu, 3 Nov 2005 06:03 am, J.Pietschmann wrote:
 Manuel Mall wrote:
  a) Yes UAX#14 always breaks at the end of a sequence of spaces
  b) But it also says that it assumes any trailing spaces in a line
  are being removed
  This conflicts with XSL-FO which can force spaces being retained
  therefore adjustments to the algorithm are necessary to cater for
  that.

 Computing line breaking opportunities and discarding whitespace at
 the end (or beginning) of a line are different matters. If whitespace
 has to be retained, trailing spaces after a non-space string may
 simply mean the previous line breaking opportunity has to be used,
 because otherwise the string including the trailing spaces will
 overflow the line area. The trailing whitespace may also influence
 text justification.

Hmm, to me it appears that UNICODE and XSL-FO have slightly different 
models when it comes to white space in the context of line breaking 
which is causing the discussion here. In UNICODE everything is based 
simply on the properties of the codepoint in question and its 
neighbour. In XSL-FO one can change the behaviour of a codepoint by 
setting those white space related XSL-FO properties. That is not a 
concept within UNICODE. If you want to retain white space in UNICODE 
you use a different codepoint. If you want to retain a space in XSL-FO 
you could use a different codepoint but more likely you set an XSL-FO 
property if you want this applied widely in your document.

If we want to 'marry' UNICODE linebreaking with XSL-FO white space 
handling we have this interaction to consider. One possible solution 
would be to replace spaces (U+0020) by different codepoints which 
resemble the behaviour modification imposed by any XSL-FO white space 
handling properties in effect. But I am not sure if this can be done in 
all cases. Otherwise we may have to modify the UNICODE line breaking 
algorithm to cater for the XSL-FO white space specialities.

 J.Pietschmann

Manuel