Re: White space handling Wiki page

2005-11-01 Thread Andreas L Delmelle

On Oct 31, 2005, at 22:18, Andreas L Delmelle wrote:


On Oct 27, 2005, at 06:29, Manuel Mall wrote:

Actually something like:
<fo:block background-color="yellow">word1<fo:character
character="&#10;"/><fo:character character=" "/>word2<fo:character
character=" "/>word3<fo:character character="&#10;"/></fo:block>
currently causes an exception!




The problem can be solved by a slight modification to OneCharIterator:
* add a constructor with Character parameter (and member)
* add a remove() implementation which makes Character's parent  
remove it from its list of child nodes


Tested locally (very quickly), and seems to work nicely. If I get  
the chance to commit it in the next few days, I'll do so myself,  
but if you want to have a go, it's a pretty easy fix (adds up to  
about 10-15 LOC incl. javadocs :-))


Oops, I was too quick. That only trades an UnsupportedOperationException
for a ConcurrentModificationException...
The trick seems to be to introduce a small boolean 'discard' switch
on the Character object, flip it when OCIter.remove() is called, and
have the Block/Inline later remove any of its characters marked as
discardable, but (of course) only after the RecursiveCharIterator has
finished, so that the childNodes list is not altered while it's being
iterated over...


Other option: store a list of the discardable space fo:characters at  
Block or Inline level, instead of marking the Character itself as  
such...


A bit more than 15 LOC, but still quite doable.
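
A minimal sketch of the 'discard' switch idea, for illustration only. The
class and method names below (CharNode, markCollapsibleSpaces,
removeDiscarded) are hypothetical stand-ins, not FOP's actual Character or
iterator classes; the point is only that marking during iteration and
removing afterwards avoids the ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredRemovalSketch {

    /** Hypothetical stand-in for an fo:character node; not FOP's Character class. */
    static final class CharNode {
        final char value;
        boolean discard; // flipped instead of removing the node during iteration

        CharNode(char value) {
            this.value = value;
        }
    }

    /** Marks every space that directly follows another space as discardable. */
    static void markCollapsibleSpaces(List<CharNode> childNodes) {
        boolean previousWasSpace = false;
        for (CharNode node : childNodes) {
            boolean isSpace = node.value == ' ';
            if (isSpace && previousWasSpace) {
                node.discard = true; // no structural change while iterating
            }
            previousWasSpace = isSpace;
        }
    }

    /** Runs only after iteration has finished, so the list is safe to modify. */
    static void removeDiscarded(List<CharNode> childNodes) {
        childNodes.removeIf(node -> node.discard);
    }

    /** Builds a node list from the input, marks, removes, and returns the remaining text. */
    static String render(String input) {
        List<CharNode> childNodes = new ArrayList<>();
        for (char c : input.toCharArray()) {
            childNodes.add(new CharNode(c));
        }
        markCollapsibleSpaces(childNodes);
        removeDiscarded(childNodes);
        StringBuilder sb = new StringBuilder();
        for (CharNode node : childNodes) {
            sb.append(node.value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + render("word1  word2   word3") + "]");
    }
}
```

The alternative mentioned above (keeping a separate list of discardable
nodes at Block or Inline level) would replace the boolean field with a
collection owned by the parent; the two-phase structure stays the same.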

Cheers,

Andreas



DO NOT REPLY [Bug 37318] - fop.bat: NoClassDefFoundError: org/apache/fop/cli/Main

2005-11-01 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=37318.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=37318





--- Additional Comments From [EMAIL PROTECTED]  2005-11-01 15:33 ---
Did you build fop using ant before trying to run it?

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.


zero width space

2005-11-01 Thread Manuel Mall
Currently if one puts a zero-width-space (U+200B) into an XSL-FO file 
(or specifies linefeed-treatment=treat-as-zero-width-space) it is 
rendered as a missing character in PDF. Is that correct, i.e. does 
this character have to exist in the font used or should the formatter 
or renderer simply remove this character? It is the second approach 
that both AntennaHouse and RenderX appear to have chosen.

Manuel


DO NOT REPLY [Bug 37318] - fop.bat: NoClassDefFoundError: org/apache/fop/cli/Main

2005-11-01 Thread bugzilla
http://issues.apache.org/bugzilla/show_bug.cgi?id=37318





--- Additional Comments From [EMAIL PROTECTED]  2005-11-01 17:29 ---
Build fop? Do I have to? There are already JAR files in the lib/ directory ... 
but yes, I can't see a fop.jar file! Why isn't it offered as a jar file? I also 
can't find a source bundle zip file, so do I really need to download all the 
source files? Then I would have to install subversion, because there are so many 
source files that it would take hours to download each of them via the 
subversion link. :-(



Re: zero width space

2005-11-01 Thread Chris Bowditch

Manuel Mall wrote:

Currently if one puts a zero-width-space (U+200B) into an XSL-FO file 
(or specifies linefeed-treatment=treat-as-zero-width-space) it is 
rendered as a missing character in PDF. Is that correct, i.e. does 
this character have to exist in the font used or should the formatter 
or renderer simply remove this character? It is the second approach 
that both AntennaHouse and RenderX appear to have chosen.


I recommend that no character is output for a ZWS. The whole purpose of 
placing a ZWS into the input XSL-FO is to give layout an extra break 
opportunity, without changing the appearance of the generated document.


Chris




Re: zero width space

2005-11-01 Thread Andreas L Delmelle

On Nov 1, 2005, at 15:52, Manuel Mall wrote:


Currently if one puts a zero-width-space (U+200B) into an XSL-FO file
(or specifies linefeed-treatment=treat-as-zero-width-space) it is
rendered as a missing character in PDF. Is that correct, i.e. does
this character have to exist in the font used or should the formatter
or renderer simply remove this character? It is the second approach
that both AntennaHouse and RenderX appear to have chosen.


It's certainly not correct to render a missing glyph character, but  
it would also be wrong to remove it too early. The character doesn't  
take part in white-space treatment/collapsing, since it's not XML  
whitespace. It's somewhere in layout that the decision has to be made  
not to allocate space for this character, but it could play a part in  
line-building...
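
A rough sketch of that split, under stated assumptions: the zero-width
space is kept long enough to record a break opportunity for line-building,
and is dropped only from the text that would be handed to a renderer. The
class and method names are hypothetical and do not correspond to FOP code.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroWidthSpaceSketch {

    static final char ZWSP = '\u200B';

    /**
     * Returns the offsets, counted in visible characters, at which a
     * zero-width space allows a line break.
     */
    static List<Integer> breakOpportunities(String text) {
        List<Integer> offsets = new ArrayList<>();
        int visible = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ZWSP) {
                offsets.add(visible); // a break is allowed here, no width is allocated
            } else {
                visible++;
            }
        }
        return offsets;
    }

    /** The text a renderer would see: identical, minus the zero-width spaces. */
    static String textForRenderer(String text) {
        return text.replace(String.valueOf(ZWSP), "");
    }

    public static void main(String[] args) {
        String input = "foo" + ZWSP + "bar";
        System.out.println(breakOpportunities(input) + " " + textForRenderer(input));
    }
}
```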


My two cents.

Cheers,

Andreas



Re: White space handling Wiki page

2005-11-01 Thread Andreas L Delmelle

On Nov 1, 2005, at 10:04, Manuel Mall wrote:



I am sure it is doable - but is it worth it at this stage? Possibly,
after a better understanding of the white-space handling issues, the
whole current system needs revision? One problem with the current char
iterator is that it iterates over inline boundaries, which causes white
space to be collapsed across those, which according to the clarification
of the WG is incorrect. IMO to implement the refinement step of the
white space handling (which currently happens in the flow.Block object)
we need an iterator which goes through all characters but indicates fo
boundaries (not including fo:characters) so we can do:
a) linefeed treatment across all characters;
b) white space collapse across each consecutive section of
implicit/explicit fo:characters, i.e. delimited by the start/end of
fo's;
c 1) white-space-treatment from the start of the fo:block to the first
non white-space character;
The iterator must also be able to either operate backwards or be able to
be reset to a particular position (last non white space character) so
we can do:
c 2) white-space-treatment from the end of the fo:block backwards to
the first non white-space character

It must also support character deletions and character substitutions.
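
The steps a) to c 2) can be sketched roughly as follows. This is a
simplified illustration, not FOP code: each string in the input list stands
for one consecutive run of characters delimited by fo boundaries, linefeed
treatment is assumed to be treat-as-space, and white-space-treatment is
reduced to trimming spaces at the block's start and end. All names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentedWhiteSpaceSketch {

    /** Each input string is one run of characters between two fo boundaries. */
    static List<String> refine(List<String> segments) {
        List<StringBuilder> work = new ArrayList<>();
        for (String segment : segments) {
            // a) linefeed treatment across all characters (a character substitution)
            String s = segment.replace('\n', ' ');
            // b) collapse runs of spaces, but never across a segment boundary
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == ' ' && sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ') {
                    continue; // a character deletion
                }
                sb.append(c);
            }
            work.add(sb);
        }
        // c 1) from the start of the block forwards to the first non-space character
        forward:
        for (StringBuilder sb : work) {
            while (sb.length() > 0) {
                if (sb.charAt(0) != ' ') {
                    break forward;
                }
                sb.deleteCharAt(0);
            }
        }
        // c 2) from the end of the block backwards to the first non-space character
        backward:
        for (int k = work.size() - 1; k >= 0; k--) {
            StringBuilder sb = work.get(k);
            while (sb.length() > 0) {
                if (sb.charAt(sb.length() - 1) != ' ') {
                    break backward;
                }
                sb.deleteCharAt(sb.length() - 1);
            }
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : work) {
            result.add(sb.toString());
        }
        return result;
    }
}
```

Note that a space at the end of one segment and a space at the start of
the next both survive step b), which is the behaviour the WG clarification
asks for.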

Does that make sense?


Very much. Precisely with that in mind, I've also been contemplating  
moving part of the whitespace-handling to inline-level. This would  
keep the nested inlines separated from the Block's own direct FOText  
descendants (and at the same time, in combination with the  
modification I already described, this would provide us with an  
opportunity to remove fo:characters from within the nested inlines -- 
which would become quite a pain if this removal is deferred to block- 
level)


So the RecursiveCharIterator should only create Iterators over  
regular FOText or fo:characters that are direct descendants of the  
Block/Inline. FOText of nested FObjs should be left alone, since the  
whitespace will already be collapsed. IOW, it should stop being -- 
recursive?


Currently, whitespace handling is triggered from the moment a Block
encounters a child node that is not FOText and does not generate inline
areas. In principle this seems OK; the only change I'd propose is that
inlines do their own whitespace handling, so that *if* whitespace
needs to be collapsed across fo boundaries (maybe there are
cases?), the block level only needs to look at the first and last
characters in an inline's text.



Cheers,

Andreas



Re: Leading/trailing space removal in LineLM

2005-11-01 Thread Simon Pepping
On Tue, Nov 01, 2005 at 11:40:42PM +0800, Manuel Mall wrote:
 This is probably a question for Luca or Simon.
 
 In LineLM we have this code:
 // ignore KnuthGlue and KnuthPenalty objects
 // at the beginning of the line
 seqIterator = seq.listIterator(iStartElement);
 tempElement = (KnuthElement) seqIterator.next();
 while (!tempElement.isBox() && seqIterator.hasNext()) {
 tempElement = (KnuthElement) seqIterator.next();
 iStartElement++;
 }
 What is the background to this? This seems to interfere with certain 
 combinations of white-space-collapse=false and 
 white-space-treatment=preserve/ignore-if-before-linefeed. I think 
 there is similar code to remove trailing stuff with similar 
 interference.

Glue and penalty items are removed at the start of a line. This is
part of the Knuth algorithm. It does not touch the matter of
white-space-collapse. If there is whitespace that may not be
removed/collapsed at the start of the line, it must be protected by a
preceding zero-width box. I.o.w., the value of white-space-collapse
needs to be taken into account at the phase of getNextKnuthElements.
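
To illustrate the point about a preceding zero-width box, here is a small
self-contained sketch. The Element class below is a hypothetical stand-in
with only a kind and a width; it is not FOP's KnuthElement hierarchy. The
skipping loop mirrors the LineLM snippet quoted above.

```java
import java.util.Arrays;
import java.util.List;

public class LeadingGlueSketch {

    enum Kind { BOX, GLUE, PENALTY }

    /** Hypothetical, simplified element: just a kind and a width. */
    static final class Element {
        final Kind kind;
        final int width;

        Element(Kind kind, int width) {
            this.kind = kind;
            this.width = width;
        }

        boolean isBox() {
            return kind == Kind.BOX;
        }
    }

    static Element box(int width) {
        return new Element(Kind.BOX, width);
    }

    static Element glue(int width) {
        return new Element(Kind.GLUE, width);
    }

    /**
     * Skips glue and penalty elements at the start of a line and returns the
     * index of the first element that is kept (the last element if no box is found).
     */
    static int firstKeptIndex(List<Element> seq, int start) {
        int i = start;
        while (i < seq.size() - 1 && !seq.get(i).isBox()) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Unprotected: the leading glue (the space) is dropped.
        List<Element> unprotectedSeq = Arrays.asList(glue(5), box(10));
        // Protected: a zero-width box in front keeps the glue on the line.
        List<Element> protectedSeq = Arrays.asList(box(0), glue(5), box(10));
        System.out.println(firstKeptIndex(unprotectedSeq, 0));
        System.out.println(firstKeptIndex(protectedSeq, 0));
    }
}
```

In this sketch the unprotected sequence starts at index 1, so the glue is
lost, while the protected sequence starts at index 0 and the glue survives.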

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Unicode compliant Line Breaking

2005-11-01 Thread Simon Pepping
On Mon, Oct 31, 2005 at 03:25:12PM +0800, Manuel Mall wrote:
 In a previous post Joerg pointed to the Unicode Standard Annex #14 on 
 Line Breaking (http://www.unicode.org/reports/tr14/) and his initial 
 implementation: http://people.apache.org/~pietsch/linebreak.tar.gz.
 
 I have since had a closer look at both UAX#14 and Joerg's code. Because I 
 liked what I saw, I went about adapting Joerg's code to Unicode 4.1 
 and added fairly extensive JUnit test cases to it, mainly because it 
 really helps to go through the various cases mentioned in the 
 spec in a structured fashion.

Is our current hyphenation method a subset of Unicode's method?

 Assuming now that this will be agreed as well the next step would be the 
 more detailed design of the integration. But this is well beyond the 
 scope of this e-mail as there are some tricky issues involved and they 
 probably need to be tackled in conjunction with the white space 
 handling issues. Many of the problems are related to our LayoutManager 
 structures which create barriers when it comes to the need to process 
 character sequences across those boundaries as is the case for both 
 line breaking and white space handling. Add to that the design of the 

I seem to recall that the hyphenation code collects words across LM
boundaries.

It seems a useful goal to implement Unicode hyphenation. But since it
is a major effort, it does not fit in working towards a release. In
any case it would have to be in a separate branch until it proves to
work and to implement a substantial part of hyphenation. Then it does
not immediately matter if it is a separate project or a part of FOP.

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Unicode compliant Line Breaking

2005-11-01 Thread J.Pietschmann

Simon Pepping wrote:

Is our current hyphenation method a subset of Unicode's method?


Umm. What's the relation between hyphenation and TR14 (except for
handling soft hyphens)? I guess you are confusing finding line breaks
in general with line breaking due to hyphenation.


I seem to recall that the hyphenation code collects words across LM
boundaries.


As it should. Word boundaries and FO boundaries are different things:
 <block>A w<wrapper text-decoration="underline">o</wrapper>rd</block>
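
A tiny sketch of why words have to be collected across FO boundaries. The
three text runs "A w", "o" and "rd" correspond to the block text, the
wrapper content and the trailing block text in the example; only after
joining them does "word" show up as a single word. The class and method
names are hypothetical, not FOP's hyphenation code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordAcrossBoundariesSketch {

    /** Joins the text runs of adjacent FOs, then splits the result into words. */
    static List<String> words(List<String> runs) {
        String joined = String.join("", runs);
        List<String> result = new ArrayList<>();
        for (String word : joined.split(" ")) {
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(Arrays.asList("A w", "o", "rd")));
    }
}
```

Splitting each run on its own would instead yield the fragments "w", "o"
and "rd", none of which is a useful unit for hyphenation.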

J.Pietschmann



Re: zero width space

2005-11-01 Thread Manuel Mall
On Wed, 2 Nov 2005 01:03 am, Chris Bowditch wrote:
 Manuel Mall wrote:
  Currently if one puts a zero-width-space (U+200B) into an XSL-FO
  file (or specifies linefeed-treatment=treat-as-zero-width-space)
  it is rendered as a missing character in PDF. Is that correct,
  i.e. does this character have to exist in the font used or should
  the formatter or renderer simply remove this character? It is the
  second approach that both AntennaHouse and RenderX appear to have
  chosen.

 I recommend that no character is output for a ZWS. The whole purpose
 of placing a ZWS into the input XSL-FO is to give layout an extra
 break opportunity, without changing the appearance of the generated
 document.

That seems to be the consensus: consider ZWS for line breaking, but 
then discard it and don't pass it to the renderers.

Are there any other (unusual Unicode) characters which fall into the same 
category, i.e. they influence layout decisions but should not be seen 
by the renderers?

 Chris

Manuel