Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
On Monday 06 February 2006 18:44, Luca Furini wrote:

Manuel Mall wrote:

<snip/>

1. Justified text: pen INF + elastic glue
2. All other justification modes: either just a box of the width of the space or pen INF + fixed width glue.

I think in both cases (justified / unjustified text) we could use either a sequence with only glues and penalties, or a sequence with boxes too. For the justified text, it could be: box w=0 + pen INF + elastic glue

The choice of the sequence (completely suppressible / with boxes too) depends on the suppress-at-line-break property, whose default value is auto, meaning that only the normal U+0020 space is suppressed at a break.

However, things are not so simple, and maybe we cannot just check the local value of the property. I see a couple of potentially-problematic situations.

<snip/>

Luca, IMO nbsp (and any other Unicode special spaces) are outside the scope of XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in XML. In XML only #x20, #x9, #xA, and #xD are considered whitespace. Therefore nbsp does not need to be considered when looking at white-space-treatment and white-space-collapse.

Would that approach remove the complications you mentioned?

If nbsps must be suppressed, should an empty line be created or not? WDYT?

Regards
Luca

Cheers

Manuel
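To make the "box w=0 + pen INF + elastic glue" proposal above concrete, here is a minimal Java sketch of how such a sequence might be assembled. Everything in it is hypothetical: the `Element` type, the `INFINITE` value and the `nbspElements` helper are illustrative names, not FOP's actual classes. It assumes the usual Knuth-Plass convention that glue and penalties following a break are discarded up to the next box, which is one reading of "completely suppressible" versus "with boxes too".

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of Knuth-style elements; these are not FOP's real classes. */
final class NbspSequenceSketch {

    /** Penalty value treated as "never break here" (an assumed constant). */
    static final int INFINITE = 1000;

    enum Kind { BOX, PENALTY, GLUE }

    static final class Element {
        final Kind kind;
        final int width;    // natural width
        final int stretch;  // meaningful for glue only
        final int shrink;   // meaningful for glue only
        final int penalty;  // meaningful for penalties only

        Element(Kind kind, int width, int stretch, int shrink, int penalty) {
            this.kind = kind;
            this.width = width;
            this.stretch = stretch;
            this.shrink = shrink;
            this.penalty = penalty;
        }
    }

    static Element box(int width) {
        return new Element(Kind.BOX, width, 0, 0, 0);
    }

    static Element penalty(int value) {
        return new Element(Kind.PENALTY, 0, 0, 0, value);
    }

    static Element glue(int width, int stretch, int shrink) {
        return new Element(Kind.GLUE, width, stretch, shrink, 0);
    }

    /**
     * Elements for one non-breaking space in the "with boxes too" variant:
     * a zero-width box, an infinite penalty forbidding the break, then the glue.
     * The glue is elastic for justified text and fixed-width otherwise.
     */
    static List<Element> nbspElements(boolean justified, int spaceWidth, int stretch, int shrink) {
        Element space = justified
                ? glue(spaceWidth, stretch, shrink)
                : glue(spaceWidth, 0, 0);
        return Arrays.asList(box(0), penalty(INFINITE), space);
    }
}
```

Dropping the leading `box(0)` would give the glue-and-penalty-only variant; which one applies would depend on suppress-at-line-break, as discussed in the message.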
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
Manuel Mall wrote:

IMO nbsp (and any other Unicode special spaces) are outside the scope of XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in XML. In XML only #x20, #x9, #xA, and #xD are considered whitespace. Therefore nbsp does not need to be considered when looking at white-space-treatment and white-space-collapse.

Would that approach remove the complications you mentioned?

Thanks for the clarification, Manuel!

This solves the first supposed problem (interaction between nbsp and pretty-printing spaces), but the second one is still open: what happens if we have someContent<nbsp><space>otherContent?

*IF* (and I'm not at all sure about this) there can be a break, then both spaces should be discarded: in order to implement the correct behaviour for this almost hypothetical situation, we would need to create elements for both spaces as a whole (and they could belong to different LMs), otherwise the algorithm would not be able to ignore the nbsp during the line breaking.

Anyway I think this is quite an unlikely combination of entities and properties :-) ; as I see you are already working on something else, for the moment I will prepare a patch for the most common situations.

Regards
Luca
Re: [Xmlgraphics-fop Wiki] Update of PagePositionLast by JeremiasMaerki
I think it can be even simpler and would work with XSL 1.0: Place an id attribute in a side-region of the simple-page-master for the last page and you can reference that id using page-number-citation. Precondition: page-position=last must be implemented. :-)

On 06.02.2006 15:54:15 Luca Furini wrote:

Jeremias Maerki wrote:

A problem surfacing with the first expectation is the page x of y problem: The usual empty block with an EOF-ID at the end of all content in the fo:flow ends up on the next-to-last page which causes the last page to display page n of n-1. Either the breaker has to detect such an element and force it on the last page or a different approach has to be taken to place the EOF marker.

Just a quick untested idea: what about adding a way to get the page number of the last page without the need to add a marked block and a page-number-citation?

... wait, we can even avoid inventing something new: the 1.1 specs define a fo:page-number-citation-last element!

If we no longer have a block that must necessarily be placed on the last page, we could use the width of the finishing glue (that could be negative too) in the ending elements added by the LineLM, in order to handle the difference between non-last page height and last page height. If the last-page BPD is bigger, the width will be < 0, in other words the last, forced page break has a discount on the content elements width; if the last-page BPD is smaller, the width will be > 0, which means that we build a page with the same height but some reserved space, unavailable for the content areas.

Regards
Luca

Jeremias Maerki
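The finishing-glue idea in the quoted message can be illustrated with a little arithmetic. The Java sketch below is one possible reading, not FOP code: it assumes the breaker measures every page against the regular (non-last) BPD and that the finishing glue width is simply the regular BPD minus the last-page BPD, so the width is negative when the last page is taller and positive when it is shorter. The class and method names are hypothetical, and the units are arbitrary as long as they are consistent.

```java
/** Hypothetical sketch of the finishing-glue arithmetic; not FOP code. */
final class LastPageGlueSketch {

    /**
     * Width of the finishing glue, assuming it compensates for the difference
     * between the regular page BPD and the last-page BPD.
     * Negative: the last page is taller, so the content gets a "discount".
     * Positive: the last page is shorter, so space is reserved.
     */
    static int finishingGlueWidth(int regularBpd, int lastPageBpd) {
        return regularBpd - lastPageBpd;
    }

    /** True if the content fits when the last page is measured against the regular BPD. */
    static boolean fitsOnLastPage(int contentHeight, int regularBpd, int lastPageBpd) {
        return contentHeight + finishingGlueWidth(regularBpd, lastPageBpd) <= regularBpd;
    }
}
```

With a regular BPD of 700 and a last-page BPD of 750, content of height 740 still fits because the glue contributes -50; with a last-page BPD of 600, content of height 650 does not fit because 100 units are reserved.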
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
On Monday 06 February 2006 22:35, Luca Furini wrote:

Manuel Mall wrote:

IMO nbsp (and any other Unicode special spaces) are outside the scope of XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in XML. In XML only #x20, #x9, #xA, and #xD are considered whitespace. Therefore nbsp does not need to be considered when looking at white-space-treatment and white-space-collapse.

Would that approach remove the complications you mentioned?

Thanks for the clarification, Manuel!

This solves the first supposed problem (interaction between nbsp and pretty-printing spaces), but the second one is still open: what happens if we have someContent<nbsp><space>otherContent?

*IF* (and I'm not at all sure about this) there can be a break, then both spaces should be discarded:

IMO yes, there can be a break, and no, only the space needs to be removed. Again the argument is that nbsp is not whitespace as per the XSL-FO definition and need not be removed. What makes you think that both the nbsp and the space need to be removed around a FOP-generated linebreak?

in order to implement the correct behaviour for this almost hypothetical situation, we would need to create elements for both spaces as a whole (and they could belong to different LMs), otherwise the algorithm would not be able to ignore the nbsp during the line breaking.

Anyway I think this is quite an unlikely combination of entities and properties :-) ; as I see you are already working on something else, for the moment I will prepare a patch for the most common situations.

Regards
Luca

Manuel
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
Manuel Mall wrote:

This solves the first supposed problem (interaction between nbsp and pretty-printing spaces), but the second one is still open: what happens if we have someContent<nbsp><space>otherContent?

*IF* (and I'm not at all sure about this) there can be a break, then both spaces should be discarded:

IMO yes, there can be a break, and no, only the space needs to be removed. Again the argument is that nbsp is not whitespace as per the XSL-FO definition and need not be removed. What makes you think that both the nbsp and the space need to be removed around a FOP-generated linebreak?

Oops, I forgot to add an important condition: if the user explicitly states that the nbsp must be discarded around a line break:

<fo:inline suppress-at-line-break="suppress">&nbsp;</fo:inline>

Well, the more I look at this, the more it seems unlikely to ever happen ... we are probably having a highly theoretical disquisition! :-)

Anyway, I was still not sure whether there could be a break so I looked back at the Unicode Annex #14.

GL Non-breaking (Glue) (XB/XA) (normative)

Non-breaking characters prohibit breaks on either side, but that prohibition can be overridden by SP or ZW. In particular, when NBSP follows SPACE, there is a break opportunity after the SPACE and NBSP will go as visible space onto the next line. See also WJ.

The following lists the characters of line break class GL with additional description.

00A0 NO-BREAK SPACE (NBSP)
202F NARROW NO-BREAK SPACE (NNBSP)
180E MONGOLIAN VOWEL SEPARATOR (MVS)

NO-BREAK SPACE is the preferred character to use where two words should be visually separated but kept on the same line, as in the case of a title and a name Dr.<NBSP>Joseph Becker. When SPACE follows NBSP, there is no break, because there never is a break in front of SPACE.

NARROW NO-BREAK SPACE is used in Mongolian. The MONGOLIAN VOWEL SEPARATOR acts like a NNBSP in its line breaking behavior. It additionally affects the shaping of certain vowel characters as described in [Unicode] Section 12.3, Mongolian.

So, it seems there could be a break between SPACE and NBSP (with NBSP starting the next line), but not between NBSP and SPACE. Can we say this is settled?

Regards
Luca
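The conclusion drawn from the quoted UAX #14 passage can be checked with a deliberately reduced model. The Java sketch below is not the UAX #14 algorithm: it only knows about SPACE and treats every other character (NBSP included) as something that offers no break opportunity on its own. The class and method names are hypothetical. In this reduced model the asymmetry between SPACE+NBSP and NBSP+SPACE falls out of the two SPACE rules alone.

```java
/** Hypothetical, heavily reduced model of the SPACE/NBSP rules quoted above. */
final class NbspBreakSketch {

    static final int SPACE = 0x0020;
    static final int NBSP = 0x00A0;

    /**
     * True if a break opportunity exists between two adjacent code points.
     * Only two rules are modelled: there is never a break in front of SPACE,
     * and otherwise there is a break opportunity after SPACE.
     * All other pairs are treated as unbreakable in this sketch.
     */
    static boolean canBreakBetween(int before, int after) {
        if (after == SPACE) {
            return false; // never a break in front of SPACE
        }
        return before == SPACE; // SPACE overrides the glue's prohibition
    }
}
```

So `canBreakBetween(SPACE, NBSP)` is true, with the NBSP starting the next line, while `canBreakBetween(NBSP, SPACE)` is false.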
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
On Feb 6, 2006, at 08:17, Manuel Mall wrote:

[ME:] <snip/> A preserved carriage return can be treated the same way as a linefeed, under the very exceptional condition that it survives white-space handling:

* white-space-treatment=ignore-if-*
* the CR does not follow/precede a linefeed
* it is the first character in a sequence of whitespace, so it survives white-space-collapse

Shouldn't a CR always survive whitespace handling?

Not always: If white-space-treatment=preserve then any XML whitespace other than a linefeed is converted into a normal space. IMO, the editors put it this way because of the possibility of Windows-specific line-endings, where a CR is followed by a linefeed.

For starters it is fairly difficult to get a CR out of an XML parser.

Difficult? It's simply a characters event, just like any other...

Only if the CR is hidden in an entity reference can it survive.

Also, as Simon pointed out in some other contribution, whitespace handling is designed to deal with whitespace introduced by pretty printing and readable XML layout. A CR preserved by the XML parser certainly does not fall into that category.

Oh yes it does... Remember that not all our users are unix/linux-based, which means for Windows users, you're likely to get the sequence '&#x0D;&#x0A;' as line-terminator, while Mac-users saving a source file with native line-endings will simply get a '&#x0D;'. (UTF-8 encoding is recommended, but not enforced... An XML file can be any encoding the parser supports on top of the UTF-8 minimum.)

A carriage-return can survive white-space-handling, for instance, in the following case (suppose Mac-encoding):

<fo:block>First line, then a CR&#x0D; some spaces, and more text</fo:block>

The CR (which isn't necessarily a Numerical Character Reference, but could be just the byte '0D') is not converted into a space (white-space-treatment=ignore-if-surrounding-linefeed). It does not precede or follow a linefeed. It is the first character in a sequence of whitespace, so no matter what the value of white-space-collapse, it will survive...

I am also not aware that the XSL-FO spec mentions CR as falling under whitespace. IMO for whitespace handling CR is just a non whitespace character.

Nope, it does fall into the category of XML whitespace. There are exactly four of those: &#x09; (tab), &#x0A; (linefeed), &#x0D; (carriage-return) and &#x20; (space). If you don't believe me, it's indeed not in the XSL-FO Rec, but you might want to check the XML Recommendation...

So, we only need to consider what fop layout should do if it encounters a CR. I would say, keep it simple, throw it away and log a warning.

Now, what about a tab character under the same circumstances? Do we use an elastic width of X spaces optimum, where X is purely conventional?

Similar considerations as for CR apply to TAB.

...

Cheers,

Andreas
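As a small illustration of the four XML whitespace characters named in this message, a check along these lines would classify CR and tab as whitespace and nbsp as an ordinary character. This is only a sketch; the class and method names are hypothetical and not taken from FOP.

```java
/** Hypothetical helper mirroring the XML definition of whitespace; not FOP code. */
final class XmlWhitespaceSketch {

    /** True for the four characters XML classifies as whitespace: space, tab, linefeed, carriage return. */
    static boolean isXmlWhitespace(int codePoint) {
        return codePoint == 0x20   // space
            || codePoint == 0x09   // tab
            || codePoint == 0x0A   // linefeed
            || codePoint == 0x0D;  // carriage return
    }
}
```

Under this definition U+00A0 (nbsp) is not whitespace, which matches the argument made earlier in the thread that nbsp falls outside white-space-treatment and white-space-collapse.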
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
On Feb 6, 2006, at 17:04, Luca Furini wrote:

Hi Manuel / Luca,

Manuel Mall wrote:

IMO yes, there can be a break, and no, only the space needs to be removed. Again the argument is that nbsp is not whitespace as per the XSL-FO definition and need not be removed. What makes you think that both the nbsp and the space need to be removed around a FOP-generated linebreak?

Oops, I forgot to add an important condition: if the user explicitly states that the nbsp must be discarded around a line break:

<fo:inline suppress-at-line-break="suppress">&nbsp;</fo:inline>

Oops, typo? suppress-at-line-break is a non-inherited property, only applicable to fo:character :-)

Well, the more I look at this, the more it seems unlikely to ever happen ... we are probably having a highly theoretical disquisition! :-)

<fo:character character="&#xA0;" suppress-at-line-break="suppress"/>

followed by a space is indeed very theoretical. So is (another alternative):

<fo:inline suppress-at-line-break="suppress"><fo:character character="&#xA0;" suppress-at-line-break="inherit"/></fo:inline>

OTOH, if we can make the algorithm work in these exotic cases, then the commonly used scenarios will be a cake-walk. :-)

This does, in any case, shed some different light on the notion of 'pretty printing whitespace', since currently --at least that was my understanding of the discussions, and that's what I worked towards-- a fo:character is considered the same as a regular character, in that fo:characters representing XML whitespace are subject to whitespace-removal... Yet, one can arguably defend the idea that any *fo:*character is inserted for *XML* pretty printing purposes, no? Should this change be reverted then?

[Maybe partly, because suppose:

<fo:block><fo:character character="&#x20;" suppress-at-line-break="retain"/> ...

Currently, the fact that it is a fo:character is not known when running this through the algorithm. The CharIterators deal with the characters. The XMLWhiteSpaceHandler makes a decision based purely on the value of the character property. It is agnostic to the suppress-at-line-break property's value... I myself would tend to use a non-breaking space in this case, since it escapes the whitespace handling, but it is a theoretical possibility. :-)

Another alternative would be to introduce a member to the CharIterators... Something like isSuppressible(), which would return true if:

( the current element is a regular character and it has codepoint U+0020 )
or
( the current element is a fo:character and
  (( the value of its character property is codepoint U+0020 and suppress-at-line-break=auto )
   or ( suppress-at-line-break=suppress )))

As such, refinement (white-space)-character-removal could operate on this basis, and already resolve such issues at that stage. The current approach is still not 100% correct anyway...]

Anyway, I was still not sure whether there could be a break so I looked back at the Unicode Annex #14.

<snip />

So, it seems there could be a break between SPACE and NBSP (with NBSP starting the next line), but not between NBSP and SPACE. Can we say this is settled?

Yes! Definitely. We're looking for UAX#14 'compliance' as well here.

My 2 cents.

Cheers,

Andreas
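The isSuppressible() condition spelled out in this message translates fairly directly into code. The Java sketch below is only an illustration of that condition: the enum, the parameter list and the class name are hypothetical, it is written as a free-standing static method rather than as a member of FOP's CharIterators, and it assumes that an inherited property value has already been resolved.

```java
/** Hypothetical sketch of the proposed isSuppressible() test; not FOP code. */
final class SuppressibleSketch {

    /** Resolved values of suppress-at-line-break ("inherit" is assumed to be resolved beforehand). */
    enum SuppressAtLineBreak { AUTO, SUPPRESS, RETAIN }

    private static final int SPACE = 0x0020;

    /**
     * @param codePoint       the character's code point
     * @param fromFoCharacter true if the character comes from an fo:character,
     *                        false if it is a regular text character
     * @param suppress        the resolved property value; ignored for regular characters
     */
    static boolean isSuppressible(int codePoint, boolean fromFoCharacter, SuppressAtLineBreak suppress) {
        if (!fromFoCharacter) {
            // Regular character: only the plain space is suppressible.
            return codePoint == SPACE;
        }
        if (suppress == SuppressAtLineBreak.SUPPRESS) {
            // Explicitly suppressed, whatever the character is.
            return true;
        }
        // fo:character with auto: behaves like a regular character.
        return codePoint == SPACE && suppress == SuppressAtLineBreak.AUTO;
    }
}
```

With this test an fo:character holding U+00A0 and suppress-at-line-break=suppress is suppressible, while an fo:character holding U+0020 and suppress-at-line-break=retain is not, which covers the two exotic cases mentioned above.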
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
On Feb 6, 2006, at 19:40, Andreas L Delmelle wrote:

Currently, the fact that it is a fo:character is not known when running this through the algorithm. The CharIterators deal with the characters.

Say... I was just wondering: why does the TextLayoutManager create its own copy of the FOText's character array? Could the LMs be made to re-use the CharIterators' functionality to get to the characters, or would that mean a drain on performance somehow?

Anyone?

Cheers,

Andreas
Re: [Xmlgraphics-fop Wiki] Update of PagePositionLast by JeremiasMaerki
Well, I don't see any restriction concerning static-content in the spec. You simply have to make sure that an FO with an id only appears once in a document. And that is the case if you place it in a static-content that is used exactly once in a page-master activated by page-position=last. Hey, maybe I got it all wrong. Wouldn't be the first time. :-) Anyway, I tested what I described with a certain commercial FO implementation and it worked.

On 06.02.2006 21:41:52 Andreas L Delmelle wrote:

On Feb 6, 2006, at 21:37, J.Pietschmann wrote:

Jeremias Maerki wrote:

I think it can be even simpler and would work with XSL 1.0: Place an id attribute in a side-region of the simple-page-master for the last page and you can reference that id using page-number-citation.

Is it really legal to refer to ids in static content? What's the expected page number in case the static content is used on more than one page?

Wow! I'm first in line to hear/read the answer to this one... :-)

Jeremias Maerki