Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Manuel Mall
On Monday 06 February 2006 18:44, Luca Furini wrote:
 Manuel Mall wrote:
<snip/>
 
  1. Justified text: pen INF + elastic glue
  2. All other justification modes: either just a box of the width of
  the space or pen INF + fixed width glue.

 I think in both cases (justified / unjustified text) we could use
 either a sequence with only glues and penalties, or a sequence with
 boxes too.

 For the justified text, it could be:
box w=0 + pen INF + elastic glue

 The choice of the sequence (completely suppressible / with boxes too)
 depends on the suppress-at-line-break property, whose default value
 is auto, meaning that only the normal U+0020 space is suppressed at
 a break.

 However, things are not so simple, and maybe we cannot just check the
 local value of the property. I see a couple of
 potentially-problematic situations.
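
The element sequences proposed above for a non-breaking space can be sketched as follows. This is a Python illustration only, not FOP code: the class names, the `nbsp_elements` helper and the value 1000 for an "infinite" penalty are assumptions made for the example.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed stand-in for "pen INF"; forbids a break at this point


@dataclass(frozen=True)
class Box:
    width: int


@dataclass(frozen=True)
class Glue:
    width: int
    stretch: int = 0
    shrink: int = 0


@dataclass(frozen=True)
class Penalty:
    value: int


def nbsp_elements(space_width, justified, with_box=False, stretch=0, shrink=0):
    """Build a hypothetical element sequence for one non-breaking space.

    justified=True  -> pen INF + elastic glue
    justified=False -> pen INF + fixed-width glue
    with_box=True   -> prepend a zero-width box, so the sequence is no
                       longer made of glues and penalties only
    """
    if justified:
        glue = Glue(space_width, stretch, shrink)
    else:
        glue = Glue(space_width)
    sequence = [Penalty(INFINITE), glue]
    if with_box:
        sequence.insert(0, Box(0))
    return sequence
```

For example, `nbsp_elements(5, justified=True, with_box=True, stretch=2, shrink=1)` yields the "box w=0 + pen INF + elastic glue" variant.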

<snip/>

Luca,

IMO nbsp (and any other Unicode special spaces) are outside the scope of 
XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in 
XML. In XML only #x20, #x9, #xA, and #xD are considered whitespace. 
Therefore nbsp does not need to be considered when looking at 
white-space-treatment and white-space-collapse. Would that approach 
remove the complications you mentioned?
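
A minimal Python sketch of that distinction, for illustration only (the names `XML_WHITESPACE` and `is_xml_whitespace` are made up here and are not FOP identifiers):

```python
# The four characters XML classifies as white space: space, tab, LF, CR.
XML_WHITESPACE = frozenset("\u0020\u0009\u000A\u000D")


def is_xml_whitespace(ch: str) -> bool:
    """True only for the characters XML itself treats as white space."""
    return ch in XML_WHITESPACE
```

Under this test U+00A0 (nbsp) is an ordinary character, so white-space-treatment and white-space-collapse would never see it.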


 If nbsps must be suppressed, should an empty line be created or not?

 WDYT?

 Regards
  Luca

Cheers

Manuel


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Luca Furini

Manuel Mall wrote:

IMO nbsp (and any other Unicode special spaces) are outside the scope of 
XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in 
XML. In XML only #x20, #x9, #xA, and #xD are considered whitespace. 
Therefore nbsp does not need to be considered when looking at 
white-space-treatment and white-space-collapse. Would that approach 
remove the complications you mentioned?


Thanks for the clarification, Manuel!

This solves the first supposed problem (interaction between nbsp and 
pretty-printing spaces), but the second one is still open: what happens if 
we have

  someContent<nbsp><space>otherContent ?

*IF* (and I'm not at all sure about this) there can be a break, then both 
spaces should be discarded: in order to implement the correct behaviour 
for this almost hypothetical situation, we would need to create elements 
for both spaces as a whole (and they could belong to different LMs); 
otherwise the algorithm would not be able to ignore the nbsp during the 
line breaking.


Anyway I think this is quite an unlikely combination of entities and 
properties :-) ; as I see you are already working on something else, for 
the moment I will prepare a patch for the most common situations.


Regards
Luca


Re: [Xmlgraphics-fop Wiki] Update of PagePositionLast by JeremiasMaerki

2006-02-06 Thread Jeremias Maerki
I think it can be even simpler and would work with XSL 1.0: Place an id
attribute in a side-region of the simple-page-master for the last page
and you can reference that id using page-number-citation. Precondition:
page-position=last must be implemented. :-)

On 06.02.2006 15:54:15 Luca Furini wrote:
 Jeremias Maerki wrote:
 
  A problem surfacing with the first expectation is the page x of y 
  problem: The usual empty block with an EOF-ID at the end of all 
  content in the fo:flow ends up on the next-to-last page which causes the 
  last page to display page n of n-1. Either the breaker has to detect 
  such an element and force it on the last page or a different approach 
  has to be taken to place the EOF marker.
 
 Just a quick untested idea: what about adding a way to get the page number of 
 the last page without the need to add a marked block and a 
 page-number-citation? ... wait, we can even avoid inventing something new: 
 the 1.1 specs define a fo:page-number-citation-last element!
 
 If we no longer have a block that must necessarily be placed in the 
 last page, we could use the width of the finishing glue (that could be 
 negative too) in the ending elements added by the LineLM, in order to 
 handle the difference between non-last page height and last page height.
 
 If the last-page BPD is bigger, the width will be < 0, in other words the 
 last, forced page break has a discount on the content elements width; if 
 the last-page BPD is smaller, the width will be > 0, which means that we 
 build a page with the same height but some reserved space, unavailable 
 for the content areas.
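
The sign convention can be illustrated with a small Python sketch. It reads the two cases as a negative width when the last page is taller and a positive width when it is shorter; the function name and the use of millipoints are assumptions for the example, not FOP code.

```python
def finishing_glue_width(regular_bpd: int, last_bpd: int) -> int:
    """Width of the finishing glue, in the same unit as the BPD values.

    Negative when the last page offers more block-progression-dimension
    than the other pages (a "discount" on the content width), positive
    when it offers less (space reserved and unavailable for content).
    """
    return regular_bpd - last_bpd
```

With `regular_bpd=700000` and `last_bpd=750000` the width is -50000; swapping in `last_bpd=650000` gives +50000.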
 
 Regards
  Luca



Jeremias Maerki



Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Manuel Mall
On Monday 06 February 2006 22:35, Luca Furini wrote:
 Manuel Mall wrote:
  IMO nbsp (and any other Unicode special spaces) are outside the
  scope of XSL-FO whitespace handling. XSL-FO refers to whitespace as
  defined in XML. In XML only #x20, #x9, #xA, and #xD are considered
  whitespace. Therefore nbsp does not need to be considered when
  looking at white-space-treatment and white-space-collapse. Would
  that approach remove the complications you mentioned?

 Thanks for the clarification, Manuel!

 This solves the first supposed problem (interaction between nbsp and
 pretty-printing spaces), but the second one is still open: what
 happens if we have
    someContent<nbsp><space>otherContent ?
 *IF* (and I'm not at all sure about this) there can be a break, then
 both spaces should be discarded: 

IMO yes, there can be a break, and no, only the space needs to be removed. 
Again, the argument is that nbsp is not whitespace as per the XSL-FO 
definition and need not be removed.

What makes you think that both the nbsp and the space need to be 
removed around a FOP-generated line break?

 in order to implement the correct 
 behaviour for this almost hypothetical situation, we would need to
 create elements for both spaces as a whole (and they could belong to
 different LMs) otherwise the algorithm would not be able to ignore
 the nbsp during the line breaking.

 Anyway I think this is quite an unlikely combination of entities and
 properties :-) ; as I see you are already working on something else,
 for the moment I will prepare a patch for the most common situations.

 Regards
  Luca

Manuel


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Luca Furini

Manuel Mall wrote:


This solves the first supposed problem (interaction between nbsp and
pretty-printing spaces), but the second one is still open: what
happens if we have
   someContent<nbsp><space>otherContent ?
*IF* (and I'm not at all sure about this) there can be a break, then
both spaces should be discarded: 


IMO yes, there can be a break, and no, only the space needs to be removed. 
Again, the argument is that nbsp is not whitespace as per the XSL-FO 
definition and need not be removed.


What makes you think that both the nbsp and the space need to be 
removed around a FOP-generated line break?


Oops, I forgot to add an important condition: if the user explicitly 
states that the nbsp must be discarded around a line break:

  <fo:inline suppress-at-line-break="suppress">&nbsp;</fo:inline>

Well, the more I look at this, the more it seems unlikely to ever happen 
... we are probably having a highly theoretical disquisition! :-)


Anyway, I was still not sure whether there could be a break so I looked 
back at the Unicode Annex #14.



GL  Non-breaking (Glue) (XB/XA)  (normative)

Non-breaking characters prohibit breaks on either side, but that 
prohibition can be overridden by SP or ZW. In particular, when NBSP 
follows SPACE, there is a break opportunity after the SPACE and NBSP will 
go as visible space onto the next line. See also WJ. The following lists 
the characters of line break class GL with additional description.


00A0 NO-BREAK SPACE (NBSP)
202F NARROW NO-BREAK SPACE (NNBSP)
180E MONGOLIAN VOWEL SEPARATOR (MVS)

NO-BREAK SPACE is the preferred character to use where two words should be 
visually separated but kept on the same line, as in the case of a title 
and a name Dr.<NBSP>Joseph Becker. When SPACE follows NBSP, there is no 
break, because there never is a break in front of SPACE.  NARROW NO-BREAK 
SPACE is used in Mongolian. The mongolian vowel separator acts like a 
NNBSP in its line breaking behavior. It additionally affects the shaping 
of certain vowel characters as described in [Unicode] Section 12.3, 
Mongolian.



So, it seems there could be a break between SPACE and NBSP (with NBSP 
starting the next line), but not between NBSP and SPACE. Can we say this 
is settled?
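
The SPACE/NBSP interplay described in the excerpt can be written down as a tiny Python sketch. It models only the rules quoted above; all other character pairs are outside its scope and conservatively report no break. The names are invented for the example.

```python
SPACE = "\u0020"
# Characters of line break class GL listed in the quoted excerpt.
GLUE = frozenset({"\u00A0", "\u202F", "\u180E"})


def break_allowed(before: str, after: str) -> bool:
    """Is there a break opportunity between two adjacent characters?

    Only the SP/GL rules from the excerpt are modelled here.
    """
    if after == SPACE:
        return False  # there never is a break in front of SPACE
    if before == SPACE:
        return True   # break opportunity after SPACE, even before NBSP
    if before in GLUE or after in GLUE:
        return False  # GL prohibits breaks on either side
    return False      # anything else is not modelled: assume no break
```

So `break_allowed(" ", "\u00A0")` is true (the NBSP moves to the next line as a visible space), while `break_allowed("\u00A0", " ")` is false.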


Regards
Luca


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Andreas L Delmelle

On Feb 6, 2006, at 08:17, Manuel Mall wrote:


[ME:]

<snip/>

A preserved carriage return can be treated the same way as a
linefeed, under the very exceptional condition that it survives
white-space handling:
  * white-space-treatment=ignore-if-*
  * the CR does not follow/precede a linefeed
  * it is the first character in a sequence of whitespace, so
it survives white-space-collapse



Shouldn't a CR always survive whitespace handling?


Not always:
If white-space-treatment=preserve then any XML whitespace other
than a linefeed is converted into a normal space. IMO, the editors
put it this way because of the possibility of Windows-specific
line endings, where a CR is followed by a linefeed.



For starters, it is fairly difficult to get a CR out of an XML parser.


Difficult? It's simply a characters event, just like any other...


Only if the CR is hidden in an entity reference can it survive.
Also, as Simon pointed out in some other contribution, whitespace
handling is designed to deal with pretty printing and readable XML
layout introduced whitespace. A CR preserved by the XML parser
certainly does not fall into that category.


Oh yes it does... Remember that not all our users are unix/linux-based,
which means for Windows users, you're likely to get the sequence
'&#x0D;&#x0A;' as line-terminator, while Mac-users saving a source file
with native line-endings will simply get a '&#x0D;'.
(UTF-8 encoding is recommended, but not enforced... An XML file can
be any encoding the parser supports on top of the UTF-8 minimum.)


A carriage-return can survive white-space-handling, for instance, in  
the following case (suppose Mac-encoding):


<fo:block>
  First line, then a CR&#x0D; some spaces, and more text
</fo:block>

The CR (which isn't necessarily a Numerical Character Reference, but
could be just the byte '0D') is not converted into a space
(white-space-treatment=ignore-if-surrounding-linefeed).

It does not precede or follow a linefeed.
It is the first character in a sequence of whitespace, so no matter  
what the value of white-space-collapse, it will survive...
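
Whether a literal CR byte actually reaches the application can be checked against a parser. The Python sketch below uses the standard library's expat-based ElementTree purely as a stand-in (it is not the parser FOP uses); it relies on the end-of-line handling in section 2.11 of the XML Recommendation, under which literal CR LF pairs and lone CRs are passed on as a single linefeed, while a character reference is not touched.

```python
import xml.etree.ElementTree as ET

# Literal carriage return in the source text (old Mac-style line ending).
literal_cr = ET.fromstring("<block>a\rb</block>").text

# Literal CR LF pair (Windows-style line ending).
literal_crlf = ET.fromstring("<block>a\r\nb</block>").text

# Carriage return written as a numeric character reference.
char_ref_cr = ET.fromstring("<block>a&#x0D;b</block>").text

for label, text in [("literal CR", literal_cr),
                    ("literal CR LF", literal_crlf),
                    ("&#x0D; reference", char_ref_cr)]:
    print(f"{label}: {text!r}")
```

With this parser the first two come back with a linefeed in place of the line ending, and only the character reference keeps the carriage return, which matches the remark that the CR has to be hidden in a reference to survive parsing.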


I am also not aware that the XSL-FO spec mentions CR as falling
under whitespace. IMO for whitespace handling CR is just a non
whitespace character.


Nope, it does fall into the category of XML whitespace. There are
exactly four of those: &#x09; (tab), &#x0A; (linefeed), &#x0D;
(carriage-return) and &#x20; (space). If you don't believe me, it's
indeed not in the XSL-FO Rec, but you might want to check the XML
Recommendation...


So, we only need to consider what fop layout should do if it
encounters a CR. I would say, keep it simple, throw it away and log
a warning.


Now, what about a tab character under the same circumstances? Do we
use an elastic width of X spaces optimum, where X is purely
conventional?



Similar considerations as for CR apply to TAB.


...

Cheers,

Andreas


Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Andreas L Delmelle

On Feb 6, 2006, at 17:04, Luca Furini wrote:

Hi Manuel / Luca,


Manuel Mall wrote:


IMO yes, there can be a break, and no, only the space needs to be
removed. Again, the argument is that nbsp is not whitespace as per the
XSL-FO definition and need not be removed.


What makes you think that both the nbsp and the space need to be
removed around a FOP-generated line break?


Oops, I forgot to add an important condition: if the user
explicitly states that the nbsp must be discarded around a line break:

  <fo:inline suppress-at-line-break="suppress">&nbsp;</fo:inline>


Oops, typo? suppress-at-line-break is a non-inherited property, only  
applicable to fo:character :-)


Well, the more I look at this, the more it seems unlikely to ever  
happen ... we are probably having a highly theoretical  
disquisition! :-)


<fo:character character="&#xA0;" suppress-at-line-break="suppress" />

followed by a space is indeed very theoretical.

So is (another alternative):

<fo:inline suppress-at-line-break="suppress">
  <fo:character character="&#xA0;"
    suppress-at-line-break="inherit" /></fo:inline>

OTOH, if we can make the algorithm work in these exotic cases, then  
the commonly used scenarios will be a cake-walk. :-)


This does, in any case, shed some different light on the notion of
'pretty printing whitespace', since currently --at least that was my
understanding of the discussions, and that's what I worked towards--
a fo:character is considered the same as a regular character, in that
fo:characters representing XML whitespace are subject to
whitespace-removal... Yet, one can arguably defend the idea that any
*fo:*character is inserted for *XML* pretty printing purposes, no?
Should this change be reverted then?

[Maybe partly, because suppose:

<fo:block>
  <fo:character character="&#x20;" suppress-at-line-break="retain" />
...

Currently, the fact that it is a fo:character is not known when
running this through the algorithm. The CharIterators deal with the
characters. The XMLWhiteSpaceHandler makes a decision based purely on
the value of the character property. It is agnostic to the
suppress-at-line-break property's value... I myself would tend to use
a non-breaking space in this case, since it escapes the whitespace
handling, but it is a theoretical possibility. :-)


Another alternative would be to introduce a member to the  
CharIterators...

Something like isSuppressible(), which would return true if:
( the current element is a regular character
  and it has codepoint U+0020 )
or ( the current element is a fo:character
  and
  (( the value of its character property is codepoint U+0020
     and suppress-at-line-break=auto )
  or ( suppress-at-line-break=suppress )))

As such, refinement (white-space)-character-removal could operate on  
this basis, and already resolve such issues at that stage.
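
The proposed condition can be sketched in Python as follows. This only illustrates the logic; `CharItem` and its fields are hypothetical and do not correspond to the actual CharIterator API.

```python
from dataclasses import dataclass

SPACE = "\u0020"


@dataclass(frozen=True)
class CharItem:
    """Hypothetical view of what an iterator would expose per character."""
    char: str
    is_fo_character: bool = False
    # Only meaningful for fo:character; "auto" is the property's default.
    suppress_at_line_break: str = "auto"


def is_suppressible(item: CharItem) -> bool:
    """Mirror of the isSuppressible() condition described above."""
    if not item.is_fo_character:
        # A regular character is suppressible only if it is U+0020.
        return item.char == SPACE
    if item.suppress_at_line_break == "suppress":
        # An explicit request wins, whatever the character is.
        return True
    return item.char == SPACE and item.suppress_at_line_break == "auto"
```

With this, a plain nbsp is never suppressible, an fo:character nbsp with suppress-at-line-break=suppress is, and an fo:character space with suppress-at-line-break=retain is not.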


The current approach is still not 100% correct anyway...]

Anyway, I was still not sure whether there could be a break so I  
looked back at the Unicode Annex #14.

<snip />
So, it seems there could be a break between SPACE and NBSP (with  
NBSP starting the next line), but not between NBSP and SPACE. Can  
we say this is settled?


Yes! Definitely. We're looking for UAX#14 'compliance' as well here.

My 2 cents.

Cheers,

Andreas



Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

2006-02-06 Thread Andreas L Delmelle

On Feb 6, 2006, at 19:40, Andreas L Delmelle wrote:

Currently, the fact that it is a fo:character is not known when  
running this through the algorithm. The CharIterators deal with the  
characters.


Say... I was just wondering: why does the TextLayoutManager create  
its own copy of the FOText's character array? Could the LMs be made  
to re-use the CharIterators' functionality to get to the characters,  
or would that somehow hurt performance?


Anyone?

Cheers,

Andreas



Re: [Xmlgraphics-fop Wiki] Update of PagePositionLast by JeremiasMaerki

2006-02-06 Thread Jeremias Maerki
Well, I don't see any restriction concerning static-content in the spec.
You simply have to make sure that an FO with an id only appears once in
a document. And that is the case if you place it in a static-content
that is used exactly once in a page-master activated by
page-position=last. Hey, maybe I got it all wrong. Wouldn't be the
first time. :-) Anyway, I tested what I described with a certain
commercial FO implementation and it worked.

On 06.02.2006 21:41:52 Andreas L Delmelle wrote:
 On Feb 6, 2006, at 21:37, J.Pietschmann wrote:
 
  Jeremias Maerki wrote:
  I think it can be even simpler and would work with XSL 1.0: Place an
  id attribute in a side-region of the simple-page-master for the last
  page and you can reference that id using page-number-citation.
 
  Is it really legal to refer to ids in static content? What's
  the expected page number in case the static content is used
  on more than one page?
 
 Wow! I'm first in line to hear/read the answer to this one... :-)


Jeremias Maerki