Line breaks and other typographical stuff (was: Re: Latest FOP schema)

2002-05-14 Thread Joerg Pietschmann

Self-followup:

 Peter B. West wrote:
  These cover such categories as
  Case, Numeric Value, Dashes, Line Breaking and Spaces.

I found them online; the relevant URLs appear to be
 http://www.unicode.org/Public/UNIDATA/LineBreak.txt
 http://www.unicode.org/Public/UNIDATA/extracted/DerivedLineBreak.txt
and for the interpretation of the codes
 http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt
(the lb section)
I still think this area is somewhat unintuitive to browse.
Does somebody know where there is a more elaborate explanation
of the values used there, in particular whether there is a
formal description of how they are supposed to influence the
actual line breaking? I don't want to rely on intuition here,
it fails me much too often...
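For what it's worth, the data lines in LineBreak.txt use a simple "code point or range;class # comment" layout, so reading them is easy even if interpreting them is not. The following is a minimal Java sketch. The class name LineBreakTable and the "XX" fallback for unlisted code points are assumptions of the sketch. The sample lines are hand-written in the file's format, not copied from the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Minimal reader for the UCD LineBreak.txt format (a sketch, not FOP code). */
final class LineBreakTable {
    private final List<int[]> ranges = new ArrayList<>();   // {first, last} code points
    private final List<String> classes = new ArrayList<>(); // lb value, e.g. "AL", "GL"

    /** Parses data lines such as "0041..005A;AL # ..." and "00A0;GL # ...". */
    static LineBreakTable parse(Reader in) {
        LineBreakTable table = new LineBreakTable();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // strip the trailing comment
                }
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // blank or comment-only line
                }
                String[] fields = line.split(";");
                String codes = fields[0].trim();
                int dots = codes.indexOf("..");
                int first = Integer.parseInt(dots < 0 ? codes : codes.substring(0, dots), 16);
                int last = dots < 0 ? first : Integer.parseInt(codes.substring(dots + 2), 16);
                table.ranges.add(new int[] {first, last});
                table.classes.add(fields[1].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    /** Linear lookup; unlisted code points get "XX" (Unknown), a simplification. */
    String classOf(int codePoint) {
        for (int i = 0; i < ranges.size(); i++) {
            int[] r = ranges.get(i);
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return classes.get(i);
            }
        }
        return "XX";
    }

    public static void main(String[] args) {
        // Hand-written excerpt in the LineBreak.txt format, not the real file.
        String sample = "# comment line\n"
                + "0020;SP # SPACE\n"
                + "0041..005A;AL # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z\n"
                + "00A0;GL # NO-BREAK SPACE\n";
        LineBreakTable table = parse(new StringReader(sample));
        System.out.println(table.classOf(0x00A0)); // GL
        System.out.println(table.classOf('Q'));    // AL
        System.out.println(table.classOf(0x0020)); // SP
    }
}
```

A real table would use binary search over sorted ranges. The linear scan only keeps the sketch short. How the classes interact with each other is a separate question, which is what the annex discussed further down in this thread covers.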

 Slightly related question: FOP appears to render the U+00A0
 non-breaking space always at full space width. Shouldn't the space
 also be used for justification purposes? There are, after all,
 non-breaking spaces with a definite width available.

Oops, major blunder. I should check before posting. While
there is a variety of spaces at U+2000 and the following code
points, as well as various additional spaces for some scripts,
there are only the common U+00A0 non-breaking space, the U+2007
figure space (whatever this is) and the U+202F narrow non-breaking
space available. This begs the question: how should arbitrary
non-breaking spaces be expressed in XSLFO, and how often does
this issue arise? I vaguely remember that the most common use
case in English was the space after an abbreviated title, and
this space is only available for justification at the same level
of fine tuning as character spacing (and it should be slightly
narrower than a full-width space).
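The three code points can be checked directly against the JDK. The java.lang.Character documentation describes isWhitespace as excluding exactly U+00A0, U+2007 and U+202F from the Unicode space characters, so combining it with isSpaceChar singles them out. The following small sketch uses hypothetical class and method names:

```java
/** Shows how java.lang.Character classifies the three no-break spaces (sketch). */
final class NoBreakSpaces {
    static final int[] CODE_POINTS = {0x00A0, 0x2007, 0x202F};

    /** True exactly for U+00A0, U+2007 and U+202F, going by the Character.isWhitespace contract. */
    static boolean isNonBreakingSpace(int cp) {
        // isSpaceChar accepts every Zs/Zl/Zp character; isWhitespace rejects the no-break ones.
        return Character.isSpaceChar(cp) && !Character.isWhitespace(cp);
    }

    public static void main(String[] args) {
        for (int cp : CODE_POINTS) {
            // Character.getName needs Java 7 or later.
            System.out.printf("U+%04X %-22s non-breaking=%b%n",
                    cp, Character.getName(cp), isNonBreakingSpace(cp));
        }
    }
}
```

Other fixed-width spaces in the U+2000 block, such as U+2009 THIN SPACE, come out as breakable under this test. That matches the observation that only these three code points carry "no-break" semantics.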

Well, while we are at it, another typographical nastiness which
comes to mind is an indented initial. This has bothered me for
quite some time now: how should this be expressed in XSLFO? In
HTML, a floating table around the letter can be used, but this
seems awkward and does not account for fine tuning like the
outdent to account for serifs. Also, the automatic displacement
of the following lines could be a problem. I think a float is
necessary in XSLFO as well, perhaps with some adjustments to the
width and with relative positioning for fine tuning.

J.Pietschmann

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]




RE: Line breaks and other typographical stuff (was: Re: Latest FOP schema)

2002-05-14 Thread Arved Sandstrom

 -Original Message-
 From: Joerg Pietschmann [mailto:[EMAIL PROTECTED]]
 Sent: May 14, 2002 7:52 AM
 To: FOP Dev
 Subject: Line breaks and other typographical stuff (was: Re: Latest FOP
 schema)

 I found them online, the relevant URLs appear to be
  http://www.unicode.org/Public/UNIDATA/LineBreak.txt
  http://www.unicode.org/Public/UNIDATA/extracted/DerivedLineBreak.txt
 and for the interpretation of the codes
  http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt
 (the lb section)
 I still think this area is somewhat unintuitive to browse.
 Does somebody know where there is a more elaborate explanation
 of the values used there, in particular whether there is a
 formal description of how they are supposed to influence the
 actual line breaking? I don't want to rely on intuition here,
 it fails me much too often...

This would not be covered in UTR 14, Line Breaking Properties?
(http://www.unicode.org/unicode/reports/tr14/).

Arved






RE: Line breaks and other typographical stuff (was: Re: Latest FOP schema)

2002-05-14 Thread Arved Sandstrom

 -Original Message-
 From: Joerg Pietschmann [mailto:[EMAIL PROTECTED]]
 Sent: May 14, 2002 7:52 AM
 To: FOP Dev
 Subject: Line breaks and other typographical stuff (was: Re: Latest FOP
 schema)
 
 Well, while we are at it, another typographical nastiness which
 comes to mind is an indented initial. This has bothered me for
 quite some time now: how should this be expressed in XSLFO? In
 HTML, a floating table around the letter can be used, but this
 seems awkward and does not account for fine tuning like the
 outdent to account for serifs. Also, the automatic displacement
 of the following lines could be a problem. I think a float is
 necessary in XSLFO as well, perhaps with some adjustments to the
 width and with relative positioning for fine tuning.

text-indent. If that's not it, what do you mean by indented initial?

Regards,
Arved






Re: Line breaks and other typographical stuff (was: Re: Latest FOP schema)

2002-05-14 Thread Patrick Andries



J.Pietschmann wrote:

 Patrick Andries wrote:

 This begs the question: how should arbitrary
 non-breaking spaces be expressed in XSLFO, and how often does
 this issue arise? 

 Well, in fine French typography, this occurs often. Semicolons,
 question marks and exclamation marks, for instance, should be
 preceded by a fine non-breaking space, while the colon and the
 closing guillemet (») should be preceded by a larger non-breaking
 space. I believe Unicode does not distinguish between these two
 cases; its customary answer would be that this is a higher
 protocol's duty: Unicode only marks a semantic function
 (non-breaking space), not its appearance. In other words, it's
 FO's problem?


 This is my understanding. There is already a certain proliferation
 of spaces, and the Unicoders quite explicitly stated they feel
 mainly responsible for stuff resulting in glyphs


Well, they feel responsible for characters. Glyphs are not Unicode's
realm, though characters will of course turn out as glyphs (most of
them, in any case: some are invisible, such as language tags).

 and want to
 support control characters, separators and spaces only to the
 extent necessary for compatibility and to deal with the cases
 which arise most often. I think they would like it if there were
 a fo:word :-)

Not sure; we actually discussed this topic on the Unicode internal list
(sentence and word boundaries). A word is a very language-specific thing:
even French and English don't use the same typographical conventions to
delimit words. Never mind languages like Thai, where spaces do not
separate words. Also, I do not know offhand of any Unicode rendering
algorithm that needs to know what a word is (sorting does); determining
what a paragraph is (fo:block), however, is essential for
bidirectional rendering.


 Well, arbitrary spaces can be achieved in XSLFO by using space-start
 on a fo:inline. Attaching keep-with-previous="always" should make
 it non-breaking in my understanding, for example
  <fo:block>Mr. <fo:inline space-start="0.4en"
    keep-with-previous="always">Bean</fo:inline></fo:block>
 Seems to be an awful lot to write for a sort of fine tuning effect
 (most readers wouldn't appreciate it, or even take notice).

This insertion should be done at another level (in French the 
non-breaking width variation is pretty deterministic, an XSLT stylesheet 
could do it I suppose).
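A pass like this could look as follows. The sketch uses Java rather than XSLT and works on plain text content only, never on markup. It assumes that U+202F NARROW NO-BREAK SPACE stands in for the "fine" space and U+00A0 for the larger one, and it covers only the marks listed above. The class name FrenchSpacing and the regular expressions are illustrative, not an established rule set.

```java
import java.util.regex.Pattern;

/** Sketch of the French punctuation spacing rules quoted above, applied to plain text. */
final class FrenchSpacing {
    static final String FINE_NBSP = "\u202F"; // NARROW NO-BREAK SPACE (assumed "fine" space)
    static final String WIDE_NBSP = "\u00A0"; // NO-BREAK SPACE (assumed "larger" space)
    private static final String SPACES = "[ \u00A0\u202F]";

    // Before ; ? ! any run of spaces (possibly empty) becomes one fine no-break space.
    // The lookbehind rejects spaces and other marks, so in "?!" only the first mark is spaced.
    private static final Pattern FINE =
            Pattern.compile("(?<=[^ \u00A0\u202F;:?!])" + SPACES + "*([;?!])");
    // Colon: only when followed by whitespace or end of text, so "http://x" and "12:30" survive.
    private static final Pattern COLON =
            Pattern.compile("(?<=[^ \u00A0\u202F])" + SPACES + "*:(?=\\s|$)");
    // Closing guillemet.
    private static final Pattern CLOSING_GUILLEMET =
            Pattern.compile("(?<=[^ \u00A0\u202F])" + SPACES + "*\u00BB");

    static String apply(String text) {
        String s = FINE.matcher(text).replaceAll(FINE_NBSP + "$1");
        s = COLON.matcher(s).replaceAll(WIDE_NBSP + ":");
        return CLOSING_GUILLEMET.matcher(s).replaceAll(WIDE_NBSP + "\u00BB");
    }
}
```

Existing spaces of any of the three kinds are replaced rather than added to, so running the pass twice leaves the text unchanged. Real texts will still have exceptions that these few patterns do not know about.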

 Well, current FOP ignores both properties on fo:inline anyway. 

(Already mentioned on a different list: I would really love to have an
idea of how things are progressing. It would be good for this to be posted
regularly in a succinct fashion on the Web site.
http://xml.apache.org/fop/design/status.html is very laconic. It is difficult
to plan any action on this basis.)



 Ah, again:
  <fo:block>Mozi<fo:inline space-start="0.4en">lla</fo:inline></fo:block>
 hopefully never breaks in the middle unless permitted by hyphenation,
 it's just a word with a gap. Is this correct? 

Intuitively, yes.

P. Andries

Tout Unicode en français
--- http://hapax.iquebec.com --









RE: Line breaks and other typographical stuff (was: Re: Latest FOP schema)

2002-05-14 Thread Arved Sandstrom

A drop cap, in other words. :-)

 -Original Message-
 From: J.Pietschmann [mailto:[EMAIL PROTECTED]]
 Sent: May 14, 2002 4:47 PM
 To: [EMAIL PROTECTED]
 Subject: Re: Line breaks and other typographical stuff (was: Re: Latest
 FOP schema)


 Arved Sandstrom wrote:
  text-indent. If that's not it, what do you mean by indented initial?

 I meant what the following HTML snippet shows:
 <table width="300">
   <tr>
     <td>
       <table align="left" cellpadding="0px"
              cellspacing="0px" border="0px">
         <tr><td><span style="font-size: 200%;">L</span></td></tr>
       </table>
       <p>orem ipsum dolor sit amet. Consectetuer adipiscing elit, sed
       diam nonummy. Nibh euismod tincidunt ut laoreet dolore magna.
       Aliquam erat volutpat. Iriure dolor in</p>
     </td>
   </tr>
 </table>

 J.Pietschmann






AW: AW: Latest FOP schema

2002-05-13 Thread J.U. Anderegg

J. Pietschmann wrote:

fo:blocks are

 Rectangular areas, perhaps indented and with border, padding
 and other individual traits, nested into a rectangular area.

I understand setting traits, properties. How about page layout, setting
inline and baseline positions? Does it imply an unconditional CRLF?

What does the input below look like on the page?

<fo:block>
level_0_text fills to position A
<fo:block>
level_1_text positioned at A fills to position B
</fo:block>
more level_0_text positioned at B
</fo:block>

Hansuli Anderegg







RE: AW: Latest FOP schema

2002-05-13 Thread Arved Sandstrom

Comments intermingled.

 -Original Message-
 From: J.U. Anderegg [mailto:[EMAIL PROTECTED]]
 Sent: May 13, 2002 5:15 AM
 To: [EMAIL PROTECTED]
 Subject: AW: AW: Latest FOP schema

 J. Pietschmann wrote:

 fo:blocks are

  Rectangular areas, perhaps indented and with border, padding
  and other individual traits, nested into a rectangular area.

 I understand setting traits, properties. How about page layout, setting
 inline and baseline positions? Does it imply an unconditional CRLF?

It's not that there is a CRLF, or anything like it, after a block, but
rather that if it is succeeded by block-level siblings that they will be
stacked in the block-progression-direction, so the effect will be the same.

Can you be more specific with respect to the other questions?

 What does the input below look like on the page?

 <fo:block>
   level_0_text fills to position A
   <fo:block>
   level_1_text positioned at A fills to position B
   </fo:block>
   more level_0_text positioned at B
 </fo:block>

I think the predominant opinion is (assume all of this fits on one page) -

a normal block area (generated by the outer block) that contains:

one or more line areas for level_0_text fills to position A;
then a block area with one or more line areas for level_1_text positioned
at A fills to position B;
finally more line areas for more level_0_text positioned at B.

Note that if your example had been

<fo:block>
level_0_text fills to position A<fo:block>
level_1_text positioned at A fills to position B
</fo:block>more level_0_text positioned at B
</fo:block>

then it would still be the same.
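The stacking described here can be illustrated with a toy model. Consecutive inline text collapses into one group of line areas, and each nested block becomes its own block area in between. This is a hypothetical sketch: BlockAreas and its node classes are not FOP classes, and the trim-and-collapse white-space handling is a crude simplification of the real white-space properties. It does, however, show why the compact markup yields the same sequence of areas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not FOP's layout code) of the areas a block with mixed content produces. */
final class BlockAreas {
    interface Node {}

    static final class Text implements Node {
        final String chars;
        Text(String chars) { this.chars = chars; }
    }

    static final class Block implements Node {
        final List<Node> children;
        Block(Node... children) { this.children = Arrays.asList(children); }
    }

    /**
     * Lists the areas stacked in the block-progression-direction: each maximal
     * run of inline text becomes "lines(...)", each nested block "block[...]".
     * Whitespace-only runs are dropped and other runs are trimmed and collapsed.
     */
    static List<String> areas(Block block) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (Node child : block.children) {
            if (child instanceof Text) {
                pending.append(((Text) child).chars);
            } else {
                flush(pending, out);                      // close the current line group
                out.add("block" + areas((Block) child));  // nested block area
            }
        }
        flush(pending, out);
        return out;
    }

    private static void flush(StringBuilder pending, List<String> out) {
        String text = pending.toString().trim().replaceAll("\\s+", " ");
        if (!text.isEmpty()) {
            out.add("lines(" + text + ")");
        }
        pending.setLength(0);
    }
}
```

Both versions of the example map to the same three entries: a group of lines, then a nested block area, then another group of lines.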

Regards,
AHS






RE: Latest FOP schema

2002-05-13 Thread Joerg Pietschmann

Arved Sandstrom Arved_37@ wrote:
 I think the predominant opinion is (assume all of this fits on one page) -
 
 a normal block area (generated by the outer block) that contains:
 
 one or more line areas for level_0_text fills to position A;
 then a block area with one or more line areas for level_1_text positioned
 at A fills to position B;
 finally more line areas for more level_0_text positioned at B.
 
 Note that if your example had been
 
 <fo:block>
 level_0_text fills to position A<fo:block>
 level_1_text positioned at A fills to position B
 </fo:block>more level_0_text positioned at B
 </fo:block>
 
 then it would still be the same.

As a side note, assuming western language and script and hyphenation
off, if the example had been

 <fo:block>
 level_0_text fills to position
  A<fo:block>level_1_text positioned at A fills to position B
 </fo:block>more level_0_text positioned at B
 </fo:block>

it is probably illegal, according to 4.7.2, Point 3. I suppose
it would be illegal to have a line break within the word
 Alevel_1_text
here. The problem here is, where do I get the rules whether a line
break is permitted somewhere for a certain language and script? And
how is this supposed to deal with out of context stuff like product
numbers or artificial DB keys or programming language identifiers
containing underlines and dashes, and with non-breaking spaces, odd
symbols, and character abuse (uppercase Greek omega instead of Ohm
sign)? Again, I suppose the burden has to be put on the user who
has to ensure everything is correct, including changing the current
language for quotes, nested if necessary, and specifying a language
for product numbers and programming language ids. Umm, something
looking like
  ..., ISBN <fo:inline language="x-isbn">0-201-48345-9</fo:inline>...
 and
  the <fo:inline language="x-Java">org.apache.fop.render.pdf.Font</fo:inline>
 class implements the <fo:inline
  language="x-Java">org.apache.fop.layout.FontMetric</fo:inline>
 interface ...

This would eliminate some keep-together stuff, I guess, but most
probably requires a mechanism to teach the processor line breaking
rules for user-defined languages.
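On the question of where line-breaking rules come from, the JDK ships locale-parameterised line-break rules through java.text.BreakIterator.getLineInstance(Locale). The sketch below uses a hypothetical class name and only lists the boundaries the JDK reports for a string. The JDK's rules are its own and may differ in detail from UAX #14. Supporting user-defined languages such as x-isbn or x-Java would need something beyond this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Lists the line-break boundaries the JDK's rules report for a string (sketch). */
final class LineBreakOpportunities {
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        // After setText the iterator sits at offset 0. Each next() moves to the
        // following boundary; the last one reported is text.length().
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            result.add(end);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "ISBN 0-201-48345-9 and org.apache.fop.layout.FontMetric";
        for (int end : boundaries(s, Locale.ENGLISH)) {
            System.out.println("boundary at " + end + ": \"" + s.substring(0, end) + "\"");
        }
    }
}
```

Running main on the ISBN and class-name string shows which hyphens and dots the stock rules treat as break opportunities. That is exactly the behaviour a per-language mechanism would have to override.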

<DumbQuestions>
- Is the interpretation reasonable? (I don't ask about correctness... :-)
- Can the redesigned FOP deal with the Alevel_1_text above, I mean,
  will it raise an error or warning?
- Can/should FOP deal with user-supplied word/line breaking rules?
</DumbQuestions>

Note that the same applies to the recently heavily discussed problem
of a block level element inside an fo:inline, according to 4.7.3, in
particular point 3.

J.Pietschmann





AW: Latest FOP schema

2002-05-11 Thread J.U. Anderegg

From the external view, a block means a rectangle containing formatted text,
something like a paragraph.

o What do fo:blocks as children of fo:blocks mean for the end user?
o What's the effect of blocks in combination with TEXT elements like
leader, marker, inline, wrapper, basic-link?
o When is a block required?



Hansuli Anderegg








Re: AW: Latest FOP schema

2002-05-11 Thread J.Pietschmann

J.U. Anderegg wrote:
 From the external view, a block means a rectangle containing formatted text,
 something like a paragraph.
 
 o What do fo:blocks as children of fo:blocks mean for the end user?

Rectangular areas, perhaps indented and with border, padding
and other individual traits, nested into a rectangular area.

A user might be tempted to see them as higher level structures,
like HTML DIV elements, or (nested) sections or whatever. That's
not too bad but can be very misleading at times (for example,
a headline probably has to be *mapped* to a fo:block too).

Nested fo:blocks can be used by the transformation designer for
pure technical reasons, for example to define certain properties
for a longer stretch of text, without any correspondence to the
structure of the original document.

From this point of view, it has been a very good idea to name
fo:block a block and not a paragraph. In the same sense, fo:table
should probably have been named grid.

BTW: the list related FOs are redundant, aren't they? Or am I
missing something that can't be easily mapped to a table (grid)?

 o What's the effect of blocks in combination with TEXT elements like
 leader, marker, inline, wrapper, basic-link?

There are some hassles with whitespace. Handling fo:leader is
somewhat similar to handling whitespace.

 o When is a block required?

If you want to put text where a block level FO is expected.

J.Pietschmann






Re: Latest FOP schema

2002-05-10 Thread Chuck Paussa

Kieron wrote:
 This looks very good.
 I think we should put this somewhere on the site when it is ready.
That would be excellent. It's not quite done yet.


I missed the attributes in the spec that are only listed as applying to 
elements but are not referred to in the element description. I'm 
confused by the description of some of these attributes.

background-position applies to block-level and replaced elements
(What are replaced elements?)

max-width, min-height, min-width apply to all elements except 
non-replaced inline elements and table elements
(What are non-replaced inline elements?)

position applies to all elements, but not to generated content
(What is generated content?)

The following attributes are listed in the spec as applying to all 
elements.
(This makes sense)

background   border-bottom border-color
border-left  border-right  border-style
border-top   border-width  margin

page-break-after page-break-before apply to block-level elements, 
list-item, and table-row
(This makes sense)

Chuck Paussa






Re: Latest FOP schema

2002-05-10 Thread J.Pietschmann

Chuck Paussa wrote:
 background-position applies to block-level and replaced elements
 (What are replaced elements?)

This seems to be an odd artefact of not having checked everything.
The background-position property is a shorthand for combinations
of the background-position-(horizontal|vertical) stuff, which in
turn apply, as noted there to everything to which background
applies, which is, ugh, every element (7.29.1). This all sucks,
you should report a spec bug to the editors (well, read the amendments,
it could already be there...)
I believe "replaced" originally meant floating elements and footnotes,
but the point is moot anyway.

 max-width, min-height, min-width apply to all elements except 
 non-replaced inline elements and table elements
 (What are non-replaced inline elements?)

These properties are mapped to i-p-d or b-p-d (inline-progression-dimension
or block-progression-dimension), and the constraints should be looked up
there. It seems they cannot be determined statically, because both the
mapping and the applicability seem to depend on the context; therefore
the properties should probably be allowed everywhere.
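One way to read "mapped to i-p-d or b-p-d" is through the writing-mode. In XSL 1.0, width and height are corresponding properties of inline-progression-dimension and block-progression-dimension, and which maps to which depends on whether the writing-mode is horizontal or vertical. The sketch below works under that reading. The class name is hypothetical, and the sketch handles only the four min-/max- properties and the writing-mode values lr-tb, rl-tb, tb-rl, lr, rl and tb.

```java
/** Maps min-/max-width/height to the relative property they set, per writing-mode (sketch). */
final class CorrespondingDimension {
    static String map(String property, String writingMode) {
        if (!property.matches("(min|max)-(width|height)")) {
            throw new IllegalArgumentException("not handled here: " + property);
        }
        // lr-tb, rl-tb and their shorthands lr, rl are horizontal; tb-rl and tb are vertical.
        boolean horizontal = writingMode.startsWith("lr") || writingMode.startsWith("rl");
        boolean isWidth = property.endsWith("width");
        // Width runs along the inline direction in horizontal modes, height in vertical ones.
        boolean inline = horizontal == isWidth;
        String base = inline ? "inline-progression-dimension" : "block-progression-dimension";
        String component = property.startsWith("min-") ? "minimum" : "maximum";
        return base + "." + component;
    }
}
```

Because the result depends on the inherited writing-mode, a schema cannot decide statically which of the two target properties a given min-width ends up constraining.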

 position applies to all elements, but not to generated content
 (What is generated content?)

Hmm, again, it's a shorthand for setting absolute-position and
relative-position. The interesting point is that, according to these
properties, the settings static and relative make sense for all
block-level and inline-level FOs, while the other two settings,
absolute and fixed, would only apply to block-container. It seems to
follow that position applies to all block-level and inline-level FO
elements. Again, looks like a spec bug.

Confused? Me too!
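The expansion of the position shorthand into absolute-position and relative-position can be written down as a small table. This sketch assumes that static and relative leave absolute-position at auto, and that absolute and fixed leave relative-position at static. That is a reading of the shorthand, not a quote from the spec. The meaningfulOn check encodes the conclusion above that absolute and fixed only make sense on fo:block-container. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Expands the "position" shorthand into the two properties it abbreviates (sketch). */
final class PositionShorthand {
    static Map<String, String> expand(String position) {
        Map<String, String> props = new LinkedHashMap<>();
        switch (position) {
            case "static":
            case "relative":
                props.put("absolute-position", "auto");
                props.put("relative-position", position);
                break;
            case "absolute":
            case "fixed":
                props.put("absolute-position", position);
                props.put("relative-position", "static");
                break;
            default: // "inherit" and invalid values are not handled in this sketch
                throw new IllegalArgumentException("unsupported position value: " + position);
        }
        return props;
    }

    /** Per the reading above: absolute/fixed are only meaningful on fo:block-container. */
    static boolean meaningfulOn(String position, String foName) {
        boolean inFlow = "auto".equals(expand(position).get("absolute-position"));
        return inFlow || "fo:block-container".equals(foName);
    }
}
```

Written this way, position="absolute" on an fo:block turns into an absolute-position value that fo:block cannot use. That is the inconsistency described above.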

J.Pietschmann






Latest FOP schema

2002-05-09 Thread Chuck Paussa

I've greatly improved the FO schema I've been working on.  I've added 
patterns for most of the attribute types. I'd appreciate it if some 
folks would run their FO documents through a validator against this 
schema and respond with where I've done a less than excellent job.

The schema as delivered is for the full FO spec. I've created groups 
and attributeGroups that segregate those elements and attributes 
implemented and not-implemented in FOP. To make this an FOP only schema, 
remove those groups and attributeGroups with _Not in their names.
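For anyone who wants the FOP-only variant without hand-editing, the removal can be scripted with the JDK's DOM API. This sketch assumes the "_Not" groups are xs:group or xs:attributeGroup definitions (name attribute), plus references to them (ref attribute). The group names used in the check are made up, because the actual contents of fop4e.zip are not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Strips the "_Not" (not implemented in FOP) groups from a full XSL-FO schema (sketch). */
final class FopOnlySchema {
    static String strip(String schemaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            for (String localName : new String[] {"group", "attributeGroup"}) {
                NodeList nodes = doc.getElementsByTagNameNS(
                        XMLConstants.W3C_XML_SCHEMA_NS_URI, localName);
                // Collect first: the NodeList is live and shrinks as elements are removed.
                List<Element> doomed = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element e = (Element) nodes.item(i);
                    // Definitions carry name="...", references ref="..."; absent attributes read as "".
                    if (e.getAttribute("name").contains("_Not") || e.getAttribute("ref").contains("_Not")) {
                        doomed.add(e);
                    }
                }
                for (Element e : doomed) {
                    e.getParentNode().removeChild(e);
                }
            }
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter buffer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(buffer));
            return buffer.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("could not process schema", e);
        }
    }
}
```

Removing a reference can leave an empty xs:sequence or xs:choice behind, which may change what the content model accepts. The stripped schema is therefore still worth validating.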

A couple of items I've noticed:

FOP allows some elements, like fo:table-cell and fo:flow to be empty 
when the spec says they must be non-empty
FOP allows the contents of some elements like simple-page-master to be 
in any order when the spec insists on a defined order

Chuck Paussa




fop4e.zip
Description: Zip compressed data
