Line breaks and other typographical stuff (was: Re: Latest FOP schema)
Self-followup: Peter B. West wrote: These cover such categories as Case, Numeric Value, Dashes, Line Breaking and Spaces.

I found them online; the relevant URLs appear to be http://www.unicode.org/Public/UNIDATA/LineBreak.txt and http://www.unicode.org/Public/UNIDATA/extracted/DerivedLineBreak.txt and, for the interpretation of the codes, http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt (the lb section). I still think this area is somewhat unintuitive to browse. Does somebody know where there is a more elaborate explanation of the values used there, in particular whether there is a formal description of how they are supposed to influence the actual line breaking? I don't want to rely on intuition here; it fails me much too often...

Slightly related question: FOP appears to always render the U+00A0 non-breaking space at full space width. Shouldn't this space also be used for justification purposes? There are, after all, non-breaking spaces with a definite width available.

Ooops, major blunder. I should check before posting. While there is a variety of spaces at U+2000 and the following code points, as well as various additional spaces for some scripts, the only non-breaking ones available are the common U+00A0 non-breaking space, U+2007 figure space (whatever this is) and U+202F narrow non-breaking space. This begs the question: how should arbitrary non-breaking spaces be expressed in XSLFO, and how often does this issue arise? I vaguely remember that the most common case in ordinary English was the space after an abbreviated title, which should take part in space justification only at the same level of fine tuning as character spacing (and it should be slightly narrower than a full-width space).

Well, while we are at it, another typographical nastiness which comes to mind is an indented initial. This has bothered me for quite some time now: how should this be expressed in XSLFO?
In HTML, a floating table around the letter can be used, but this seems awkward and does not account for fine tuning like the outdent to compensate for serifs. Also, the automatic displacement of the following lines could be a problem. I think a float is also necessary in XSLFO, perhaps with some adjustments to the width and with relative positioning for fine tuning. J.Pietschmann - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]
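The LineBreak.txt file mentioned above is plain text: each data line maps a code point, or a range written first..last, to a line break class, optionally followed by a '#' comment. A minimal Python sketch for reading it, assuming that line layout; unlisted code points are simply reported as XX here, and the SAMPLE excerpt is hand-copied for illustration rather than the full file:

```python
import bisect
from typing import List, Tuple

# (first code point, last code point, line break class)
Range = Tuple[int, int, str]


def parse_line_break_data(text: str) -> List[Range]:
    """Parse lines like '2000..2006;BA # EN QUAD..SIX-PER-EM SPACE'.

    Comments after '#' are dropped and spaces around ';' are tolerated.
    """
    ranges: List[Range] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        cps, lb_class = (part.strip() for part in line.split(";", 1))
        first, _, last = cps.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), lb_class))
    ranges.sort()
    return ranges


def line_break_class(ranges: List[Range], cp: int, default: str = "XX") -> str:
    """Look up the class of one code point by binary search over the ranges."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default  # unlisted code points: XX (unknown) in this sketch


# Hand-copied excerpt covering the spaces discussed above; check it against
# the current LineBreak.txt before relying on it.
SAMPLE = """
0020;SP # SPACE
00A0;GL # NO-BREAK SPACE
2000..2006;BA # EN QUAD..SIX-PER-EM SPACE
2007;GL # FIGURE SPACE
2008..200A;BA # PUNCTUATION SPACE..HAIR SPACE
202F;GL # NARROW NO-BREAK SPACE
"""

if __name__ == "__main__":
    table = parse_line_break_data(SAMPLE)
    for cp in (0x0020, 0x00A0, 0x2002, 0x2007, 0x2009, 0x202F):
        print(f"U+{cp:04X} {line_break_class(table, cp)}")
```

On this excerpt U+00A0, U+2007 and U+202F come out as GL (glue: no break on either side), while U+2002 EN SPACE and U+2009 THIN SPACE come out as BA (break after), which fits the observation that only three of these spaces are non-breaking.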
RE: Line breaks and other typographical stuff (was: Re: Latest FOP schema)
-Original Message- From: Joerg Pietschmann [mailto:[EMAIL PROTECTED]] Sent: May 14, 2002 7:52 AM To: FOP Dev Subject: Line breaks and other typographical stuff (was: Re: Latest FOP schema) I found them online; the relevant URLs appear to be http://www.unicode.org/Public/UNIDATA/LineBreak.txt and http://www.unicode.org/Public/UNIDATA/extracted/DerivedLineBreak.txt and, for the interpretation of the codes, http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt (the lb section). I still think this area is somewhat unintuitive to browse. Does somebody know where there is a more elaborate explanation of the values used there, in particular whether there is a formal description of how they are supposed to influence the actual line breaking? I don't want to rely on intuition here; it fails me much too often...

Wouldn't this be covered in UTR 14, Line Breaking Properties (http://www.unicode.org/unicode/reports/tr14/)? Arved
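UTR/UAX #14 states its rules as pairs of line break classes ('×' means no break, '÷' means break allowed), applied in order until one matches. The Python sketch below encodes a handful of those rules pairwise to show how the classes drive break decisions. It is a deliberate simplification (the real rules also look across runs of spaces and cover many more classes), and the tiny classify() table is a hypothetical stand-in for the LineBreak.txt data:

```python
# Toy classifier: only the handful of classes used below. Anything else is
# treated as AL, mirroring how UAX #14 resolves unknown (XX) characters.
CLASSES = {" ": "SP", "\u00a0": "GL", "\u202f": "GL", "-": "HY", ".": "IS", ",": "IS"}


def classify(ch: str) -> str:
    return CLASSES.get(ch, "AL")


def break_allowed(before: str, after: str) -> bool:
    """Pairwise subset of the UAX #14 rules (no SP* context, few classes)."""
    if after == "SP":                                       # LB7: no break before spaces
        return False
    if before == "GL":                                      # LB12: no break after GL
        return False
    if after == "GL" and before not in ("SP", "BA", "HY"):  # LB12a
        return False
    if after == "IS":                                       # no break before IS such as '.'
        return False
    if before == "SP":                                      # LB18: break after spaces
        return True
    if after in ("BA", "HY"):                               # LB21: no break before BA, HY
        return False
    if before == "AL" and after == "AL":                    # LB28: keep letters together
        return False
    return True                                             # LB31: break everywhere else


def break_opportunities(text: str) -> list:
    """Indices i where a line may break between text[i - 1] and text[i]."""
    classes = [classify(ch) for ch in text]
    return [i for i in range(1, len(text)) if break_allowed(classes[i - 1], classes[i])]
```

For "Mr.\u00a0Bean went home" this yields breaks only before "went" and "home": the GL class of the no-break space blocks a break on both sides, which is the behaviour wanted after an abbreviated title.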
RE: Line breaks and other typographical stuff (was: Re: Latest FOP schema)
-Original Message- From: Joerg Pietschmann [mailto:[EMAIL PROTECTED]] Sent: May 14, 2002 7:52 AM To: FOP Dev Subject: Line breaks and other typographical stuff (was: Re: Latest FOP schema) Well, while we are at it, another typographical nastiness which comes to mind is an indented initial. This has bothered me for quite some time now: how should this be expressed in XSLFO? In HTML, a floating table around the letter can be used, but this seems awkward and does not account for fine tuning like the outdent to compensate for serifs. Also, the automatic displacement of the following lines could be a problem. I think a float is also necessary in XSLFO, perhaps with some adjustments to the width and with relative positioning for fine tuning.

text-indent. If that's not it, what do you mean by indented initial? Regards, Arved
Re: Line breaks and other typographical stuff (was: Re: Latest FOP schema)
J.Pietschmann wrote: Patrick Andries wrote: This begs the question: how should arbitrary non-breaking spaces be expressed in XSLFO, and how often does this issue arise? Well, in fine French typography, this occurs often. Semicolons, question marks and exclamation marks, for instance, should be preceded by a fine non-breaking space, while colons and the closing guillemet (») should be preceded by a larger non-breaking space. I believe Unicode does not distinguish between these two cases; its customary answer would be that this is a higher protocol's duty: Unicode only marks a semantic function (non-breaking space), not its appearance. In other words, it's FO's problem? This is my understanding. There is already a certain proliferation of spaces, and the Unicoders quite explicitly stated they feel mainly responsible for stuff resulting in glyphs Well, they feel responsible for characters. Glyphs are not Unicode's realm, although characters will of course turn out as glyphs (most of them, in any case: some are invisible, such as language tags). and want to support control characters, separators and spaces only to the extent necessary for compatibility and to deal with the cases which arise most often. I think they would like it if there were an fo:word :-) Not sure; we actually discussed this topic on the Unicode internal list (sentence and word boundaries). A word is a very language-specific thing: even French and English don't use the same typographical conventions to delimit words. Never mind languages like Thai, where spaces do not separate words. Also, I do not know offhand of any Unicode rendering algorithm that needs to know what a word is (sorting does); determining what a paragraph is (fo:block), however, is essential for bidirectional rendering. Well, arbitrary spaces can be achieved in XSLFO by using space-start on an fo:inline. Attaching keep-with-previous="always" should make it non-breaking, in my understanding, for example:

    <fo:block>Mr.<fo:inline space-start="0.4em" keep-with-previous="always">Bean</fo:inline></fo:block>

That seems to be an awful lot to write for a sort of fine-tuning effect (most readers wouldn't appreciate it, or even notice). This insertion should be done at another level (in French, which width of non-breaking space goes where is pretty deterministic; an XSLT stylesheet could do it, I suppose). Well, current FOP ignores both properties on fo:inline anyway. (Already mentioned on a different list: I would really love to have an idea of how things are progressing. It would be good for this to be posted regularly, in a succinct fashion, on the Web site. http://xml.apache.org/fop/design/status.html is very laconic. It is difficult to plan any action on this basis.) Ah, again:

    <fo:block>Mozi<fo:inline space-start="0.4em">lla</fo:inline></fo:block>

hopefully never breaks in the middle unless permitted by hyphenation; it's just a word with a gap. Is this correct? Intuitively, yes. P. Andries Tout Unicode en français (all of Unicode, in French) --- http://hapax.iquebec.com
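Patrick suggests the French no-break spaces could be inserted deterministically before formatting. A rough Python sketch of that preprocessing step, assuming the "fine" space is U+202F NARROW NO-BREAK SPACE and the "larger" one is U+00A0 NO-BREAK SPACE (the thread does not name code points). It is naive: it would, for instance, also touch a '?' inside a URL:

```python
import re

# Assumption: the "fine" no-break space is U+202F NARROW NO-BREAK SPACE and
# the "larger" one is U+00A0 NO-BREAK SPACE.
FINE_NBSP = "\u202f"
WIDE_NBSP = "\u00a0"

# Optional existing spaces (ordinary or no-break) before the mark, which must
# follow a non-space character. A run such as "?!" is treated as one unit.
_FINE = re.compile(r"(?<=\S)[ \u00a0\u202f]*([;?!]+)")
# Colons only when followed by whitespace or the end of the text, so that
# "http://" and "12:30" are left alone.
_WIDE = re.compile(r"(?<=\S)[ \u00a0\u202f]*(:(?=\s|$)|»)")


def french_spacing(text: str) -> str:
    """Replace or insert the no-break space before French punctuation."""
    text = _FINE.sub(FINE_NBSP + r"\1", text)
    return _WIDE.sub(WIDE_NBSP + r"\1", text)
```

Because existing spaces of either kind are swallowed and replaced, running the function twice gives the same result as running it once, which matters if the step sits inside a larger transformation pipeline.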
RE: Line breaks and other typographical stuff (was: Re: Latest FOP schema)
A drop cap, in other words. :-) -Original Message- From: J.Pietschmann [mailto:[EMAIL PROTECTED]] Sent: May 14, 2002 4:47 PM To: [EMAIL PROTECTED] Subject: Re: Line breaks and other typographical stuff (was: Re: Latest FOP schema) Arved Sandstrom wrote: text-indent. If that's not it, what do you mean by indented initial? I meant what the following HTML snippet shows:

    <table width="300">
      <tr>
        <td>
          <table align="left" cellpadding="0px" cellspacing="0px" border="0px">
            <tr><td><span style="font-size: 200%;">L</span></td></tr>
          </table>
          <p>orem ipsum dolor sit amet. Consectetuer adipiscing elit, sed diam nonummy. Nibh euismod tincidunt ut laoreet dolore magna. Aliquam erat volutpat. Iriure dolor in</p>
        </td>
      </tr>
    </table>

J.Pietschmann
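One way to express this drop cap in XSL-FO, along the lines of the float suggested earlier in the thread, is an fo:float with float="start" holding the enlarged initial, followed by the rest of the paragraph. The Python sketch below only builds that markup with the standard xml.etree.ElementTree module; the function name and the 200% font size are assumptions, the serif outdent is not handled, and whether a given formatter (FOP included) supports fo:float is not claimed here:

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Qualified element name in the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def drop_cap_block(paragraph: str, cap_size: str = "200%") -> ET.Element:
    """Hypothetical helper: first letter in a start-side float, rest as text."""
    block = ET.Element(fo("block"))
    float_ = ET.SubElement(block, fo("float"), {"float": "start"})
    initial = ET.SubElement(float_, fo("block"), {"font-size": cap_size})
    initial.text = paragraph[:1]
    float_.tail = paragraph[1:]  # the rest of the paragraph flows around the float
    return block


if __name__ == "__main__":
    print(ET.tostring(drop_cap_block("Lorem ipsum dolor sit amet."), encoding="unicode"))
```

Unlike the HTML table, the float keeps the initial inside the same fo:block as the paragraph text, so the displacement of the following lines is left to the formatter.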
AW: AW: Latest FOP schema
J. Pietschmann wrote: fo:blocks are rectangular areas, perhaps indented and with border, padding and other individual traits, nested into a rectangular area. I understand setting traits and properties. How about page layout, setting inline and baseline positions? Does it imply an unconditional CRLF? What does the input below look like on the page?

    <fo:block>
      level_0_text fills to position A
      <fo:block>
        level_1_text positioned at A fills to position B
      </fo:block>
      more level_0_text positioned at B
    </fo:block>

Hansuli Anderegg
RE: AW: Latest FOP schema
Comments intermingled. -Original Message- From: J.U. Anderegg [mailto:[EMAIL PROTECTED]] Sent: May 13, 2002 5:15 AM To: [EMAIL PROTECTED] Subject: AW: AW: Latest FOP schema J. Pietschmann wrote: fo:blocks are rectangular areas, perhaps indented and with border, padding and other individual traits, nested into a rectangular area. I understand setting traits and properties. How about page layout, setting inline and baseline positions? Does it imply an unconditional CRLF?

It's not that there is a CRLF, or anything like it, after a block; rather, any block-level siblings that follow it are stacked in the block-progression-direction, so the effect is the same. Can you be more specific with respect to the other questions?

What does the input below look like on the page?

    <fo:block>
      level_0_text fills to position A
      <fo:block>
        level_1_text positioned at A fills to position B
      </fo:block>
      more level_0_text positioned at B
    </fo:block>

I think the predominant opinion is (assuming all of this fits on one page): a normal block area (generated by the outer block) that contains one or more line areas for "level_0_text fills to position A"; then a block area with one or more line areas for "level_1_text positioned at A fills to position B"; and finally more line areas for "more level_0_text positioned at B". Note that if your example had been

    <fo:block>level_0_text fills to position A<fo:block> level_1_text positioned at A fills to position B </fo:block>more level_0_text positioned at B </fo:block>

then it would still be the same. Regards, AHS
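The stacking Arved describes can be modelled in a few lines: runs of text become line areas, each nested block becomes its own block area, and the results stack in the block-progression-direction. This Python toy (the Block class and areas() function are hypothetical, and XSL whitespace handling is reduced to simple collapsing) shows why the two variants of the example produce the same areas:

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Block:
    """Toy stand-in for fo:block: text runs mixed with nested blocks."""
    children: List[Union[str, "Block"]] = field(default_factory=list)


Area = Tuple[str, object]  # ("lines", text) or ("block", [nested areas])


def areas(block: Block) -> List[Area]:
    """Areas stacked in the block-progression-direction for one block.

    Consecutive text runs become one group of line areas; every nested
    Block becomes its own block area. Whitespace is simply collapsed,
    which is a simplification of the real XSL whitespace handling.
    """
    result: List[Area] = []
    pending = ""

    def flush() -> None:
        nonlocal pending
        text = " ".join(pending.split())
        if text:
            result.append(("lines", text))
        pending = ""

    for child in block.children:
        if isinstance(child, Block):
            flush()  # text before the nested block ends its own line areas
            result.append(("block", areas(child)))
        else:
            pending += child
    flush()
    return result
```

Since the nested block always closes off the pending text, no line area ever contains text from both sides of the inner fo:block, whatever whitespace surrounds it.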
RE: Latest FOP schema
Arved Sandstrom Arved_37@ wrote: I think the predominant opinion is (assuming all of this fits on one page): a normal block area (generated by the outer block) that contains one or more line areas for "level_0_text fills to position A"; then a block area with one or more line areas for "level_1_text positioned at A fills to position B"; and finally more line areas for "more level_0_text positioned at B". Note that if your example had been

    <fo:block>level_0_text fills to position A<fo:block> level_1_text positioned at A fills to position B </fo:block>more level_0_text positioned at B </fo:block>

then it would still be the same.

As a side note, assuming a Western language and script and hyphenation off, if the example had been

    <fo:block>level_0_text fills to position A<fo:block>level_1_text positioned at A fills to position B </fo:block>more level_0_text positioned at B </fo:block>

it is probably illegal, according to 4.7.2, point 3. I suppose it would be illegal to have a line break within the word "Alevel_1_text" here. The problem is: where do I find the rules for whether a line break is permitted at a given position for a certain language and script? And how is this supposed to deal with out-of-context stuff like product numbers, artificial DB keys or programming language identifiers containing underscores and dashes, and with non-breaking spaces, odd symbols, and character abuse (an uppercase Greek omega instead of the Ohm sign)? Again, I suppose the burden has to be put on the user, who has to ensure everything is correct, including changing the current language for quotes (nested if necessary) and specifying a language for product numbers and programming language ids. Umm, something looking like

    ..., ISBN <fo:inline language="x-isbn">0-201-48345-9</fo:inline> ...

and

    the <fo:inline language="x-Java">org.apache.fop.render.pdf.Font</fo:inline> class implements the <fo:inline language="x-Java">org.apache.fop.layout.FontMetric</fo:inline> interface ...
This would eliminate some keep-together stuff, I guess, but most probably requires a mechanism to teach the processor line breaking rules for user-defined languages.

<DumbQuestions>
- Is the interpretation reasonable? (I don't ask about correctness... :-)
- Can the redesigned FOP deal with the "Alevel_1_text" above, I mean, will it raise an error or warning?
- Can/should FOP deal with user-supplied word/line breaking rules?
</DumbQuestions>

Note that the same applies to the recently heavily discussed problem of a block-level element inside an fo:inline, according to 4.7.3, in particular point 3. J.Pietschmann
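Joerg's questions about the "Alevel_1_text" boundary and user-supplied rules could be explored with a small registry of per-language break rules plus a check at each nested-block boundary. Everything in this Python sketch is hypothetical: the rule functions, the registry keyed by the private-use language codes from the example, and the warning text are invented for illustration and are not an FOP API:

```python
from typing import Callable, Dict

# A break rule answers: may a line break fall between characters a and b?
BreakRule = Callable[[str, str], bool]


def default_rule(a: str, b: str) -> bool:
    """Crude Western default: break only after a space or a hyphen."""
    return a.isspace() or a == "-"


# Hypothetical registry keyed by the value of the language property. The
# codes come from the examples above; the rules themselves are invented.
RULES: Dict[str, BreakRule] = {
    "x-isbn": lambda a, b: False,     # never break inside an ISBN
    "x-Java": lambda a, b: b == ".",  # break only before a '.'
}


def may_break(a: str, b: str, language: str = "en") -> bool:
    return RULES.get(language, default_rule)(a, b)


def check_block_boundary(before: str, after: str, language: str = "en") -> str:
    """Return a warning if a nested block splits what reads as one word."""
    if not before or not after or before[-1].isspace() or after[0].isspace():
        return ""
    if may_break(before[-1], after[0], language):
        return ""
    return f"warning: nested block splits '{before.split()[-1]}{after.split()[0]}'"
```

A processor could run such a check wherever a block-level child interrupts inline content and report a warning instead of silently breaking the line, which is one possible answer to the second question.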
AW: Latest FOP schema
From the external view, a block means a rectangle containing formatted text, something like a paragraph.
o What do fo:blocks as children of fo:blocks mean for the end user?
o What's the effect of blocks in combination with text-level elements like leader, marker, inline, wrapper, basic-link?
o When is a block required?
Hansuli Anderegg
Re: AW: Latest FOP schema
J.U. Anderegg wrote: From the external view, a block means a rectangle containing formatted text, something like a paragraph.
o What do fo:blocks as children of fo:blocks mean for the end user?

Rectangular areas, perhaps indented and with border, padding and other individual traits, nested into a rectangular area. A user might be tempted to see them as higher-level structures, like HTML DIV elements, or (nested) sections or whatever. That's not too bad but can be very misleading at times (for example, a headline probably has to be *mapped* to an fo:block too). Nested fo:blocks can be used by the transformation designer for purely technical reasons, for example to define certain properties for a longer stretch of text, without any correspondence to the structure of the original document. From this point of view, it has been a very good idea to name an fo:block a block and not a paragraph. In the same sense, fo:table should probably have been named grid. BTW: the list-related FOs are redundant, aren't they? Or am I missing something that can't be easily mapped to a table (grid)?

o What's the effect of blocks in combination with text-level elements like leader, marker, inline, wrapper, basic-link?

There are some hassles with whitespace. Handling fo:leader is somewhat similar to handling whitespace.

o When is a block required?

If you want to put text where a block-level FO is expected. J.Pietschmann
Re: Latest FOP schema
Kieron wrote: This looks very good. I think we should put this somewhere on the site when it is ready. That would be excellent. Not quite done yet: I missed the attributes in the spec that are only listed as applying to elements but are not referred to in the element descriptions. I'm confused by the description of some of these attributes:
- background-position applies to block-level and replaced elements (What are replaced elements?)
- max-width, min-height, min-width apply to all elements except non-replaced inline elements and table elements (What are non-replaced inline elements?)
- position applies to all elements, but not to generated content (What is generated content?)
The following attributes are listed in the spec as applying to all elements (this makes sense): background, border-bottom, border-color, border-left, border-right, border-style, border-top, border-width, margin.
page-break-after and page-break-before apply to block-level elements, list-item, and table-row (this makes sense).
Chuck Paussa
Re: Latest FOP schema
Chuck Paussa wrote: background-position applies to block-level and replaced elements (What are replaced elements?)

This seems to be an odd artefact of not having checked everything. The background-position property is a shorthand for combinations of the background-position-horizontal and background-position-vertical properties, which in turn apply, as noted there, to everything background applies to, which is, ugh, every element (7.29.1). This all sucks; you should report a spec bug to the editors (well, read the amendments first, it could already be there...). I believe "replaced" originally meant floating elements and footnotes, but the point is moot anyway.

max-width, min-height, min-width apply to all elements except non-replaced inline elements and table elements (What are non-replaced inline elements?)

These properties are mapped to inline-progression-dimension or block-progression-dimension, so the constraints should be looked up there. It seems they cannot be determined statically, because both the mapping and the applicability seem to depend on the context; therefore the properties should probably be allowed everywhere.

position applies to all elements, but not to generated content (What is generated content?)

Hmm, again, it's a shorthand for setting absolute-position and relative-position. The interesting point is that, according to these properties, the settings static and relative only make sense for block-level and inline-level FOs, while the other two settings, absolute and fixed, would only apply to fo:block-container. It seems to follow that position applies to all block-level and inline-level FO elements. Again, this looks like a spec bug. Confused? Me too! J.Pietschmann
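The two context-dependent mappings discussed above can be written down as small tables. This Python sketch encodes one reading of XSL 1.0: the position shorthand expanding to absolute-position and relative-position, and min-/max-width/height mapping to the inline- or block-progression-dimension depending on whether the writing mode is horizontal. The function names are invented, and the tables should be checked against the spec text before use in a schema:

```python
# position shorthand expansion, following one reading of XSL 1.0.
POSITION_EXPANSION = {
    "static":   {"absolute-position": "auto",     "relative-position": "static"},
    "relative": {"absolute-position": "auto",     "relative-position": "relative"},
    "absolute": {"absolute-position": "absolute", "relative-position": "static"},
    "fixed":    {"absolute-position": "fixed",    "relative-position": "static"},
}

# Writing modes whose inline-progression-direction is horizontal.
HORIZONTAL_MODES = {"lr-tb", "rl-tb", "lr", "rl"}


def expand_position(value: str) -> dict:
    return dict(POSITION_EXPANSION[value])


def requires_block_container(value: str) -> bool:
    """absolute-position applies only to fo:block-container."""
    return expand_position(value)["absolute-position"] != "auto"


def map_size_property(name: str, writing_mode: str = "lr-tb") -> str:
    """Map min-/max-width/height to the writing-mode-relative property."""
    bound, _, axis = name.partition("-")
    horizontal = writing_mode in HORIZONTAL_MODES
    ipd, bpd = "inline-progression-dimension", "block-progression-dimension"
    if axis == "width":
        dim = ipd if horizontal else bpd
    elif axis == "height":
        dim = bpd if horizontal else ipd
    else:
        raise ValueError(f"not a size property: {name}")
    suffix = {"min": "minimum", "max": "maximum"}[bound]
    return f"{dim}.{suffix}"
```

Because map_size_property() needs the writing mode in effect, a static schema cannot tell which corresponding property is meant, which supports allowing these properties everywhere.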
Latest FOP schema
I've greatly improved the FO schema I've been working on. I've added patterns for most of the attribute types. I'd appreciate it if some folks would run their FO documents through a validator against this schema and tell me where I've done a less than excellent job. The schema as delivered is for the full FO spec. I've created groups and attributeGroups that separate the elements and attributes implemented in FOP from those not implemented. To make this an FOP-only schema, remove the groups and attributeGroups with _Not in their names. A couple of items I've noticed:
- FOP allows some elements, like fo:table-cell and fo:flow, to be empty when the spec says they must be non-empty.
- FOP allows the contents of some elements, like simple-page-master, to be in any order when the spec insists on a defined order.
Chuck Paussa
Attachment: fop4e.zip (Zip compressed data)
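Removing the _Not groups can be scripted rather than done by hand. A minimal Python sketch using xml.etree.ElementTree, assuming the groups are xs:group / xs:attributeGroup definitions in the XML Schema namespace and that every use of them is an xs:group or xs:attributeGroup element whose ref also contains _Not. The function name is hypothetical; comments in the schema are not preserved by this parser, and a content model left empty by the removal would still need manual repair:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def strip_not_implemented(schema_xml: str, marker: str = "_Not") -> str:
    """Drop group/attributeGroup definitions and references containing marker."""
    root = ET.fromstring(schema_xml)
    kinds = (f"{{{XS}}}group", f"{{{XS}}}attributeGroup")
    # ElementTree has no parent pointers, so walk every parent and
    # inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in kinds:
                continue
            label = child.get("name") or child.get("ref") or ""
            if marker in label:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Matching on the namespace URI rather than the literal "xs:" prefix means the script also works if the schema binds XML Schema to a different prefix.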