RE: [PROPOSAL] linebreak
At 2:58 pm + 26/2/02, ewitness - Ben Fowler wrote:

[ snip ]

In FO, you could write

  <fo:block space-before="3pt">
    <fo:block space-before="0">some line</fo:block>
    <fo:block space-before="0">next line</fo:block>
  </fo:block>

OK, I have tried the FO in the attached file, giving the expected PDF result. Even if I haven't the facility that I want, there is a good, and fairly logical, work-around.

Ben.

Attachments: line-break.fo (binary data), line-break.pdf (Adobe PDF document)

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]
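For readers without the attachment, a complete file built around the nested-block work-around might look roughly like the sketch below. It is not the attached line-break.fo. The page master name "page" and the margins are assumptions made for illustration, and master-reference follows the XSL 1.0 Recommendation (older drafts, and the FOP builds of the time, used master-name on fo:page-sequence).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
         font-size="12pt" font-family="serif">
  <fo:layout-master-set>
    <!-- "page" is an arbitrary master name chosen for this sketch -->
    <fo:simple-page-master master-name="page"
                           margin-top="72pt" margin-bottom="72pt"
                           margin-left="72pt" margin-right="72pt">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- outer block: plays the role of the paragraph and carries the vertical space -->
      <fo:block space-before="3pt">
        <!-- inner blocks: one per forced line, with no extra space between them -->
        <fo:block space-before="0pt">some line</fo:block>
        <fo:block space-before="0pt">next line</fo:block>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

The outer block stands in for the paragraph and each inner block for one line, which is the reading of the work-around given later in the thread.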
RE: [PROPOSAL] linebreak
[EMAIL PROTECTED] (ewitness - Ben Fowler) wrote:

[snip]

I don't mind admitting that as an outsider to the XML standard, this looks like a bad, even a really bad, idea. My reading of your commentary is "Whitespace is sometimes respected, and only a language lawyer can tell you when."

Well, in some sense you are right, there are a lot of really bad ideas hidden in this area. However, you have to see this in context.

I most certainly am looking at it in context. I was trying to do something simple and intuitive and it turned out gnarly and difficult. XML is meant to build on other things such as SGML, DSSSL and HTML by avoiding their mistakes.

A *real* typesetter doesn't care about whitespace and line feeds, he thinks in paragraphs and columns and pages of flowing text, with various indentations and margins and such.

Exactly so, and he thinks of leading and line height, and he thinks of paragraphs with 'space before' and 'space after'. I am prepared to argue that FO is a 'real' typesetter here, and should 'think' the same way.

TeX was practically written to support this view, and this is how FO processors work by default.

Quoting from the XML-dev list, a gentleman wanted to play space cadets and we got unix, another gentleman wanted to distribute his phone list and we got the World Wide Web. Pretty much every worthwhile advance in the computer field has come from one person with a problem to solve. TeX came about because Professor Knuth <URL: http://www-cs-faculty.stanford.edu/~knuth/> knew that computers could aid typesetting: it was written with one practical aim, rather than supporting a view. I don't see how you can argue that because TeX has \newline and \par it follows that FO should not have a semantic <br /> or forced line-break.

The problem: not everybody is a typesetter, many people don't even know about how to set indents and hanging indents and margins and this stuff, but they have a space and an enter key sitting squarely on their keyboard.

I may have misread you, but I think that you have intertwined two, possibly three things. 1. Not everybody is a typesetter ... Exactly, this is why there is a division of skill or labour. Authors write and typesetters mark up and set text. This is TeX 101, exempli gratia URL: http://www.ideography.co.uk/library/seybold/WYS_intro.html , and URL: http://www.ecn.wfu.edu/~cottrell/wp.html The author of a text should, at least in the first instance, concentrate entirely on the first of these sets of tasks. That is the author's business. Adam Smith famously pointed out the great benefits that flow from the division of labor. Composition and logical structuring of text is the author's specific contribution to the production of a printed text. Typesetting is the typesetter's business. This division of labour was of course fulfilled in the traditional production of books and articles in the pre-computer age. The author wrote, and indicated to the publisher the logical structure of the text by means of various annotations. The typesetter translated the author's text into a printed document, implementing the author's logical design in a concrete typographical design. One only has to imagine, say, Jane Austen wondering in what font to put the chapter headings of Pride and Prejudice to see how ridiculous the notion is. Jane Austen was a great writer; she was not a typesetter. You may be thinking this is beside the point. Jane Austen's writing was publishable; professional typesetters were interested in laying it out and printing it. You and I are not so lucky; if we want a printed article we will have to do it ourselves (and besides, we want it done much faster than via traditional typesetting). Well, yes and no. We will in a sense have to do it ourselves (on our own computers), but we have a lot of help at our disposal. In particular we have a professional-quality typesetting program available. 
This program (or set of programs) will in effect do for us, for free and in a few seconds or fractions of a second, the job that traditional typesetters did for Shakespeare, Jane Austen, Sir Walter Scott and all the rest. We just have to supply the program with a suitably marked-up text, as the traditional author did. I am suggesting, therefore, that there should be two distinct ``moments'' in the production of a printed text using a computer. First one types one's text and gets its logical structuration right, indicating this structuration in the text via simple annotations. This is accomplished using a text editor, a piece of software not to be confused with a word processor. (I will explain this distinction more fully below.) Then one ``hands
RE: [PROPOSAL] linebreak
[EMAIL PROTECTED] (ewitness - Ben Fowler) wrote:

[snip]

Well, this is drifting off topic for this list... but see the very end of this message. And some remarks anyway:

In the example

The correct way to express

  procedure foo(); begin ...

would be something like:

  <fo:block>
    <fo:block>foo();</fo:block>
    <fo:block margin-left="1em">
      <fo:block>begin</fo:block>
      ...

I meant the correct way to express the presentational aspects with XSLFO. There was no intention to feed this to a Pascal compiler. The use case was "I have some Foo source code and want to include it in my printed manual".

If you want to have your specific (XML) data presented on a 2D area like paper sheets or a computer monitor screen, you probably have to
1. Assign some presentational semantic to your specific data elements like para or proc or record or author
2. Apply some commonly used concepts like kerning, space justification, word wrapping and such stuff

XSL, both T and FO, attempts to make this possible, and XSLFO is the second part: a vocabulary for describing the presentation of stuff on a 2D area, perhaps splitted into a page stream (disregard audible properties, whose inclusion is just plain silly). Depending on your point of view, you can see either of the two steps or both together as the equivalent of typesetting.

Which is why I say [RETURN] for end of paragraph - </p>, and [SHIFT][RETURN] for end of line - <br />; to make the easy way the right way.

XSLFO does not assign semantics to FOs beyond what's necessary to get them layed out. It does not have a concept of paragraph, and the concept of line is not necessarily the same as what, for example, software manual writers or java compilers use. Note that there is no fo:line and no fo:para, just a fo:block, which is *not* a paragraph.

Further note that HTML <p> has paragraph semantics, this means some space before and after by default. Also, in early HTML there was no possibility at all to restrict the, well, let's call it page width. Therefore you could not simply write

  <p>some line
  <p>next line

in order to get a manageable line length, it would result in a line spacing making it unreadable. In FO, you could write

  <fo:block space-before="3pt">
    <fo:block space-before="0">some line</fo:block>
    <fo:block space-before="0">next line</fo:block>
  </fo:block>

if you want to have your content formatted this way. I can't see a need for a <br> equivalent in FO.

Another note: in TeX, semantic markup and presentational aspects are mixed in a sometimes annoying way. LaTeX tried to go as purely semantic as reasonable, but, unfortunately, you sometimes have a semantic too special to explicitly define an abstraction for it, and therefore describe it by its presentation instead. FO, on the contrary, is as pure presentation as possible, taking only really widely used concepts into account and leaving the rest to the first step mentioned at the beginning (no formatting of mathematical formulas in FO, no theorem numbering ...)

Conclusion: If you want to write documents, use DocBook, not XSLFO. DocBook btw contains linebreak elements, probably for some reasons already mentioned, and apparently there are no difficulties in mapping them to FOs.

In order to clear up the seeming contradiction that FO also allows for interpretation of LF characters: If you have already properly marked up text for lines, you can transform it (probably easily) into FO blocks. If you pull in whitespace formatted data from a file or DB or something, you might want to have the FO processor respect the existing formatting rather than to analyze and properly transform the whole stuff. That's a quick hack to fill a gap. I have already experienced several times that the result is not as good as it should be and someone still has to wade through the data and convert it to properly marked up (or at least properly structured) data (usually leading to hot debates about "what *is* the structure behind the formatting?")

Having said all that, FO still lacks stuff, but mostly related to the fact that pagination is a task of the FO processor and not known at the time FOs are generated, like:
- elements to express conditional stuff on page breaks in other elements, like "continued on next page" or "continued from previous page"
- conditions like "if this element does not fit the current page, start it on a new page" (not decisively solved by keep-together) or "if NN percent of this element do not fit onto the current page, start a new page" (some generalization of widows/orphans)

Can one of the FOP developers comment on how easy/hard such stuff would be to implement as extension elements?

Regards
J.Pietschmann
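As a point of reference for the keep-together remark, the fragment below sketches how the XSL 1.0 Recommendation lets a block ask not to be split across pages. It is only an illustration: the content is invented, the fragment is meant to sit inside an fo:flow, and whether a given processor honours the property is a separate question. As the message says, it does not decisively solve the "start on a new page if it does not fit" case, for instance when the block is taller than a page.

```xml
<!-- Fragment for use inside an fo:flow; the text content is made up for this sketch. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          keep-together.within-page="always"
          space-before="6pt">
  <!-- The formatter is asked to keep these three lines on one page. -->
  <fo:block font-weight="bold">A short listing</fo:block>
  <fo:block>first line of the listing</fo:block>
  <fo:block>second line of the listing</fo:block>
</fo:block>
```

The related widows and orphans properties cover the line-count case inside a single block, which is why the message describes the percentage condition as a generalization of them.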
RE: [PROPOSAL] linebreak
[EMAIL PROTECTED] (ewitness - Ben Fowler) wrote:

[snip] [snip ]

I meant the correct way to express the presentational aspects with XSLFO. There was no intention to feed this to a Pascal compiler. The use case was "I have some Foo source code and want to include it in my printed manual".

O.K. I was a bit harsh there, but I was trying to make the point that some blocks in an ideal presentational system, such as comments, would flow, but others such as code statements and expressions would not.

If you want to have your specific (XML) data presented on a 2D area like paper sheets or a computer monitor screen, you probably have to
1. Assign some presentational semantic to your specific data elements like para or proc or record or author
2. Apply some commonly used concepts like kerning, space justification, word wrapping and such stuff

XSL, both T and FO, attempts to make this possible, and XSLFO is the second part: a vocabulary for describing the presentation of stuff on a 2D area, perhaps splitted into a page stream (disregard audible properties, whose inclusion is just plain silly).

(divided, or divided up rather than splitted)

My gripe is that your (2) above should include kerning, ligatures, justification, bidi, word wrapping, hyphenation, forced line break, rubber space, widow/orphan control, keeps, insert space and such stuff.

Depending on your point of view, you can see either of the two steps or both together as the equivalent of typesetting.

Which is why I say [RETURN] for end of paragraph - </p>, and [SHIFT][RETURN] for end of line - <br />; to make the easy way the right way.

XSLFO does not assign semantics to FOs beyond what's necessary to get them layed out. It does not have a concept of paragraph, and the concept of line is not necessarily the same as what, for example, software manual writers or java compilers use. Note that there is no fo:line and no fo:para, just a fo:block, which is *not* a paragraph.

(laid out)

And from my POV, that is a pity. I do think of a block as a paragraph, and of my enter key or ETAG as ending a paragraph and inserting the specified vertical space. 'Lines' are largely the result of the layout engine working with line length against word wrap, hyphenation and forced line break. I don't dispute the practicality of what you describe, I merely question the benefits of it.

Further note that HTML <p> has paragraph semantics, this means some space before and after by default. Also, in early HTML there was no possibility at all to restrict the, well, let's call it page width. Therefore you could not simply write

  <p>some line
  <p>next line

in order to get a manageable line length, it would result in a line spacing making it unreadable. In FO, you could write

  <fo:block space-before="3pt">
    <fo:block space-before="0">some line</fo:block>
    <fo:block space-before="0">next line</fo:block>
  </fo:block>

if you want to have your content formatted this way. I can't see a need for a <br> equivalent in FO.

'Page width' is set by the user, he or she can set the width of the browser to what suits that person, it has little to do with HTML. If I can use HTML as an example (which is not in general a good idea), then this fragment

  <H1>MacHTTP<BR> An early web server</H1>

really needs the BR, as

  <H1>MacHTTP</H1>
  <H1>An early web server</H1>

is not the same in structure or presentation, and there is no other way (outside of CSS) of getting the required presentation.

Obviously FO does not have this problem. Your example is interesting and might work for me. The outer blocks represent paragraphs and the inner ones (typically only one) lines. I will try this before posting again.

Another note: in TeX, semantic markup and presentational aspects are mixed in a sometimes annoying way.

True

Conclusion: If you want to write documents, use DocBook, not XSLFO. DocBook btw contains linebreak elements, probably for some reasons already mentioned, and apparently there are no difficulties in mapping them to FOs.

Again I will check that. I was working from a starting point of generating/editing FO by hand. Certainly, I would recommend DocBook for serious authoring. FO is so like WordPerfect codes that it seems a shame to make it gratuitously non-editable.

In order to clear up the seeming contradiction that FO also allows for interpretation of LF characters: If you have already properly marked up text for lines, you can transform it (probably easily) into FO blocks. If you pull in whitespace formatted data from a file or DB or something, you might want to have the FO processor respect the existing formatting rather than to analyze and properly transform the whole stuff. That's a quick hack to fill a gap. I have already experienced several times that the result is not as good as it should be and someone still has to wade through the data and convert it to properly marked up (or at least properly structured) data (usually leading to hot debates about "what *is* the structure behind the formatting?")

Which is also (I
RE: [PROPOSAL] linebreak
I guess the reason nobody thought <fo:br/> or <fo:newline/> would be required is because a U+000A will do the trick.

[ snip ]

In any case, a linefeed (LF) must be honoured, and result in a linebreak. _If_ the conditions are right. What that means is, the initial value for linefeed-treatment is treat-as-space, which _does_ do a conversion of U+000A to U+0020 (space). So you would want to specify linefeed-treatment='preserve' on an ancestor flow object (possibly fo:root) and allow it to propagate to the FOs of interest, as it is inheritable. The whitespace-* properties do not affect the linefeed, and suppress-at-line-break can also be left as it is. But essentially the LF is there to accomplish what you want to do. The initial setting of linefeed-treatment acts to give us LaTeX-like behaviour, but unlike LaTeX we can switch to something different in this regard, rather than use new markup.

The answer that you gave is also to be found a few lines down from the first URL I gave you:

"4. Forced line-breaks are respected. Specifically, if A is the glyph-area generated by a fo:character whose Unicode character is U+000A, then A must be the last area in its containing subset Si."

I don't mind admitting that as an outsider to the XML standard, this looks like a bad, even a really bad, idea. My reading of your commentary is "Whitespace is sometimes respected, and only a language lawyer can tell you when."

How should this be interpreted? Do you think that HTML would be improved if the BR element was replaced with a feature that said "You can get the effect of a forced linebreak by setting 'linefeed-treatment' to 'preserve' in the body of the page (or other container as required), which causes all unix line feeds to be rendered" instead of the <br /> element which is what was done?

From my POV this has an inhibiting effect on all editors and pretty printing utilities, which must also respect existing white space (as XSL processors do) and never introduce line feeds, in case this setting was ever turned on. From my POV, a formatter should always ignore the formatting of the source, unless notified that it is preformatted as in the case of PRE and CDATA, exempli gratia <URL: http://archive.ncsa.uiuc.edu/SDG/IT94/Proceedings/Autools/sperberg-mcqueen/sperberg.html>, (1994) about half way down.

Do you happen to know whether this was ever discussed (id est objections sought and answered) or whether this was one person's idea that was incorporated as is?

I have a related 'issue' which is that the normalize-space( ) function in XSL does two things. It trims leading and trailing newlines and other whitespace, and it normalises internal white space. I have a need for an operation that does the former, but not the latter. (In fact I have an implementation which appears to be buggy and replaces 'Miss A Burgrave' with 'Miss ABurgrave', but handles 'Miss A Burgrave' correctly.)

In short, XML processors including ones that produce XML-FO files should pass through all whitespace, and processors such as fop which are also XML processors, but adjusted so that they do not produce XML, should (at least in general) normalise whitespace. Where the output file format respects whitespace then it should be supplied as fo:text or as some break (as my original suggestion). The present situation is that the latter type of processor may not normalise whitespace, because some newlines are significant.

Incidentally, you have not made (or reported) a case against my suggestion: unless it is harmful (or confusing) there is no real reason why both styles of indicating significant breaks could not be used, is there?

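The trim-without-collapsing operation asked for above can be sketched in XSLT 1.0 with recursive named templates. This is one possible approach, not the implementation referred to in the message. The template names trim, trim-left and trim-right are hypothetical, and the sample string (with a doubled internal space) is invented for the demonstration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Remove leading whitespace one character at a time. -->
  <xsl:template name="trim-left">
    <xsl:param name="s"/>
    <xsl:choose>
      <!-- normalize-space() of a single whitespace character is the empty string -->
      <xsl:when test="string-length($s) &gt; 0 and normalize-space(substring($s, 1, 1)) = ''">
        <xsl:call-template name="trim-left">
          <xsl:with-param name="s" select="substring($s, 2)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Remove trailing whitespace one character at a time. -->
  <xsl:template name="trim-right">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="string-length($s) &gt; 0 and normalize-space(substring($s, string-length($s), 1)) = ''">
        <xsl:call-template name="trim-right">
          <xsl:with-param name="s" select="substring($s, 1, string-length($s) - 1)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Trim both ends; internal whitespace is never touched. -->
  <xsl:template name="trim">
    <xsl:param name="s"/>
    <xsl:variable name="left">
      <xsl:call-template name="trim-left">
        <xsl:with-param name="s" select="$s"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:call-template name="trim-right">
      <xsl:with-param name="s" select="string($left)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Demonstration on an invented sample string, bracketed to make the ends visible. -->
  <xsl:template match="/">
    <xsl:text>[</xsl:text>
    <xsl:call-template name="trim">
      <xsl:with-param name="s" select="'  Miss A  Burgrave  '"/>
    </xsl:call-template>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the templates only ever drop characters from the two ends, the doubled space inside the sample name is left as it is. The recursion depth grows with the amount of leading or trailing whitespace, which should be acceptable for short strings such as names and addresses.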
Using FOP derived from version 0.14, I get this report when I tried the following .fo

WARNING: property 'linefeed-treatment' ignored
WARNING: property 'linefeed-treatment' ignored
setting up fonts
formatting FOs into areas
[1]
rendering areas to PDF

(source)

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" text-align="justified" font-size="12pt" font-family="serif" linefeed-treatment='preserve'>
  <fo:layout-master-set>
    <fo:simple-page-master margin-right="50pt" margin-left="100pt" margin-bottom="25pt" margin-top="75pt" master-name="all">
      <fo:region-body margin-bottom="50pt" />
      <fo:region-after extent="25pt" />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence id="" hyphenate="true" master-name="all" language="en">
    <fo:flow flow-name="xsl-region-body">
      <fo:block linefeed-treatment='preserve'>
Bilbo Baggins,
RE: [PROPOSAL] linebreak
Comments below.

-Original Message-
From: ewitness - Ben Fowler [mailto:[EMAIL PROTECTED]]
Sent: February 25, 2002 9:41 AM
To: [EMAIL PROTECTED]
Subject: RE: [PROPOSAL] linebreak

I guess the reason nobody thought <fo:br/> or <fo:newline/> would be required is because a U+000A will do the trick.

[ snip ]

In any case, a linefeed (LF) must be honoured, and result in a linebreak. _If_ the conditions are right. What that means is, the initial value for linefeed-treatment is treat-as-space, which _does_ do a conversion of U+000A to U+0020 (space). So you would want to specify linefeed-treatment='preserve' on an ancestor flow object (possibly fo:root) and allow it to propagate to the FOs of interest, as it is inheritable. The whitespace-* properties do not affect the linefeed, and suppress-at-line-break can also be left as it is. But essentially the LF is there to accomplish what you want to do. The initial setting of linefeed-treatment acts to give us LaTeX-like behaviour, but unlike LaTeX we can switch to something different in this regard, rather than use new markup.

The answer that you gave is also to be found a few lines down from the first URL I gave you:

"4. Forced line-breaks are respected. Specifically, if A is the glyph-area generated by a fo:character whose Unicode character is U+000A, then A must be the last area in its containing subset Si."

I don't mind admitting that as an outsider to the XML standard, this looks like a bad, even a really bad, idea. My reading of your commentary is "Whitespace is sometimes respected, and only a language lawyer can tell you when."

How should this be interpreted? Do you think that HTML would be improved if the BR element was replaced with a feature that said "You can get the effect of a forced linebreak by setting 'linefeed-treatment' to 'preserve' in the body of the page (or other container as required), which causes all unix line feeds to be rendered" instead of the <br /> element which is what was done?

From my POV this has an inhibiting effect on all editors and pretty printing utilities, which must also respect existing white space (as XSL processors do) and never introduce line feeds, in case this setting was ever turned on. From my POV, a formatter should always ignore the formatting of the source, unless notified that it is preformatted as in the case of PRE and CDATA, exempli gratia <URL: http://archive.ncsa.uiuc.edu/SDG/IT94/Proceedings/Autools/sperberg-mcqueen/sperberg.html>, (1994) about half way down.

Do you happen to know whether this was ever discussed (id est objections sought and answered) or whether this was one person's idea that was incorporated as is?

I have a related 'issue' which is that the normalize-space( ) function in XSL does two things. It trims leading and trailing newlines and other whitespace, and it normalises internal white space. I have a need for an operation that does the former, but not the latter. (In fact I have an implementation which appears to be buggy and replaces 'Miss A Burgrave' with 'Miss ABurgrave', but handles 'Miss A Burgrave' correctly.)

In short, XML processors including ones that produce XML-FO files should pass through all whitespace, and processors such as fop which are also XML processors, but adjusted so that they do not produce XML, should (at least in general) normalise whitespace. Where the output file format respects whitespace then it should be supplied as fo:text or as some break (as my original suggestion). The present situation is that the latter type of processor may not normalise whitespace, because some newlines are significant.

Incidentally, you have not made (or reported) a case against my suggestion: unless it is harmful (or confusing) there is no real reason why both styles of indicating significant breaks could not be used, is there?

[ SNIP example ]

line-feed treatment was reported as not working in June last year, <URL: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1998>, I don't know whether it is now in.

I now have a linux installation (but not yet CVS), and so I am in a position to start some development work on FOP. Where should I start? Is there a list of outstanding tasks?

I wrote that a few days ago, but delayed sending it until I could see what bugzilla could tell me. In the meantime, bugzilla has sent me an e-mail giving no fewer than 195 issues. My search on bugzilla reveals 18 high priority bugs.

<URL: http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&priority=High&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=>

Nonetheless, my query remains: is there a list of issues which people can start working on now, that won't need to be re-done once the redesign is in place?

Ben

My Comments:

1. Bear in mind that 'linefeed-treatment' need
RE: [PROPOSAL] linebreak
[EMAIL PROTECTED] (ewitness - Ben Fowler) wrote:

[snip]

I don't mind admitting that as an outsider to the XML standard, this looks like a bad, even a really bad, idea. My reading of your commentary is "Whitespace is sometimes respected, and only a language lawyer can tell you when."

Well, in some sense you are right, there are a lot of really bad ideas hidden in this area. However, you have to see this in context.

A *real* typesetter doesn't care about whitespace and line feeds, he thinks in paragraphs and columns and pages of flowing text, with various indentations and margins and such. TeX was practically written to support this view, and this is how FO processors work by default.

The problem: not everybody is a typesetter, many people don't even know about how to set indents and hanging indents and margins and this stuff, but they have a space and an enter key sitting squarely on their keyboard. The correct way to express

  procedure foo();
  begin
    dostuff:=false;
  end

would be something like:

  <fo:block>
    <fo:block>foo();</fo:block>
    <fo:block margin-left="1em">
      <fo:block>begin</fo:block>
      <fo:block margin-left="2em">
        <fo:block>dostuff:=false;</fo:block>
      </fo:block>
      <fo:block>end</fo:block>
    </fo:block>
  </fo:block>

but chances are you'll get it space- or even (shudder!) tab-indented. (Take a postal address block for another, less IT-related example.)

[If I'd get a chance to correct the past, I'd probably kill the inventor of the tab character before he commits his crime :-]

There is a lot of whitespace formatted data out there, and it is unlikely to disappear in the near future. In order to deal with realities, you can fine-tune how FO processors handle various forms of white space. Actually, it is encouraged to do so only locally.

You might have noted that in HTML+CSS <br> actually *is* redundant, it is just heavily (ab)used because it produces predictable results without fumbling with gnarly CSS settings. Especially if you have to bring already whitespace formatted data online *quickly*.

Typewriter habits are hard to get rid of, regardless of how enraged professionals are about this.

Regards
J.Pietschmann
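The "fine-tune locally" advice could look something like the fragment below for a whitespace-formatted listing. It is a sketch using the property names of the XSL 1.0 Recommendation (linefeed-treatment, white-space-treatment, white-space-collapse, wrap-option), and the choice of a monospace font is an assumption. Support in any particular processor, including the FOP versions discussed in this thread, is not implied.

```xml
<!-- Fragment for use inside an fo:flow. The settings apply to this block only. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-family="monospace"
          linefeed-treatment="preserve"
          white-space-treatment="preserve"
          white-space-collapse="false"
          wrap-option="no-wrap">procedure foo();
begin
  dostuff:=false;
end</fo:block>
```

Setting the properties on the one block that holds the listing, rather than on fo:root, keeps ordinary running text under the default treat-as-space behaviour.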
RE: [PROPOSAL] linebreak was Re: REDESIGN: where I have been hiding
-Original Message-
From: ewitness - Ben Fowler [mailto:[EMAIL PROTECTED]]
Sent: February 18, 2002 9:36 PM
To: [EMAIL PROTECTED]
Subject: RE: [PROPOSAL] linebreak was Re: REDESIGN: where I have been hiding

This would be useful in writing addresses exempli gratia:

  <?xml version="1.0" encoding="UTF-8"?>
  <fo:root text-align="justified" font-size="12pt" font-family="serif">
    <fo:block>
      Bilbo Baggins,<fo:newline />
      Bag End,<fo:newline />
      Underhill,<fo:newline />
      Hobbiton,<fo:newline />
      Westfarthing of the Shire.
    </fo:block>
  </fo:root>

At present, I can get the effect I want with tables.

Ben.

-end of Original Message-

I guess the reason nobody thought <fo:br/> or <fo:newline/> would be required is because a U+000A will do the trick.

Thank you. I had assumed that that character would count as white space, and would be normalised away. I will try it.

Ben.

My answer was so terse that maybe it sounded snippy, which was not my intention. I also can't say that FOP is up to spec with whitespace handling. I'm thinking that it's not, but I'll have to check myself. So my comments are related to the spec only.

In any case, a linefeed (LF) must be honoured, and result in a linebreak. _If_ the conditions are right. What that means is, the initial value for linefeed-treatment is treat-as-space, which _does_ do a conversion of U+000A to U+0020 (space). So you would want to specify linefeed-treatment='preserve' on an ancestor flow object (possibly fo:root) and allow it to propagate to the FOs of interest, as it is inheritable. The whitespace-* properties do not affect the linefeed, and suppress-at-line-break can also be left as it is. But essentially the LF is there to accomplish what you want to do. The initial setting of linefeed-treatment acts to give us LaTeX-like behaviour, but unlike LaTeX we can switch to something different in this regard, rather than use new markup.

Regards,
AHS
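Applied to the address example, the linefeed-treatment approach described above might be written as in the following fragment. This is a sketch against the spec only: each U+000A inside the block is meant to force a line break, and, as the message itself cautions, FOP may not implement this.

```xml
<!-- Fragment for use inside an fo:flow; every linefeed in the text is kept as a line break. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-size="12pt"
          font-family="serif"
          linefeed-treatment="preserve">Bilbo Baggins,
Bag End,
Underhill,
Hobbiton,
Westfarthing of the Shire.</fo:block>
```

The property is set on the address block itself here. Because it is inherited, putting it on an ancestor such as fo:root would have the same effect on this block, but would also change how linefeeds in all other text are handled.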
RE: [PROPOSAL] linebreak was Re: REDESIGN: where I have been hiding
This would be useful in writing addresses exempli gratia:

  <?xml version="1.0" encoding="UTF-8"?>
  <fo:root text-align="justified" font-size="12pt" font-family="serif">
    <fo:block>
      Bilbo Baggins,<fo:newline />
      Bag End,<fo:newline />
      Underhill,<fo:newline />
      Hobbiton,<fo:newline />
      Westfarthing of the Shire.
    </fo:block>
  </fo:root>

At present, I can get the effect I want with tables.

Ben.

-end of Original Message-

I guess the reason nobody thought <fo:br/> or <fo:newline/> would be required is because a U+000A will do the trick.

Thank you. I had assumed that that character would count as white space, and would be normalised away. I will try it.

Ben.