FOray's font subsystem for Fop
Hi Fop Team and Victor,

I'm considering adapting FOray's font subsystem to Fop. I have already experimented a bit and it seems rather feasible. So far I have encountered two problems:

- Logging mechanism: FOray uses the Avalon framework while Fop uses Commons Logging. The two APIs are similar, but I suppose I'll have to convert the Avalon code to Commons. Or are there any plans to change the logging mechanism (I'm thinking of the FOPAvalonization Wiki page)? Another minor problem will be plugging the right logger into the font subsystem. I guess only one logger is created and passed through all classes?
- The font subsystem is based on a client/server architecture; the question is: which Fop class should be made a FontConsumer? And where should the FontServer be created and held? So far I've used FOEventHandler as a FontConsumer and a holder of a FontServer. It's quite convenient, but I'm not at all sure that it is good design; I'm not yet used to Fop's overall architecture.

I welcome any additional thoughts/comments. Now starting to work...

Regards,
Vincent
Re: AW: MathML and barcode support for FOP
While we are on the subject, if I may give my opinion: I agree with Norman that using images to render maths isn't a good solution in the long term. The fact that it is SVG improves the situation a bit because fonts will be rendered fine, but there are other problems to address: for example, it is difficult to align the baseline of an inline-rendered equation with the text's baseline. It is also not possible to break an equation across multiple lines. A native MathML renderer will be necessary to beat TeX in this area. I was thinking of writing one for Fop, but I lack the time, and for now I'm busy with the font subsystem. The work referred to by Siarhei Baidun may definitely be interesting.

Vincent

Jeremias Maerki wrote:
> The MathML extension uses JEuclid to convert the MathML to SVG internally, so we get quite good quality. I don't think it is possible to create XSL-FO code from MathML because you can't properly place all the elements. Doing that with SVG is a lot better.
>
> On 27.07.2005 10:54:45 Norman Markgraf wrote:
> > Sorry to interrupt you all, but I have some concerns about using JEuclid for MathML. I'm not sure if I have permission to post here, but maybe you will excuse my post if not.
> >
> > I am not sure if using JEuclid is the right way to deal with MathML. As far as I understand, JEuclid transforms a MathML expression into an image. If this is correct, then I would find this the wrong way in principle. Wouldn't it be nicer if the MathML expression were converted into XSL-FO itself? I am not very in this field, but as far as I understand MathML (pm) this should be the way to go. Or do I completely misinterpret something?
>
> Jeremias Maerki
offline
Hi all,

I'll be offline for 3 weeks: summer holidays, far from computers ;-)

My work on the font subsystem is coming along, a bit slowly these last few days, but I hope to have more time after the holidays. Currently I'm on the PDF part: it's a bit difficult because font and PDF things are very intermixed, and are handled a bit differently each time in Fop 0.20.5, Fop trunk and FOray.

I don't know if the font subsystem will be ready for the next release, but I think that after the PDF part is done the rest should be easier to adapt. I'll do my best to make it ready.

Just in case, I've put a patch on Bugzilla for those who might want to see my changes during these next 3 weeks. There are conflict issues and problems with the new layoutmgr.inline subpackage, and the code won't compile. So this is really for curious people who have time to lose ;-)

See you in 3 weeks.

Cheers,
Vincent
Re: Relative font weights and font selection
Victor Mote wrote:
> Manuel Mall wrote:
> > Regarding the bolder, lighter issue and the general font selection, I looked at the pre-patch for the FOrayFont adaptation to Fop (http://issues.apache.org/bugzilla/show_bug.cgi?id=35948) and concluded that meddling with the font selection system will interfere with the FOray font integration, and that the FOray font system has addressed most of the font selection issues anyway (not sure about the bolder, lighter bits though). I will therefore back off from that line of work and wait for the FOray font integration to complete, assuming that it is still going ahead.
>
> Sorry to be so slow responding. I think Vincent is taking August off, but is still working on the font integration work.

I confirm. I'm offline for one more week (I'm connected only tonight) and then I'll get back to my work on font integration.

> Manuel and I have had an off-line conversation about the bolder/lighter issue, and I think we will need to improve both the interface and the implementation to handle this and the similar issues for font-stretch. I'll work on that in the next week or two.

There was a TODO in the code where bolder and lighter should be handled. I've left it as is for now, as it is not very important yet. I had the feeling that the new font mechanism would ease things, but as you say there seems to be some work to do. We will have to discuss that one day...

Cheers,
Vincent
Re: svn commit: r240012 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/render: pdf/PDFRenderer.java ps/PSRenderer.java
Jeremias,

Just in case you intended to make any improvements there: the FOrayFont integration may bring some facilities in this area. At least the handling will be different, so I don't think it's worth working on this before the integration is done. So please leave it as is for now. Thanks!

I've finished reading the huge amount of mail that was written to this list during August, and am getting back to work now.

Regards,
Vincent

[EMAIL PROTECTED] wrote:
> Author: jeremias
> Date: Thu Aug 25 00:28:27 2005
> New Revision: 240012
>
> URL: http://svn.apache.org/viewcvs?rev=240012&view=rev
> Log:
> Kerning is currently not supported by the layout engine, so disable it for PDF and add a TODO item for PS.
>
> Modified:
>     xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFRenderer.java
>     xmlgraphics/fop/trunk/src/java/org/apache/fop/render/ps/PSRenderer.java
>
> Modified: xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFRenderer.java
> URL: http://svn.apache.org/viewcvs/xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFRenderer.java?rev=240012&r1=240011&r2=240012&view=diff
> ==============================================================================
> --- xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFRenderer.java (original)
> +++ xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFRenderer.java Thu Aug 25 00:28:27 2005
> @@ -1187,7 +1187,9 @@
>          boolean kerningAvailable = false;
>          Map kerning = fs.getKerning();
>          if (kerning != null && !kerning.isEmpty()) {
> -            kerningAvailable = true;
> +            //kerningAvailable = true;
> +            //TODO Reenable me when the layout engine supports kerning, too
> +            log.warn("Kerning support is disabled until it is supported by the layout engine!");
>          }
>
>          int l = s.length();
>
> Modified: xmlgraphics/fop/trunk/src/java/org/apache/fop/render/ps/PSRenderer.java
> URL: http://svn.apache.org/viewcvs/xmlgraphics/fop/trunk/src/java/org/apache/fop/render/ps/PSRenderer.java?rev=240012&r1=240011&r2=240012&view=diff
> ==============================================================================
> --- xmlgraphics/fop/trunk/src/java/org/apache/fop/render/ps/PSRenderer.java (original)
> +++ xmlgraphics/fop/trunk/src/java/org/apache/fop/render/ps/PSRenderer.java Thu Aug 25 00:28:27 2005
> @@ -25,6 +25,7 @@
>  import java.io.OutputStream;
>  import java.util.Iterator;
>  import java.util.List;
> +import java.util.Map;
>
>  // FOP
>  import org.apache.avalon.framework.configuration.Configuration;
> @@ -713,7 +714,16 @@
>                  handleIOTrouble(ioe);
>              }
>          }
> -        //paintText(rx, bl, , f);
> +
> +        boolean kerningAvailable = false;
> +        Map kerning = tf.getKerningInfo();
> +        if (kerning != null && !kerning.isEmpty()) {
> +            //kerningAvailable = true;
> +            //TODO Fix me when kerning is supported by the layout engine
> +            log.warn("Kerning info is available, but kerning is not yet implemented for"
> +                    + " the PS renderer and not currently supported by the layout engine.");
> +        }
> +
>          String text = area.getTextArea();
>          beginTextObject();
>          writeln("1 0 0 -1 " + gen.formatDouble(rx / 1000f)
Re: Relative font weights and font selection
Victor Mote wrote:
> I am ignoring font-stretch for now. I am unclear whether it works similarly to font-weight, or whether it is totally resolvable in the FO Tree. Interestingly, CSS 2.1 (the only version of CSS 2 still available at W3C) removes font-stretch entirely!!??!!

As I understand the spec, this works differently from font-weight and can be resolved in the FO Tree: just select the next expanded value for wider or the next condensed value for narrower. The font selection would be performed only afterwards, when it is time to decide, e.g., which font the keyword semi-expanded matches. It's true that it is an extra feature that IMO can be simulated with a good font configuration file.

> For font-weight, there seems to be some ambiguity in the standard(s). There are two possibilities, and neither CSS 2.1 nor XSL-FO seems to resolve the matter:
> 1. Apply bolder and lighter to the inherited font to compute a weight that is applied to the selected font.
> 2. Select the font, inheriting the weight from the inherited font, then apply bolder and lighter to that weight.

I'd go with 1. Get the inherited font; find a darker one in the fonts database; get its weight value. That's it.

> In order to move forward, I suggest the addition of the following methods in org.axsl.font.Font:
>     public byte nextBolderWeight();
>     public byte nextLighterWeight();
>     public org.axsl.font.Font nextBolderFont();
>     public org.axsl.font.Font nextLighterFont();
> This will allow the client application (FOP) to use whichever algorithm it thinks is appropriate. The bad news is that this ties each registered font to exactly one font-family, something I was hoping to avoid.

That seems OK. The only interest I see in a font belonging to several families is when there is a specific family (Times, Helvetica) and a generic one (serif, sans-serif...). In this case a generic family would be mapped to a specific one, and I don't think your proposed methods prevent that. Otherwise I don't see much interest in mixing several families to build a complete set. The result would be visually bad IMO. I may have missed something: I haven't studied that point yet.

> There is another area of complexity in font selection that has not yet been addressed, so I pose it here to Vincent and Manuel especially, and to any others who wish to comment. The whole issue of whether the Font has a glyph for the character(s) has not yet been addressed. The best idea I have for this is as follows:
> 1. Add a char to the signature of org.axsl.font.FontServer.selectFont. This char represents the first char of the text for which the font is being selected. This allows the selection process to pass by a font-family if it cannot paint the character.

So let's assume that I have a line of text to render. IIUC I would use it like this:
* a first call with the first char of the text to get the font that will generally be used
* an additional call for each character for which there is no glyph in the general font
Is that what you mean?

> 2. Add the following method to org.axsl.font.Font:
>     /**
>      * Examines each character in string to ensure that a glyph exists in the font for that
>      * character. If a character has no glyph in the font, the character's index in string
>      * is returned.
>      * @return The index in string of its first character for which no glyph exists in this
>      * font. If all characters in the string have glyphs in this font, -1 is returned.
>      */
>     public int unavailableChar(String string);
> Add also an overloaded version of this method with char[] as the parameter.

Why not directly return an array of all indexes where there is a missing glyph? Or add a beginIndex parameter so that one doesn't have to artificially recreate a String made of the initial String minus all characters up to the first missing glyph?

> Between these two, I think an application should be able to efficiently subdivide a chunk of text based on the various fonts that may need to be used to process it.

In the long term the font-selection-strategy property will have to be implemented. The preceding may then need to be extended.

> Comments on any of this are very welcome. I had hoped to defer some of these font selection issues for a while yet, and you guys are frankly ahead of me in needing to resolve them, so I will be glad to react to those who may have thought it through more than I have.

I wish I could be more helpful, but I haven't considered all aspects of the problem yet and I don't grasp the whole picture. I'd like to finish the font integration work first. IMHO this feature is not that important for now. What do other committers think?

Vincent
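The beginIndex variant suggested in the message above can be sketched as follows. This is only an illustration under stated assumptions: GlyphSource and GlyphScan are hypothetical stand-ins, not the real org.axsl.font.Font interface, and the scan works on Java chars, so characters outside the Basic Multilingual Plane are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a font; NOT the real org.axsl.font.Font. */
interface GlyphSource {
    /** Returns true if the font can paint the given character. */
    boolean hasGlyph(char c);
}

final class GlyphScan {
    private GlyphScan() { }

    /**
     * Returns the index of the first character at or after beginIndex
     * that has no glyph in the font, or -1 if there is none.
     */
    static int unavailableChar(GlyphSource font, String text, int beginIndex) {
        for (int i = beginIndex; i < text.length(); i++) {
            if (!font.hasGlyph(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    /** Collects every index with a missing glyph by repeated scanning. */
    static List<Integer> missingGlyphs(GlyphSource font, String text) {
        List<Integer> missing = new ArrayList<>();
        int i = unavailableChar(font, text, 0);
        while (i != -1) {
            missing.add(i);
            // Resume right after the character that was just reported.
            i = unavailableChar(font, text, i + 1);
        }
        return missing;
    }
}
```

With a beginIndex parameter the caller never has to build substrings, and a method returning all indexes can be layered on top of the single-index one, so either shape of the interface would probably serve a client that needs to split text into per-font runs.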
Re: Relative font weights and font selection
Victor Mote wrote:
> > As I understand the spec, this works differently from font-weight and can be resolved in the FO Tree: just select the next expanded value for wider or the next condensed value for narrower. The font selection would be performed only afterwards, when it is time to decide, e.g., which font the keyword semi-expanded matches. It's true that it is an extra feature that IMO can be simulated with a good font configuration file.
>
> Just to be clear, I understand your last sentence to be addressing a different topic than the first part of this statement. That is, font configuration won't be at all involved with the *resolution* of font-stretch in what you have proposed. However, it may be involved from the standpoint of implementing a resolved font-stretch value, in that font-stretch could be simulated using PostScript or PDF text parameters. Did I understand this correctly?

Yes, except for the end: we agree that it would not be the purpose of the font config file to resolve font-stretch. What I meant is that we could use a workaround by specifying different font families for expanded fonts; e.g. one family Times-Normal and one Times-Expanded, instead of one family Times with two font-stretch variants, Normal and Expanded. The user, instead of changing the font-stretch property, would change the font-family.

> For all of this, probably the best approach is for someone to do exactly what you have done above: suggest changes to the interface that will provide the information needed.

I'll put the problems of font-stretch and glyph substitution on my personal todo list. I may consider those problems later, when I'm (at last!) finished with the font integration work.

Vincent
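The resolution of wider and narrower described in this thread (step to the next expanded or next condensed keyword relative to the inherited value) can be sketched as below. FontStretchResolver is a hypothetical helper, not part of Fop or FOray; the sketch assumes the nine absolute font-stretch keywords, and that stepping stops at the two ends of the scale.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of Fop or FOray. */
final class FontStretchResolver {
    /** The absolute font-stretch keywords, from narrowest to widest. */
    private static final List<String> SCALE = Arrays.asList(
            "ultra-condensed", "extra-condensed", "condensed", "semi-condensed",
            "normal",
            "semi-expanded", "expanded", "extra-expanded", "ultra-expanded");

    private FontStretchResolver() { }

    /**
     * Resolves a specified font-stretch value against the inherited,
     * already resolved one. No font lookup is involved.
     */
    static String resolve(String specified, String inherited) {
        int i = SCALE.indexOf(inherited);
        if (i < 0) {
            throw new IllegalArgumentException(
                    "inherited value must be an absolute keyword: " + inherited);
        }
        if ("wider".equals(specified)) {
            // One step up, clamped at ultra-expanded.
            return SCALE.get(Math.min(i + 1, SCALE.size() - 1));
        }
        if ("narrower".equals(specified)) {
            // One step down, clamped at ultra-condensed.
            return SCALE.get(Math.max(i - 1, 0));
        }
        if (SCALE.contains(specified)) {
            return specified;
        }
        throw new IllegalArgumentException("unknown font-stretch value: " + specified);
    }
}
```

Because the result depends only on keywords, this step could run in the FO Tree; matching the resolved keyword to an actual font would remain a separate, later step.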
Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch
Chris,

I'm afraid I don't agree with you here. You seem to mix up space conditionality and space precedence. See § 4.3 of the spec, and especially § 4.3.1, Space-resolution rules [1].

Conditionality only applies to spaces that begin a reference area; as per the first rule, all of the conditional spaces that begin a reference area are discarded. For the remaining spaces, precedence is then to be considered. The space with the highest precedence wins.

In your sample (I have added a space-after for clarity):

    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="a4" page-width="210mm" page-height="297mm" margin="5mm">
          <fo:region-body margin-top="90mm" margin-bottom="90mm" background-color="blue"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="a4">
        <fo:flow flow-name="xsl-region-body" font-size="18pt">
          <fo:block space-before="5mm" space-before.conditionality="retain" space-after="10mm">There should be 5mm of space before this block</fo:block>
          <fo:block space-before="5mm" space-before.conditionality="discard">There should be 10mm of space before this block </fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>

The second fo:block does not begin a reference area, so space conditionality isn't taken into consideration. For both spaces, precedence is not specified, so the default value of 0 is used (§ 7.10.5 and 7.10.6). The third rule of § 4.3.1 states that between two spaces of the same precedence, the one that has the highest (optimum) value wins; here, the space-after of the first block.

> Imagine a scenario where you have many different documents generated by different stylesheets, which all share a common styleset, i.e. an imported XSL file containing xsl:attribute-sets. The styles defined there are used throughout the documents. Now in some documents you might have two adjacent paragraphs that both use styles with space-before="10pt" and after="10pt". In the document it is desirable to have only 10pt between the paragraphs, but how can this be achieved? The paragraphs need to use the styles they are using for other reasons. This is where discard comes in handy. Add conditionality="discard" to the two styles and then the space from one gets dropped.

Here we disagree: if I understand the spec correctly, it is precedence that should be used, e.g. as in the following:

    <fo:block space-after="10mm" space-after.precedence="100">There should be 10mm of space after this block</fo:block>
    <fo:block space-before="10mm" space-before.precedence="200">There should be 10mm of space before this block</fo:block>

Here the space-before of the second block has a higher precedence than the space-after of the first one, and thus wins. The resolved space is 10mm.

> > As per the testcase spaces-block2, I similarly think there should be a space between the first and second block on page 2; anyway, in this case the actual behaviour is probably wrong, as the space resolution rules (if I understand them correctly) seem to imply that it should be only 10 points.
>
> Yes, that's right, there should be less space - not no space.

I think Lucas is right here. The resolved space should be 10pt, that is, the optimum space of the two spaces of same value and same precedence. If .minimum and .maximum were specified, the resolved space would take the minimum of the two .minimum values and the max of the two .maximum values for its own .min and .max values.

Do you agree?

Vincent

[1] http://www.w3.org/TR/xsl/slice4.html#area-space
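The two steps argued for in this message (keep only the spaces with the highest precedence, then take the greatest optimum among them) can be sketched as follows. SpaceSpec and SpaceResolver are hypothetical names. The sketch assumes that none of the spaces is forcing and that none begins a reference area, so conditionality plays no role, and it deliberately leaves .minimum and .maximum out.

```java
import java.util.List;

/** Hypothetical value class: one space-before or space-after specifier. */
final class SpaceSpec {
    final double optimum;   // any unit, as long as it is used consistently
    final int precedence;   // numeric precedence; "force" is not modelled

    SpaceSpec(double optimum, int precedence) {
        this.optimum = optimum;
        this.precedence = precedence;
    }
}

/** Hypothetical resolver for a sequence of adjacent, non-forcing spaces. */
final class SpaceResolver {
    private SpaceResolver() { }

    /** Returns the optimum of the resolved space, or 0 for an empty list. */
    static double resolveOptimum(List<SpaceSpec> adjacent) {
        if (adjacent.isEmpty()) {
            return 0.0;
        }
        // Step 1: find the highest precedence present in the sequence.
        int highest = Integer.MIN_VALUE;
        for (SpaceSpec s : adjacent) {
            highest = Math.max(highest, s.precedence);
        }
        // Step 2: among the spaces with that precedence, the greatest optimum wins.
        double best = Double.NEGATIVE_INFINITY;
        for (SpaceSpec s : adjacent) {
            if (s.precedence == highest) {
                best = Math.max(best, s.optimum);
            }
        }
        return best;
    }
}
```

Applied to the first sample (10mm and 5mm, both with precedence 0) this yields 10mm; applied to the second (10mm with precedence 100 and 10mm with precedence 200) it also yields 10mm, this time because only the higher-precedence space is kept.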
Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch
Chris Bowditch wrote:
> > The second fo:block does not begin a reference area, so space conditionality isn't taken into consideration. For both spaces, precedence is not specified, so the default value of 0 is used (§ 7.10.5 and 7.10.6). The third rule of § 4.3.1 states that between two spaces of the same precedence, the one that has the highest (optimum) value wins; here, the space-after of the first block.
>
> Well, the current implementation doesn't work like that. Both spaces are included to give 20pt of space between the two paragraphs.

Then either the implementation is broken (i.e. the spec was misunderstood), or this is a not-yet-implemented feature. No flame here, just for clarity.

> I'm not an expert in the details of the spec, but isn't the precedence ignored unless conditionality="discard"?

No, in fact the two notions are orthogonal: conditionality only deals with space beginning a reference area, while precedence deals with priorities between several successive spaces. They work independently.

> Yes, I do agree with the details you describe. But I wasn't trying to drill into detail, I was just saying it's not quite right yet. So my point still stands: there is some work still required to get this working 100%.

That's fine. I replied because I thought this could help in understanding the process. We agree that it is still WIP. I wish I could do something in this area, as I find this functionality very powerful, but I'm currently concentrating on FOrayFont.

Vincent
Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch
Manuel Mall wrote:
> I would have thought it's more of a nice-to-have but not a requirement for this release.

Exactly. If FOrayFont is ready for this release, all the better. It's difficult to say if it will be. The PDF library is now converted, PDFRenderer almost. I think the PSRenderer and SVG will demand the most work. I currently don't know the impact on the Java2D and RTF renderers. And when the integration is done, there will have to be much debugging and regression testing IMO...

Vincent
Re: e-g with padding and borders
I'm not sure here. The fo:external-graphic uses the large-allocation-rectangle (§ 6.6.5), which comprises padding and border. This makes me say that in Manuel's example the fo:block's bpd should be calculated with the second formula. The fo:block's content forms a line whose line-stacking-strategy is max-height (default). Thus its allocation rectangle should comprise the image's border and padding (§ 4.5). And so does the block.

I may be wrong, as this part of the spec is still somewhat unclear to me. WDYT?

Vincent

Jeremias Maerki wrote:
> Indeed, the normal allocation rectangle of an inline area is different from that of a block area. See 4.3.2. Geometric Definitions in the 1.0 spec. Border and padding for an inline area seem to be outside the allocation rectangle in the before and after directions. Interesting.
>
> On 01.09.2005 17:29:50 Manuel Mall wrote:
> > I have a follow-up question on this. If we have something as simple(?) as this:
> >
> >     <fo:block background-color="orange">
> >       <fo:external-graphic src="../../resources/images/bgimg300dpi.jpg" border="solid 5pt" padding="5pt" background-color="white"/>
> >     </fo:block>
> >
> > would you expect the whole image including padding and borders to be within the bounds of the enclosing block, or only the actual image to be in the block and the padding and borders to stick out at the top and bottom? It seems xep takes the latter approach and I am very uncertain in this area.
> >
> > Or to put it differently, is the BPD of the enclosing block
> >     bpd = image-height + line-spacing
> > or
> >     bpd = image-height + top_and_bottom_borders + top_and_bottom_padding + line-spacing ?
> >
> > Manuel
>
> <snip/>
>
> Jeremias Maerki
Re: [Xmlgraphics-fop Wiki] Update of ExtensionPoints by JeremiasMaerki
Luca,

I'm speaking here as a (future) Fop user. Just to let you know that I definitely want to support you in this area. I think your extensions would make Fop an extremely powerful typesetting system that would eventually beat TeX in the quality of page makeup. It's all the more interesting for me since my use of Fop would be to produce book-style documents.

Just a comment about your Wiki page: I'm not sure that modifying margins would produce visually appealing results. Might it not disturb the reader when she notices that margins aren't the same after turning a page? Otherwise I agree with all of your other propositions.

I wish you every success,
Vincent

Luca Furini wrote:
> Speaking of extensions, I'd like to resurrect the layout extensions that were part of the code used to start the Knuth branch, but I want to be sure I'm allowed to do it.
>
> The set of extensions (a couple of new properties, and some new values for an existing one) is aimed at giving the user more control over page breaking: in particular, via these extensions it is possible to give the application a list of properties that can be adjusted in order to fill all the available bpd of a region (in addition to / in place of the spaces between blocks [1]).
>
> I started writing a wiki page about these extensions at http://wiki.apache.org/xmlgraphics-fop/LayoutExtensions (I really should take some time to finish it!).
>
> My highest-priority, short-term task is still to fix the behaviour of page-number and page-number-citation, as I think these formatting objects must work in the next release: I am almost done, I just have to finish handling the case of justified text.
>
> After that, obviously if there are no objections, I'd like to spend some time on the extensions, which I'm sure could come in handy for fop-users producing book-style (or report-style) documents. For example, here is a link to a message in the xsl-editors mailing list requesting a feature which is completely equivalent to one of the layout extensions: http://lists.w3.org/Archives/Public/xsl-editors/2005JulSep/0007.html (many thanks to Jeremias for pointing it out to me!). Should I be allowed to keep working on this subject, I could answer him that fop will soon be able to cope with his request.
>
> Regards
> Luca
>
> [1] ... which makes me think that I should work on space resolution rules too ... my to-do list keeps growing longer and longer! :-(
Re: e-g with padding and borders
Jeremias Maerki wrote:
> The real problem IMO is probably block-level content in fo:inlines again. How are these borders to be painted? A border around each inlineblockparent (one for each block inside the inline)? I'm not sure, judging from the specification.

Here the spec starts being really complicated. I would say you're right, though I'm not sure. See the last sentence of § 4.2.2: "Unless otherwise specified, the traits of a formatting object are present on each of its generated areas, and with the same value. (However, see sections [4.7.2 Line-building] and [4.9.4 Border, Padding, and Background].)" The referenced sections don't seem to apply to the fo:inline case.

What disturbs me is that when one specifies a border around a chunk of text and there is line-breaking, this border should appear at the end of the first line and the beginning of the second line, as below:

              _______________
    This is a | chunk of text |
              ---------------
    _______________
    | with border | blah blah
    ---------------
    blah blah

What is more intuitive and could be expected by a user is the following:

              ______________
    This is a | chunk of text
              --------------
    _____________
    with border | blah blah
    -------------
    blah blah

but IIUC this is not allowed by the spec. I ask for confirmation here.

So the example you provided with the 2 <fo:block>blah blah</fo:block> is rendered correctly in terms of borders (but there should be no space between them, probably part of the rendering problem you raised).

Vincent
Re: e-g with padding and borders
> > What disturbs me is that when one specifies a border around a chunk of text and there is line-breaking, this border should appear at the end of the first line and the beginning of the second line, as below:
> >
> >               _______________
> >     This is a | chunk of text |
> >               ---------------
> >     _______________
> >     | with border | blah blah
> >     ---------------
> >     blah blah
> >
> > What is more intuitive and could be expected by a user is the following:
> >
> >               ______________
> >     This is a | chunk of text
> >               --------------
> >     _____________
> >     with border | blah blah
> >     -------------
> >     blah blah
> >
> > but IIUC this is not allowed by the spec. I ask for confirmation here.
>
> I would agree that this is not allowed by the spec. The traits are the same for all areas. There don't seem to be any exceptions. Actually, I'm glad there aren't: that would complicate things even more. :-) But maybe someone who thinks this would be an important feature could write an extension for that. :-)

I've just checked: with CSS it is the second layout that is rendered. So there would be an incompatibility here between XSL-FO and CSS, which is astonishing as the spec claims several times to promote compatibility.

Anyway, it's not an important feature for me :-]

Vincent
Re: e-g with padding and borders
Hi Andreas,

You're right. Indeed both situations below are handled by the standard, thanks to border conditionality and the is-first/is-last traits. Thanks for the pointer!

Vincent

Andreas L Delmelle wrote:
> On Sep 2, 2005, at 17:44, Vincent Hennebert wrote:
>
> Hi,
>
> <snip />
> >               _______________
> >     This is a | chunk of text |
> >               ---------------
> >     _______________
> >     | with border | blah blah
> >     ---------------
> >     blah blah
> >
> > What is more intuitive and could be expected by a user is the following:
> >
> >               ______________
> >     This is a | chunk of text
> >               --------------
> >     _____________
> >     with border | blah blah
> >     -------------
> >     blah blah
> <snip />
>
> Hmm... I remember reading something about this --wait a minute... Yep! Got it.
>
> See Rec 4.3.1 Space resolution rules, all the way down:
>
> "The border or padding at the start-edge or end-edge of an inline-area I may be specified as conditional. If so, then it is set to zero if its associated edge is a leading edge in a line-area, and the is-first trait of I is false, or if its associated edge is a trailing edge in a line-area, and the is-last trait of I is false."
>
> (see also: 7.7.9 border-before-width .. XSL modifications to the CSS Definition)
>
> By default, the first would be applicable. If the user explicitly specifies border-start-width.conditionality="discard", the result would have to be the second. No extension needed.
>
> Cheers,
> Andreas
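The rule quoted from § 4.3.1 reduces to a small predicate, sketched below. ConditionalBorder is a hypothetical helper, not Fop code, and the boolean parameters stand for information a layout engine is assumed to have at hand: whether the border was specified as conditional (discard), whether the edge is a leading or trailing edge in its line-area, and the is-first / is-last traits of the inline area.

```java
/** Hypothetical helper illustrating conditional borders on inline areas. */
final class ConditionalBorder {
    private ConditionalBorder() { }

    /**
     * Effective start-edge border width of an inline area.
     * A conditional border is set to zero when the edge is a leading edge
     * in a line-area and the area is not the first one generated.
     */
    static double startWidth(double specified, boolean discard,
                             boolean leadingEdgeInLine, boolean isFirst) {
        return (discard && leadingEdgeInLine && !isFirst) ? 0.0 : specified;
    }

    /**
     * Effective end-edge border width of an inline area.
     * A conditional border is set to zero when the edge is a trailing edge
     * in a line-area and the area is not the last one generated.
     */
    static double endWidth(double specified, boolean discard,
                           boolean trailingEdgeInLine, boolean isLast) {
        return (discard && trailingEdgeInLine && !isLast) ? 0.0 : specified;
    }
}
```

For the two-line example, the first fragment has is-first true and is-last false, and the second the reverse. With discard, the border at the line break disappears on both fragments (the second layout); with retain, it is kept on both (the first layout).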
Re: Logging for FOrayFont
Hi Victor,

What I liked about the Avalon Logger is the one-to-one correspondence between it and Commons' Log; Commons just has one more level, which is trace. So writing a Logger adapter that delegates logging to a Log instance is trivial.

Now it's different, because PseudoLogger has 7 log levels + 1 debug level, whereas Commons' Log has 6 levels with different purposes. The best mapping that I see is the following:

    PseudoLogger   Log
    finest         trace
    finer          trace
    fine           trace
    debug          debug
    config         info
    info           info
    warning        warn
    severe         error

Log's fatal level wouldn't be used.

Writing an adapter the other way round would have been somewhat easier (and BTW corresponds to Commons' Jdk14Logger). Personally, I tend to find the Commons log levels more intuitive and useful than the JDK ones: I don't really know what to do with three fine, finer and finest levels plus one config level. May I suggest that you use the Commons style of levels instead?

That said, this is by no means dramatic. For me it's just a matter of writing another wrapper. I agree that it's a bit cleaner if the font system has its own logging rules, independent of other existing logging systems. So no problem for me.

Vincent

Victor Mote wrote:
> I just completed a project to make FOray's logging a bit more flexible. It now logs from an interface called org.axsl.common.PseudoLogger. Logging levels are the same as those for java.util.logging.Level (in Java 1.4 and higher), except that integrals are used instead of Level instances. I also wrote an implementation, org.axsl.common.AvalonLogger, which FOray uses (for now) when it needs to *create* a logger. Since all loggers in the font system are supplied to the font system (instead of created within it), FOP should simply pass a different implementation to keep its logging consistent within itself. The AvalonLogger is a thin wrapper around an, er, Avalon ConsoleLogger, and is essentially an Adapter between the Avalon logging system and PseudoLogger.
>
> A similar approach can be used with whatever logging system FOP decides it wants to use. Writing the adapter should be fairly trivial, and it should be possible to use any logging system with this approach.
>
> I hope this makes the integration work a bit easier and the results more satisfactory to FOP. Please let me know if you have questions.
>
> Victor Mote
Re: Logging for FOrayFont
Victor Mote wrote:
> Actually there is not a level named debug, although I might have defined that constant equal to finest in one of the earlier versions. This does not appear in CVS.

I would suggest that you redefine such a constant to remove any ambiguity since, as you can see, it confused me.

> Here is the way I mapped the Avalon levels in the AvalonLogger implementation:
> http://cvs.sourceforge.net/viewcvs.py/axsl/axsl/axsl-common/src/java/org/axsl/common/AvalonLogger.java?view=markup
>
>     FINEST    debug
>     FINER     info
>     FINE      info
>     CONFIG    info
>     INFO      info
>     WARNING   warn
>     SEVERE    error

Why not. As I now know that debug corresponds to finest, I'll follow the same scheme for Commons' Log.

> I don't really feel strongly about it either, but perhaps a bit more strongly than you, for the following reasons:
> 1. From a sheer standards aspect, I wanted to stay as close to the Java logging system as possible. I would have used the java.util.logging.Level instances (for type safety) instead of numeric constants, except for trying to retain Java 1.3 compatibility.
> 2. I prefer to allow for more granularity rather than less (within reason), even if we don't think we need it right now.
> 3. This is one of those things that you can change on Tuesday to make one party happy, then change back again on Wednesday to make another party happy, all for very little benefit. In short, there is no way to make everyone happy.

I understand your concerns and agree with them.

> Also, I don't know if you noticed the following methods:
>     info(String message)
>     warn(String message)
>     error(String message)
>     debug(String message)
> which correspond directly to the Avalon methods of the same name, and are intended to provide a sort of mapping for them.

Certainly, but I also have to map the logMessage method...

> I don't mind adding one more called trace(String message) if that would make the mapping concept clearer for you.

Well, no need I think; as trace is below debug and debug is mapped to finest, there is no corresponding log level for trace.

I'm satisfied with your explanations. Please just add a LEVEL_DEBUG constant and I'm OK with your interface.

Regards,
Vincent
Re: regions and writing-mode
Hi Manuel, Sorry for the delay. I think you're right. See the note in 6.4.12 fo:simple-page-master: For example, if the writing-mode of the fo:simple-page-master is lr-tb, then [region-body, region-before, region-after, region-start, and region-end] correspond to the body of a document, the header, the footer, the left sidebar, and the right sidebar. And 6.4.14 fo:region-before: This region specifies a viewport/reference pair that is located on the before side of the page-reference-area. In lr-tb writing-mode, this region corresponds to the header region. This should answer your question. HTH, Vincent Manuel Mall wrote: This is (again) more of a clarifying question as I am looking in that area of the code and I think it's incorrect: Am I correct in saying: The position of the before/after/start/end regions on the output media is relative to the writing-mode and reference orientation on the simple-page-master they belong to? Currently some of their positioning is determined by the writing-mode set on the regions themselves, which usually would be the same as on the simple-page-master, but it can be different and then the current implementation seems to get itself confused. Manuel
Re: Logging for FOrayFont
I'm satisfied with your explanations. Please just add a LEVEL_DEBUG constant and I'm OK with your interface. OK, I have added the constant LEVEL_DEBUG back, and have also added a new one called LEVEL_TRACE. PLEASE NOTE: LEVEL_DEBUG is now equal to LEVEL_FINER (it previously was equal to LEVEL_FINEST), and LEVEL_TRACE has been set equal to LEVEL_FINEST. These changes have been made to better accommodate what I understand the Commons Logging levels to be. This makes the Avalon mapping look like this: FINEST: debug; FINER: debug; FINE: info; CONFIG: info; INFO: info; WARNING: warn; SEVERE: error. That's fine with me! Thank you, Vincent
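As an illustration of the final mapping in this message, here is a minimal Java sketch. It is not the AvalonLogger code from aXSL: the class and method names (AvalonLevelMapping, avalonMethodFor) are invented for this note, and java.util.logging.Level is used only as a stand-in for the numeric LEVEL_* constants discussed in the thread.

```java
import java.util.logging.Level;

/**
 * Illustrative only: maps a java.util.logging level to the name of the
 * Avalon-style method (debug/info/warn/error) it would be routed to.
 * The names in this class are hypothetical, not part of aXSL or FOP.
 */
final class AvalonLevelMapping {

    private AvalonLevelMapping() { }

    static String avalonMethodFor(Level level) {
        int value = level.intValue();
        if (value >= Level.SEVERE.intValue()) {
            return "error";
        }
        if (value >= Level.WARNING.intValue()) {
            return "warn";
        }
        if (value >= Level.FINE.intValue()) {
            return "info";   // FINE, CONFIG and INFO
        }
        return "debug";      // FINER and FINEST
    }

    public static void main(String[] args) {
        Level[] levels = {
            Level.FINEST, Level.FINER, Level.FINE, Level.CONFIG,
            Level.INFO, Level.WARNING, Level.SEVERE
        };
        for (Level level : levels) {
            System.out.println(level.getName() + " -> " + avalonMethodFor(level));
        }
    }
}
```

Comparing integer values rather than switching on exact levels means that any custom level falls into the nearest lower bucket, which seems in keeping with the wish for more granularity expressed earlier in the thread.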
User config file currently discarded
Hi, While trying to debug my FOrayFont adaptation I noticed that the user config file currently isn't taken into account by the Trunk. The apps.FOUserAgent.getConfig() method is actually never called within the code, and (as a consequence I suppose) neither is the PDFRenderer.configure(Configuration) method, whose purpose is among other things to register fonts specified in the user config file. Is there a particular reason for this situation? A simple fix would be to call configure within the AbstractRenderer.setUserAgent method, where we can get the user config file associated with the UserAgent given as parameter. Perhaps this is not the right place to do that? But if it's ok I can provide a patch. Vincent
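The fix proposed in this message can be sketched as follows. This is only a shape, not FOP code: every type below (SketchConfig, SketchUserAgent, SketchAbstractRenderer, SketchPdfRenderer) is a hypothetical stand-in, and the real Configuration and FOUserAgent classes are deliberately not used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in for a parsed user config; NOT the real Configuration class. */
final class SketchConfig {
    final List<String> fontFiles;
    SketchConfig(List<String> fontFiles) { this.fontFiles = fontFiles; }
}

/** Stand-in for the user agent; only the config accessor is modelled. */
final class SketchUserAgent {
    private final SketchConfig config;   // null when no config file was given
    SketchUserAgent(SketchConfig config) { this.config = config; }
    SketchConfig getConfig() { return config; }
}

/** Stand-in renderer base class that configures itself on setUserAgent. */
abstract class SketchAbstractRenderer {
    protected SketchUserAgent userAgent;

    void setUserAgent(SketchUserAgent agent) {
        this.userAgent = agent;
        SketchConfig config = agent.getConfig();
        if (config != null) {
            configure(config);   // the call reported as missing in the message
        }
    }

    abstract void configure(SketchConfig config);
}

/** Stand-in PDF renderer that registers the fonts named in the config. */
final class SketchPdfRenderer extends SketchAbstractRenderer {
    final List<String> registeredFonts = new ArrayList<String>();

    @Override
    void configure(SketchConfig config) {
        registeredFonts.addAll(config.fontFiles);
    }

    public static void main(String[] args) {
        SketchPdfRenderer renderer = new SketchPdfRenderer();
        renderer.setUserAgent(new SketchUserAgent(
                new SketchConfig(Arrays.asList("a.ttf", "b.pfb"))));
        System.out.println(renderer.registeredFonts);
    }
}
```

Whether setUserAgent is the right hook is exactly the open question in the message; the sketch only shows that a null check is needed for runs without a config file.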
FOrayFont, PS/PDFTranscoders and SVG handling
I'm about to convert the SVG library to FOrayFont. But the Batik side seems to be reluctant to see the transcoders converted to FOrayFont [1]. How should I handle that? I guess I should leave existing files as is and provide new files corresponding to the FOrayFont implementation? How should I name them? Perhaps a new subpackage? For pdf, does it concern files other than those in the svg subpackage? Which files in the render.ps subpackage are concerned? What about the pdf library? All this is still a bit unclear in my head. In two words: please help... Vincent [1] http://marc.theaimsgroup.com/?l=fop-dev&m=112600990201878&w=2
Re: FOrayFont, PS/PDFTranscoders and SVG handling
Jeremias and Victor, thanks for the hints. I'll keep them at hand for later, when it is time to migrate the stuff into XML Graphics Commons. For now I'm just overriding the current implementations with FOrayFont. Anyway it will be possible to recover them with svn, in case they have to coexist. Vincent
PSDocumentGraphics2D and Font dictionary
In PSDocumentGraphics2D.writeFileHeader (and also in PSRenderer.startPageSequence) the font dictionary is written into the PS file by a call to PSFontUtil.writeFontDict. At that point all of the fonts present in the fontInfo (defaults + those found in the config file) seem to be written out, even those that won't be used in the fo file. I'm a bit worried because I can't reproduce that easily with FOrayFont. All I can get is the set of fonts that were used within the document. I guess that rendering starts as soon as possible and that at the time when the file header is written out the whole document may not have been entirely parsed yet? (but the PDFRenderer only stores used fonts by making a call to FontInfo.getUsedFonts!? This also is the case in PSRenderer.stopRenderer). So the question is: is there a way to write only the used fonts into the PS font dictionary? This would be cleaner anyway. I hope I'm clear. Vincent
Re: PSDocumentGraphics2D and Font dictionary
Well, so there is no simple solution :-( I could probably add a method like getConfiguredFonts in the font server to put in the PostScript file all of the fonts defined in the config file. But that really sounds dirty to me. A temporary solution (before implementing a two-pass approach) would be to only support Base14 fonts; BTW, are these fonts well defined in the PostScript standard? Or do they only exist in PDF? And a somewhat related question: how does font embedding work in PostScript? I believe that it is like in PDF: embedding is not mandatory, one can simply put the font name in the file, and this will work if the corresponding font is installed on the client system. So this should almost always work for the fonts corresponding to the PDF Base14, and not always for others. Is there a font-naming convention? So, depending on the answers to the preceding questions: what do we choose? Systematic font embedding or only putting the font name? Thanks, Vincent Jeremias Maerki wrote: I know exactly what you mean. The only way around this is to do a two-pass approach when writing PostScript, meaning that you keep track of resources (like fonts) while writing the pages and later you put together the complete PostScript document by including the needed resources in the right places. Obviously, that means losing a lot of processing speed. PDF is in a better position because it's a random-access file format while PS is streaming. We can add the font objects to the PDF after we've already used them. On the other side, the PDF generated this way cannot be a linearized file, which is what allows Fast Web View. The browser always has to load the whole PDF file to display it because the cross-reference table is at the end of the file. So, even PDF has, in a way, the same problem. So you see: the problem is speed versus beauty. 
BTW, that was the reason why I started introducing a better resource handling with PS support, so we can later add such a mode where we write the PS file in a two-pass approach. On 12.09.2005 21:40:11 Vincent Hennebert wrote: In PSDocumentGraphics2D.writeFileHeader (and also in PSRenderer.startPageSequence) the font dictionary is written into the PS file by a call to PSFontUtil.writeFontDict. At this time all of the fonts present in the fontInfo (defaults + those found in the config file) seem to be written out, even those that won't be used in the fo file. I'm a bit worried because I can't reproduce that easily with FOrayFont. All I can get is the set of fonts that were used within the document. I guess that rendering starts as soon as possible and that at the time when the file header is written out the whole document may not have been entirely parsed yet? (but the PDFRenderer only stores used fonts by making a call to FontInfo.getUsedFonts!? This also is the case in PSRenderer.stopRenderer). So the question is: is there a mean to only put used fonts when writing out a PS font dictionary? This would be cleaner anyway. I hope I'm clear. Vincent Jeremias Maerki
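The two-pass approach Jeremias describes can be sketched in a few lines of Java. This is a simplified illustration and not the FOP PSRenderer: the class name TwoPassPsSketch and its methods are hypothetical, pages are buffered in memory as strings, and the DSC comments written out are reduced to the bare minimum.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical two-pass writer: page bodies are buffered while the fonts
 * they use are recorded, and the document is only assembled at the end.
 */
final class TwoPassPsSketch {
    private final List<String> pages = new ArrayList<String>();
    private final Set<String> usedFonts = new LinkedHashSet<String>();

    /** Pass 1: buffer a page body and remember which fonts it references. */
    void addPage(String body, String... fontNames) {
        pages.add(body);
        for (String name : fontNames) {
            usedFonts.add(name);
        }
    }

    /** Pass 2: write the header, then only the used fonts, then the pages. */
    String assemble() {
        StringBuilder out = new StringBuilder();
        out.append("%!PS-Adobe-3.0\n");
        for (String name : usedFonts) {
            out.append("%%IncludeResource: font ").append(name).append('\n');
        }
        for (int i = 0; i < pages.size(); i++) {
            int number = i + 1;
            out.append("%%Page: ").append(number).append(' ').append(number).append('\n');
            out.append(pages.get(i)).append('\n');
        }
        out.append("%%EOF\n");
        return out.toString();
    }

    public static void main(String[] args) {
        TwoPassPsSketch writer = new TwoPassPsSketch();
        writer.addPage("(first page) show", "Helvetica");
        writer.addPage("(second page) show", "Helvetica", "Courier");
        System.out.print(writer.assemble());
    }
}
```

The cost mentioned in the message shows up directly: nothing can be streamed to the output until the last page has been added, so memory use (or temporary file use) grows with the document.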
Re: PSDocumentGraphics2D and Font dictionary
Let's look at it from another side. If someone writes some kind of FO editor or a configuration tool for FOray/FOP a method that reports all available fonts will certainly be useful. :-) OK. That makes sense. To avoid wasteful parsing, it will mean that at least 3 new classes need to be exposed through interfaces (RegisteredFont, RegisteredFontFamily, and RegisteredFontDesc), which may be a good thing anyway. Yes, I think it could be interesting. It would also be necessary to add getStream methods, now that font parsing is delegated to the font server. Currently there is only one getPDFFontFileStream method. There should perhaps be also a getPSFontFileStream, and something like getPDF/PSSubset? It seems that the client is unable to make font subsetting with the current interface. RegisteredFontFamily and RegisteredFontDesc might also be interesting for the AWT renderer, but that's another purpose. I'll perhaps come back on this later. (more below) Very good. It sounds like you and I may end up with API visions that match better than I might have thought at one time. Actually, you are no longer tied to WinAnsi. We have a lot more flexibility on encodings than before: 1. All of the predefined encodings for both PostScript and PDF are available to either platform -- of course, if they are not predefined for the platform used, they must be written into the output. 2. Both platforms have access to the font's internal encoding. 3. The user can specify custom encodings through the font-configuration file. So, if a PostScript document can use the font's internal encoding, and if the font is known to already be available to the interpreter, I think it could safely be used by name. But perhaps I have forgotten something. No, that's true. I simply haven't cared, yet, about finding out how glyphs are accessed on-the-fly in PS that are not accessible through the encoding. Rewriting the encoding seemed easier. 
I am very sure that for Type 1 fonts, specifying another encoding is the only way to get it done. There is just no way to get more than 256 combinations out of 8 bits and there is no way to get more than 8 bits. However, the good news is that I am 99% sure that for both PDF and PostScript you can specify the same underlying font with two (or more) different encodings. They will actually show up as two different font objects in the document and must of course be referred to that way also. I'll let you know how that turns out. This may require a new font-configuration item for the font element that allows it to tell whether it is known to be available to the PostScript interpreter. There are some other possibilities here as well. I bet. Sounds good. The more I think about it, encapsulating the characteristics of a specific PostScript interpreter is probably the right way to go. Then the rendering run can use that to decide whether the font needs to be embedded or not. I'll have to ponder that for a while. Here I'm beginning to get lost because I don't know the PostScript standard. My hope of being ready before the upcoming release is starting to vanish... :-( Here's my summary of the current discussion: 1. Currently the Fop PSRenderer embeds all of the configured fonts in the PS file, even those that will never be used. It does this by parsing the font files itself; 2. I can't reproduce this behavior with aXSL and FOray easily, because I've no direct access to the font files; 3. Still doing this would require hacking the FOrayFont subpackage; that would result in something dirty but that should work; 4. Anyway there are several improvements to bring to the PS renderer: mainly character encoding, font embedding and in a longer term two-pass rendering for a proper font handling. 
Now I'm thinking of the next release: simply putting the font name in the PostScript file would be rather straightforward to implement, and should work in most cases (?), thanks to the non-standard but well-known base14 (and even base35) font set. But that's definitely a regression from the current state. Improving the PS renderer to allow proper embedding will require (1) changes to the aXSL interfaces (so a certain amount of discussion), (2) me to learn PostScript. That would prevent the FOrayFont subsystem from being integrated in the pre-release. Do you agree with my summary? Integrating FOrayFont in the pre-release would be great... Delaying the integration would give me more time to investigate the insides of FOrayFont, learn the PS and PDF standards and so do things much better. If there is a decision to be made, it is not mine to make... Vincent
Re: PSDocumentGraphics2D and Font dictionary
Victor Mote wrote: I am not sure what you mean by getPDF/PSSubset. If I'm correct it is only possible to embed the whole font file in a pdf output, by using getPDFFontFileStream. Currently aXSL doesn't seem to provide a means to embed only a subset. Point me to the FOP code that does the embedding, class name(s) and line numbers, and I'll see if I can extract it into an aXSL-exposed method. The whole code is in the class render.ps.PSFontUtil, mainly the method embedType1Font. 3. Still doing this would require hacking the FOrayFont subpackage; that would result in something dirty but that should work; Better would be to just make aXSL provide what needs to be provided. If we can hack FOray to do it, then we should be able to expose what is needed. Since nothing we are talking about here is a pollution of the interface, we should just be able to change the interface. On this point I was thinking more of a quick short-term solution for the pre-release, before taking the time to think about a clean implementation. 4. Anyway there are several improvements to bring to the PS renderer: mainly character encoding, font embedding and in a longer term two-pass rendering for a proper font handling. OK. I am confused. I thought above that font embedding worked in PS now, but this seems to indicate that it does not. Sorry, it is also a bit unclear to me. I think the precise status is the following: 1. font embedding only works with Type1 fonts for which a pfb file is provided (or also a pfa?). Subsetting (provided that it is specified by the PostScript standard) does not work; 2. currently only the WinAnsi charset seems to be supported. Fonts are systematically reencoded to this charset. I can take some of this burden off of you, in that I can hopefully fix aXSL and FOray to provide what is needed. If that is done well, you shouldn't need to learn too much PostScript to get it to work, and perhaps one of the other developers can help you get it glued in. 
I don't know how much work it will take for me to get the FOray PS Renderer working (it may work now), I can use that as a test bed also. I appreciate your offer to help! Today I quickly launched the FOray PS Renderer but it doesn't seem to work. I haven't investigated, though; this may be a minor problem. Vincent
Re: PSDocumentGraphics2D and Font dictionary
Victor Mote wrote: Jeremias Maerki wrote: output format. Maybe the Font interface should simply have a method to return a very generic interface for more detailed and font- and output-system-specific access to the font. Consumers of this interface can then cast it to a special interface/class. Something like: TargetFormatHelper Font.getTargetFormatHelper(String mime) Subclasses of TargetFormatHelper could be PDFTargetFormatHelper or a PSTargetFormatHelper. The Font This is an interesting idea, but, if I understand it correctly, breaks pluggability. aXSL and FOray easily, because I've no direct access to the font files; Which is a problem IMO. See my comments above. I *really* don't understand this. The whole point of the font subsystem is to hide as much detail as possible from the client application. If you want access to the raw font data, then perhaps the FOP 0.20.5 approach is better for what you need??!! To go along a bit with Victor, the font subsystem should perhaps provide more services, depending on the client (= the type of renderer): * a font abstraction like it is now for the layout part; * font manipulation facilities, e.g. embedding and subsetting for the PDF renderer, Type1 to SVG conversion for the SVG one, etc. In fact I would rather put your proposed classes at the font subsystem level. If aXSL-font provides access to the raw underlying font streams, that problem basically dissolves. The following would certainly be no problem: InputStream Font.getRawStream(String part) where part may be pfb, pfm, afm, ttf etc. Is this just for embedding purposes, or do you intend to parse it? If you want to parse it, why? If all you want to do is embed it, why do you want the metrics files? FOray essentially provides the raw font stream now. It works for PDF, but, if I understand Vincent correctly, does not work for PS. So how does this method you suggest help that? See just above. Integrating FOrayFont in the pre-release would be great... 
Quite unrealistic as it stands now, sorry. That is your (FOP's) decision, but it makes no sense to me. You are willing to go backwards in almost any other area, but are unwilling to *not* go forwards with PostScript font embedding? Even when it is doable? Still, I appreciate knowing. I'll shift my focus back to getting my FOray release out the door. Victor, from a non-native speaker's POV you seem to be overreacting a bit here. I have the feeling that I have misled you because of my poor understanding of the problem. I'm sorry if this is the case. Jeremias has a better view of the situation than I do, and I quite agree with him that the integration won't be ready for the pre-release. This does not mean that it will never be done. And after all, all the better: we will have more time to discuss a clean API. Regards, Vincent P.S.: that said, the PDFRenderer should now work fine with the new font system; converting the SVG library should be pretty easy; this basically works for the AWT viewer. Nothing perfect, but... ;-)
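For readers trying to picture the getRawStream(String part) proposal quoted in this message, here is a minimal Java sketch. It is an assumption-laden illustration only: RawFontSource and InMemoryFont are invented names, the method is placed on a separate interface rather than on the aXSL Font interface, and returning null for a missing part is a choice made for this note, not something the thread specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shape of the proposal; not the aXSL Font interface. */
interface RawFontSource {
    /** Returns the raw bytes of one font part ("pfb", "afm", "ttf", ...), or null if absent. */
    InputStream getRawStream(String part);
}

/** Toy implementation backed by byte arrays instead of font files. */
final class InMemoryFont implements RawFontSource {
    private final Map<String, byte[]> parts = new HashMap<String, byte[]>();

    InMemoryFont with(String part, byte[] data) {
        parts.put(part, data);
        return this;
    }

    @Override
    public InputStream getRawStream(String part) {
        byte[] data = parts.get(part);
        return data == null ? null : new ByteArrayInputStream(data);
    }

    public static void main(String[] args) {
        RawFontSource font = new InMemoryFont().with("pfb", new byte[] {1, 2, 3});
        System.out.println("pfb available: " + (font.getRawStream("pfb") != null));
        System.out.println("afm available: " + (font.getRawStream("afm") != null));
    }
}
```

A renderer that only embeds would ask for "pfb" or "ttf" and never touch the metrics parts, which is the distinction Victor's questions in the message are probing.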
Re: FOray contacts
I completely agree with Manuel. Although I can sense your disagreement with some of the project's decisions, you have always remained courteous and made valuable comments. I regret your decision to leave this list because you have often been helpful where it was not expected. I'll be glad to continue our discussions on the FOray lists. Cheers, Vincent Manuel Mall wrote: Victor, On Wed, 14 Sep 2005 09:07 am, Victor Mote wrote: FOP devs: I think it is prudent for me to take this temporary lull to extricate myself a bit more by unsubscribing from the fop-dev mailing list. I have tried to do this several times before, with little success, as you can see. I have no projects underway and no feuds to tend to ATM, so it is a rare (unique really) opportunity. personally I believe anyone who has worked on fop and has an active interest in XSL-FO can still make valuable contributions here. Even if you don't want to get involved in (further) design discussion based on events in the past I believe you still have lots to contribute. For example at the level of input to issues with the spec itself and its interpretation because of your extensive knowledge in that area. Therefore I am very sorry to see you leave this list. Victor Mote Regards Manuel
2 weeks offline
Hi all, I'll be offline from tomorrow for 2 weeks: visiting Japan. Although I haven't had much time to work on Fop these last few days, I'm not abandoning my work. I've taken a little break in the adaptation to learn a bit of PDF. I think this is necessary to better understand what I'm doing. The AWT renderer should now fully work. I've recently had a long discussion with Victor about properly handling fonts for both screen and print outputs; it is now possible to map the FO generic font names (serif, sans-serif, monospace) to either the default AWT fonts (corresponding to the Lucida family), or to the Times/Helvetica/Courier families. This should help to get the same result for both types of output. As the RTF renderer doesn't seem to depend on the font subsystem at all, I guess it should work fine as well. What's left: correct a little problem with CID fonts in the PDF renderer, adapt the PS renderer (which should be much easier now), adapt the SVG library for PDF and PS. Cheers, Vincent
Re: Preparing for the first release
Manuel Mall wrote: As the project hasn't done a release for a long time and especially no release of the new codebase we should probably test a bit more extensively than usual that the distribution builds actually are working and don't contain any 'cheap' errors. To that effect I have built binary and source distributions from the current svn and made them available for download from http://people.apache.org/~manuel/fop/disttest. In the top level directory are the source and the java 1.4+ binary distributions. In the java1.3.1 directory are only binary distributions. I'm on a Debian GNU/Linux environment with both java 1.4.2 and java 1.5. I have encountered no particular problem when running the binary version on a few sample fo files. The source distribution also seems to build and run fine. My 2 cents... Vincent
Re: svn commit: r348291 - /xmlgraphics/fop/trunk/src/documentation/content/xdocs/dev/release.xml
Author: jeremias
Date: Tue Nov 22 15:34:57 2005
New Revision: 348291
URL: http://svn.apache.org/viewcvs?rev=348291&view=rev
Log: Collect places to announce FOP releases.
Modified: xmlgraphics/fop/trunk/src/documentation/content/xdocs/dev/release.xml

Modified: xmlgraphics/fop/trunk/src/documentation/content/xdocs/dev/release.xml
URL: http://svn.apache.org/viewcvs/xmlgraphics/fop/trunk/src/documentation/content/xdocs/dev/release.xml?rev=348291&r1=348290&r2=348291&view=diff
==
--- xmlgraphics/fop/trunk/src/documentation/content/xdocs/dev/release.xml (original)
+++ xmlgraphics/fop/trunk/src/documentation/content/xdocs/dev/release.xml Tue Nov 22 15:34:57 2005
@@ -75,5 +75,22 @@
       <li>Stefan Bodewig's <jump href="http://cvs.apache.org/~bodewig/mirror.html">Making your Downloads Mirrorable</jump></li>
     </ul>
   </section>
+  <section id="announcements">
+    <title>Announcing the release</title>
+    <p>Here's a collected list of places where to announce new FOP releases:</p>
+    <ul>
+      <li>fop-dev@xmlgraphics.apache.org</li>
+      <li>fop-users@xmlgraphics.apache.org</li>
+      <li>general@xmlgraphics.apache.org</li>
+      <li>general@xml.apache.org</li>
+      <li>announce@apache.org</li>
+      <li>[EMAIL PROTECTED]</li>
+      <li>[EMAIL PROTECTED]</li>
+      <li>http://xslfo-zone.com/news/index.jsp</li>
+      <li>http://www.w3.org/Style/XSL/</li>
+      <li>http://freshmeat.net/projects/fop/</li>
+      <li>any others?</li>

The docbook-apps@lists.oasis-open.org list could be added, although an announcement has already been made there (see the attached message). Note that Fop 0.20.5 is still widely used by DocBook users. IMO Fop 0.90 will be welcome there. Vincent

-------- Original Message --------
Subject: [docbook-apps] FOP 0.90alpha1
Date: Tue, 22 Nov 2005 14:52:57 +0100 (CET)
From: Jens Stavnstrup [EMAIL PROTECTED]
Reply-To: Jens Stavnstrup [EMAIL PROTECTED]
To: docbook-apps@lists.oasis-open.org

After a very long time a new FOP is being released. Be aware, as the name indicates, that this is an alpha release and may in certain areas be inferior to FOP 0.20.5 [1]. 
Otherwise it is a huge step forward as the compliance page indicates [2]. Great work foppers. FOP 0.90alpha1 is currently being replicated to the apache mirrors. Regards, Jens [1] http://xmlgraphics.apache.org/fop/trunk/upgrading.html [2] http://xmlgraphics.apache.org/fop/compliance.html
Re: Text handling in svg files, transcoders
Hi Thomas, [EMAIL PROTECTED] wrote: But this doesn't work when I run Fop with the same svg included in an fo file. Am I missing something? I take it this is an FO with inline SVG consisting of an SVG 'image' element referencing the SVG file? The svg file is referenced by an external-graphic element. With the 0.9alpha1 FOP and Batik 1.6 it turns out this won't work properly. If you bundle Batik from SVN this should work. That's it. It works with the newest versions of Batik and Fop. Thanks! Vincent
Re: Text handling in svg files, transcoders
Jeremias Maerki wrote: On 25.11.2005 16:25:43 thomas.deweese wrote: <snip/> Thomas, what do you think about this topic? Well I think that currently the text bridges do a pretty good job determining if they are capable of drawing text as PDF text and drop back to curves when needed. I would much rather work on catching cases where this doesn't work properly than adding another option. Do you know of cases right now where this doesn't work? Yes, the issue when someone uses custom fonts. Text drawn as shapes uses the AWT font subsystem to get at the fonts while text drawn as text needs FOP's own font subsystem to select/embed the right fonts in the target file. I assume some people would probably prefer to simply have their things as simple as possible and not to have to manage an extra font setup. On the other side, when Vincent is done with his work, we'll have font auto-discovery which will improve the situation a lot. I'm worried about what you mean by font auto-discovery. FOray doesn't have font discovery support. This presents some difficulties as already discussed on this list [1]. I don't want to bring bad news, but... Vincent [1] http://marc.theaimsgroup.com/?l=fop-dev&m=111876477207479&w=2
Re: Text handling in svg files, transcoders
Jeremias Maerki wrote: Hey, allow me some wishful thinking on my part. :-) Look, if FOrayFont supports fonts without the need for a PFMReader or a TTFReader then the road to font auto-discovery is just a very small step. And the former is an absolute must. Otherwise, the whole thing is a waste of time. I'm not even afraid about the performance penalty of an auto-discovery feature. If someone points to the system's font directory with 500 fonts it's his own fault if the whole thing takes a little time to preload. If you don't do auto-discovery, that's fine. I'll do it then. I want it. :-) Ok, then no problem ;-) Just wanted to make sure you didn't believe that this functionality already existed. Vincent
Re: [was in Fop-user] text search in PDF
Jeremias Maerki wrote: Well, this has been a known issue for a long time and it is still not addressed in FOP 0.90alpha1. However, someone is working with the FOray project to build a better font library for FOP. Victor Mote already fixed the problem in his FOray but I can't tell what the status of his project is. Of course, that means there's probably no short-term solution. FYI: I've not investigated this problem much, as I'm still missing some knowledge. However, IIUC all the needed elements should be present in FOrayFont. When my patch is available it should be pretty straightforward for someone knowledgeable to implement this functionality. Ok, it should not take too long now... Vincent
Re: MathML for fop-trunk
2. When rendering MathML we used awt.Font class, which is not accessible on the Unix boxes without X-server installed. The awt.Font class is used to get font size for proper position of the upper and lower indexes and so on. Is there any chance that we could re-use Fop generated metrics? Where could we start looking at that? In FOP Trunk, the fonts package. Everything's there. However, currently Vincent Hennebert works on integrating the font subsystem from FOray which will be a little different. It may make sense not to rush into anything too quickly here. Vincent implied that it wouldn't take too long until he has something ready if I remember correctly. Right. My patch should be available in a few days now. You may start by having a look at the aXSL web page [1], and especially the axslFont module. This is the interface that will be available to users of the font sub-system. I suggest you download aXSL through the CVS repository and just build the javadoc to see what will be available to you (the most important interfaces are Font and FontUse). If you decide to look at aXSL and have any questions, please ask them on the aXSL mailing-list [2], which will be more appropriate than this list. I'm glad to see that MathML support is coming along. Thank you, Vincent [1] http://www.axsl.org [2] http://sourceforge.net/mail/?group_id=123259
FOrayFont patch almost ready
Team, I've just posted an updated pre-patch of my FOray adaptation work. I call it a pre-patch because the junit tests don't run well anymore (about 75 errors with junit-layout-standard). However, the pdf output looks right on the few tests I have run. The weird thing is that the XML renderer doesn't seem to get the same values for the area tree elements as the PDF renderer. For now I'm unable to find what's wrong. Perhaps those who have a better knowledge of the layout part or of the test system will be able to give some hints? I'll keep searching but I think you can start looking at my modifications right now. I hope to find the problem soon. Normally the pdf, ps and AWT outputs should work well. The default font-config file may require some adaptation, especially for the AWT output. You can find information in [1] or ask me. Please don't hesitate to tell me if I've done things wrong or to ask any questions. Thanks, Vincent [1] http://www.axsl.org/font/configure.html
Re: FOrayFont patch almost ready
Well, that's just what I was wondering: Bugzilla doesn't seem to have sent a notification to the fop-dev list. O_o You can find it directly on Bugzilla, bug number 35948. There's something I don't get: its status is still NEW but it doesn't appear when searching for the new bugs for Fop. Something I forgot to mention: there is still a little issue with transcoders: there is currently no way to configure the access to the font-config file. I'll post a note on this later today, in a dedicated thread. Thanks, Vincent Jeremias Maerki wrote: Uhm, cool but where did you post your pre-patch? :-) On 12.12.2005 22:44:04 Vincent Hennebert wrote: Team, I've just posted an updated pre-patch of my FOray adaptation work. I call it a pre-patch because the junit tests don't run well anymore (about 75 errors with junit-layout-standard). However, the pdf output looks right on the few tests I have run. The weird thing is that the XML renderer doesn't seem to get the same values for the area tree elements as the PDF renderer. For now I'm unable to find what's wrong. Perhaps those who have a better knowledge of the layout part or of the test system will be able to give some hints? I'll keep searching but I think you can start looking at my modifications right now. I hope to find the problem soon. Normally the pdf, ps and AWT outputs should work well. The default font-config file may require some adaptation, especially for the AWT output. You can find information in [1] or ask me. Please don't hesitate to tell me if I've done things wrong or to ask any questions. Thanks, Vincent [1] http://www.axsl.org/font/configure.html Jeremias Maerki
Re: DO NOT REPLY [Bug 37879] - PDF SVG rendering forces stroking text (config setting broken)
Don't know if this bug should be closed? Vincent [EMAIL PROTECTED] wrote: DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=37879. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bug.cgi?id=37879 --- Additional Comments From [EMAIL PROTECTED] 2005-12-13 10:52 --- This is broken in fop-0.90 with the provided Batik jar. It should work with the svn versions of /both/ Fop and Batik. You may look at [1] and [2] for details. I think this is the only solution for now if you need this functionality. Regarding the strokeText option, it has not been implemented in the Trunk because its usefulness is doubtful (see [2]). Text will be rendered as text whenever possible, strokes will only be used as fallbacks. HTH [1] http://marc.theaimsgroup.com/?l=fop-dev&m=113293237123386&w=2 [2] http://marc.theaimsgroup.com/?l=fop-dev&m=113301057529277&w=2
Transcoders and font configuration
This is perhaps mostly for the record. FOrayFont needs a font-config file to run properly; the most basic file will only contain the paths to the base14 afm metrics files. If no config file is specified then no font will be configured, which will obviously lead to rendering errors. I don't think it is a problem to provide a font-config file, either for the command-line or for the programmatic ways of using Fop. However, you may not like it; if this is the case we can discuss possible workarounds. Anyway there is a case where this is kind of problematic: when the transcoders are used through Batik to convert standalone svg files into ps/pdf. It seems less acceptable here to ask the user to provide a font-config file. It should be possible to bundle the base14 afm files and a default config file together in a resources jar. I've never played with such things so I don't know yet how to handle them, but that should be feasible and moreover Batik already seems to use such a mechanism. Due to that issue the transcoders are currently broken; a quick-to-implement solution may be to set up a system property that would contain the path to a font-config file. Anyway, before doing anything in this area, I would like to hear the opinion of the team. WDYT? Do you know of other possible solutions? Thoughts, ideas? Thanks, Vincent
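To make the two ideas above a bit more concrete (a system property pointing to a font-config file, with a default file bundled in a resources jar as a fallback), here is a minimal sketch. Everything named in it is an assumption made up for illustration: the property name `fop.font.config`, the resource path and the `FontConfigLocator` class do not exist in Fop or FOrayFont.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Sketch only: none of these names exist in Fop or FOrayFont. */
final class FontConfigLocator {

    /** Hypothetical system property holding the path to a font-config file. */
    static final String CONFIG_PROPERTY = "fop.font.config";

    /** Hypothetical default config file bundled in a resources jar. */
    static final String DEFAULT_RESOURCE = "/fonts/default-font-config.xml";

    /** Returns the user-supplied path, or null if the property is unset or blank. */
    static String configuredPath(Properties props) {
        String path = props.getProperty(CONFIG_PROPERTY);
        return (path == null || path.trim().isEmpty()) ? null : path.trim();
    }

    /** Opens the user-supplied file if there is one, else the bundled default. */
    static InputStream open(Properties props) throws IOException {
        String path = configuredPath(props);
        if (path != null) {
            return new FileInputStream(path);
        }
        // Looked up on the classpath, so it also works when the file sits in a jar.
        InputStream in = FontConfigLocator.class.getResourceAsStream(DEFAULT_RESOURCE);
        if (in == null) {
            throw new IOException("No bundled font config at " + DEFAULT_RESOURCE);
        }
        return in;
    }
}
```

A transcoder could then call `FontConfigLocator.open(System.getProperties())` without asking the user for anything. Note that this only hands back a stream; whether the font library can actually load metrics from a stream rather than from a file on disk is a separate question.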
Re: Review of the FOrayFont patch and FOrayFont itself
Damn :-( Looks like some more work is needed. The problem is that it no longer depends only on me. Basically I agree with reasons 1. and 3. I don't really get the second one, perhaps because I don't have a broad view of the problem. However the distinction between system fonts and free-standing fonts looks clear to me: the former are fonts handled by the Java awt system, for which some information may be lacking (e.g., embedding); the latter are those for which a font file is available, and are handled externally. Anyway, I think that there is a need for reviewing all the font stuff. Some issues about font baselines, character selection, glyph substitution and so on haven't been handled yet, or only partially. I was hoping to see FOrayFont integrated as is in the trunk in a first step, before starting to improve the font system and integrate other functionalities. It looks like this is impossible. It may be useful anyway to create a branch for the patch, so that other people can have a look at it. I made the patch against revision 356368. I'll let you decide. I'll spend some time now studying all the needs of a font sub-system for a FO processor: on the layout side, regarding the different font types, and the various renderers. I'll collect everything that has already been said on this list; I'll study the font formats in more detail. I think I'll put all that on a Wiki page, but perhaps rather in the aXSL area, don't know yet. This will require time, I have many things to learn; so don't expect any concrete result before... long. Any comment or opinion is welcome. Vincent Jeremias Maerki wrote: I've applied Vincent's patch locally and went through the code. I had to make several modifications because I've recently changed a few things in the Trunk which broke Vincent's patch. I managed to get it to work without bigger problems. However, I must say that I currently could not vote +1 to apply the patch. Let me show why: The main reason right now is lack of time.
This topic will eat quite some time as it turns out. There are several points I would want to have improved first: 1. As I suggested earlier, having a mandatory font configuration is not acceptable IMO. The whole thing needs to work out-of-the-box for the set of fonts that can be considered present for each target format (Base 14 fonts for PDF and PS, AWT fonts for Java2D output etc.). To make this work, the AFM files for the Base 14 fonts would be placed in the JAR file. However, while going through FOray's source code I found that the font loading code currently needs a RandomAccessFile, which means that either the file has to be copied to the file system first, or a new implementation of the RandomInput interface has to be written for access to an in-memory copy of the AFM files, or the code has to be rewritten to work on a simple InputStream. 2. I'm a little disappointed that Victor did not follow my ideas of font sources, but I guess it was easier for him this way. So I can understand. With my approach, the FontServer would hold a set of font sources each of which provides access to a set of fonts (i.e. the AWT fonts, the Base14 fonts, a directory of Type1 fonts etc.). The client application would tell the FontServer (via the FontConsumer I guess) which font sources are acceptable. For each font source a URI could be defined that identifies it so that interoperability and extensibility are preserved. ATM, there are only the two somewhat artificial groups: free standing fonts and system fonts. I don't think this is flexible enough in the long run. Anyway, the current aXSL Font API feels a little strange still. But some of that I've already pointed out to Victor at an earlier opportunity. The way the used fonts are stored is also a suspect point to me since it bears the possibility of a memory-leak if a FontConsumer is not properly unregistered in case of an exception. Currently, the FontConsumer is not released at all. 
This would have to be looked at closely. 3. The configuration file is too complicated in my opinion (especially the mapping part). It should be much easier. The complexity somewhat kills the benefit of loading fonts directly, and not via XML font metrics files. I would really, really like to have the possibility to specify a directory and have all its fonts made available automatically. People already have problems all the time getting their fonts working. I would not like to see it getting even more complicated. After all, you don't have to write a complex configuration file when you install a font in Windows, for example. I agree that there can be sections in the config file to specify font substitutions but the default font names should be available automatically. While testing I had to specify absolute paths to get the font config working quickly. In the short time I was looking at it, I didn't manage to get it to work otherwise. In the end, I have to look at the cost/benefit ratio: + ToUnicode fix (also
Re: Combine FOP PDFBox efforts?
Hi Ben, hi All, I finally have some time to chime in, sorry for the delay. Thank you for your interest in the font subsystem. My goal is to adapt the FOrayFont library to Fop. The main advantage of FOrayFont over the Fop code is its ability to directly parse font files, whereas currently with Fop there is a two-step process: first convert the font metrics into an xml file, then use it within Fop through a configuration file. You can find the process described in [1]. I submitted a first patch in December [2], which was refused because of unacceptable shortcomings of FOrayFont. The main reasons were: * lack of a default config file; * configuration too complicated. You will find all the details in [3]. Since then I have been working with Victor on improving FOrayFont. We have recently ended the design phase and have agreed on a set of changes that I still have to apply (you will find the discussion in the FOray-dev mailing list archive from the last two months. I'll add more on this on FOray-dev.). After that I believe that the main shortcomings will be corrected and that an updated patch can be submitted. PDFBox is pretty independent of my work. I currently rely entirely on the Fop PDF library for PDF outputs, and I'm only adapting what is necessary to make it use FOrayFont. FOrayFont is a low-level library that tries to be independent of any output format, and thus may be used by any renderer. So if PDFBox were to be used by Fop, for me it would just mean that I would have to adapt PDFBox instead of the Fop library. For FontBox this is different, and I think there is a possibility to share resources in this area. 
I'll put more details on FOray-dev, but in short it would be great if we could achieve the following: * merge the best of FontBox and FOrayFont to obtain a good font library; * agree on a common interface (i.e., an API) for the font library, that would be used jointly by Fop, PDFBox and FOray; * adapt PDFBox to make it use this resulting library; * make it work with Fop in some manner. I would like to work with you on the first two points. As you have probably already noticed the discussion will be mainly held in the FOray area. We will chime in here for Fop-specific things and to notify Fop devs of progress on the adaptation work. I'm glad to see that there is room for collaboration. I'm sure that we will be able to achieve Great Things ;-) Cheers, Vincent Current way to configure fonts in Fop: [1] http://xmlgraphics.apache.org/fop/trunk/fonts.html Patch for the adaptation of FOrayFont to Fop (now outdated): [2] http://issues.apache.org/bugzilla/show_bug.cgi?id=35948 Reasons for the patch refusal: [3] http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-dev/200512.mbox/browser Ben Litchfield wrote: Jeremias, I'll start by answering your questions 1) What is the minimum JDK required by PDFBox? PDFBox currently requires 1.4, because it uses ImageIO and a couple of other things that make development much easier. PDFBox was compatible with 1.3 for a long time, but I made a decision that sticking with 1.3 would cost too much in development time versus using existing stuff in 1.4. In addition 1.3 is now two major versions old and in the EOL phase. As this effort will take some time before it could be released would it be reasonable to move the minimum requirement up to 1.4 for Batik and FOP at that time? 2) Does PDFBox require log4j? PDFBox used to be dependent on log4j, 0.7.2 has an optional dependency, the soon to be released 0.7.3 version will not use log4j at all. 
Currently PDFBox's only dependency is FontBox (see comments below), although bouncy castle will soon become an optional dependency for certificate based encryption and rhino (looks like Batik uses this as well) will also be an optional dependency for Javascript execution. Some additional comments, * After the 0.7.2 release, PDFBox split the font infrastructure into another project, so aptly named FontBox. No official version has been released yet but the project was created and all font parsing logic was separated from PDFBox. As far as I can tell there is no open source font library and for many of the same reasons we have discussed I thought it would be better as a separate project. It sounds like there has already been some discussion on making a separate font library project, I would be happy to collaborate on and donate what little font parsing code I have to that project. It only makes sense for PDFBox/FOP/Batik/... to all use a single font library. It is starting to sound like a unified font system might be the first task. * I did not realize that other projects (Batik) were using FOP's pdf library, again a separate PDFFont library makes that cleaner. As a side note, PDFs can contain SVG graphics, so I eventually saw PDFBox utilizing Batik, which makes things interesting :) * If bringing PDFBox into ASF is what is necessary to make this work then I am willing to do that.
Re: Combine FOP PDFBox efforts?
Great, I will start updating PDFBox to use FOrayFont, I believe this will go pretty smoothly because FOrayFont is already being used for PDF creation. More details on the FOray list. We have had some recent discussions about supported JREs, the main page of FOray[1] says that 1.4 is used. There is a desire among the FOP developers to maintain compatibility with 1.3. Do you know if FOrayFont is compatible with 1.3? Actually I haven't taken care of this issue yet. I'm hoping that it won't be too difficult to make it 1.3-compliant, we only use basic classes of the standard library. My goal is to first have it accepted in Fop, and then do what is necessary to achieve 1.3 compliance (actually, if someone else would volunteer to take care of that last step, even better ;-) ) Vincent
Re: Google Summer of Code
Jeremias Maerki wrote: I don't see why you couldn't also apply for a slot. Having floats would be cool (but not simple). I'm not sure about your work on FOrayFont. I think the projects have to be tied to one of the organizations listed by Google. Only the client part in FOP would be part of that. So floats would be the better option I guess. By re-reading the FAQ I noticed that the work we do for a project to which we are already contributing must be new. So it's best that I apply for floats. I can't really estimate the amount of work needed to implement floats, besides the documentation phase. My feeling is that it shouldn't be too difficult to implement before-floats, as there are already a number of papers dealing with this quite well-known issue. Side-floats should be much trickier, as they would interfere with line-breaking, and the CSS model for them looks rather complicated. We could propose before-floats as a main goal, plus basic side-float implementation and optional refinements. BTW, both of you should make sure you're eligible for participating in the SoC. See the Students FAQ on the Google SoC site. Someone who applied last year wasn't a student. As a PhD student I'm eligible, no problem. Vincent
Re: Google Summer of Code
Hi, I'm thinking about setting up a page on the FOP-wiki where I would put up the goals for my proposal. That way I could give the link when submitting my application. Nice idea, indeed. I've taken the liberty of implementing it and created a new Wiki page for the SoC, linked from the DeveloperPages [1]. The idea is to have a main page listing the various proposals, with a link to each proposal's dedicated page. You'll just have to add yours. I've put a first draft regarding float implementation. Any comment is welcome, especially regarding scheduling (I've no idea of the time each task may need - never done project estimations!), and typos. Thanks. Vincent [1] http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006
Re: Google Summer of Code
Hi Jeremias, Thanks for your review! Vincent, a comment on yours: For before-floats, you refer to best-fit and first-fit approaches. I'm not sure if it's really relevant here. If I'm not mistaken before-floats are pretty similar to footnotes which means you can probably take a lot from there. I think that first-fit only really helps when you get to side-floats. Well, my understanding of before-floats makes me think that they may benefit from a total-fit algorithm. WRT footnotes, the difference is that a footnote must appear on the same page as its reference; this is a constraint that doesn't exist for before-floats, although it is best to place them as near their reference as possible, of course. When referring to first-fit/total-fit algorithms I had the paper Pagination Reconsidered referenced on the Wiki [1] in mind. To summarize quickly, it states that placement of floats benefits from a better algorithm than first-fit, much like the computation of page breaks. In fact LaTeX uses a first-fit algorithm, which often leads to floats placed pages away from their reference. So I had the feeling that a total-fit algorithm could be applied to floats as well as to page breaks. I think that user-configuration for best/first-fit doesn't help much. I rather think that FOP will have to find out itself whether it has to abandon best-fit so it can handle more complex features, i.e. switching to first-fit if it finds one of the following conditions: - side-float in the flow - page-masters with different available IPD in the flow - tables with auto table layout (when individual column widths are requested for each page) That may be discussed. I remember discussions on this list about the cost of a total-fit algorithm regarding time and memory consumption. IIRC it was stated that such an algorithm should be optional for rendering documents that don't need it, like invoices. That's why I was thinking of a config option. 
Regarding Fop automatically switching to the most appropriate algorithm, depending on the situation, I'm afraid of the complexity of such a behavior. It may be difficult to estimate which strategy is to be adopted, and when it should be decided that a given strategy has failed and that another one should be tried. But I may be wrong. Vincent [1] http://wiki.apache.org/xmlgraphics-fop/LiteratureLinks
Re: Google Summer of Code
Hi Jeremias, Normally my application should have been sent. However I don't see it on my home page so I wonder if it has to be reviewed first or if the submission has failed. The way the webapp works isn't clear to me yet. Do you see my application? Vincent Jeremias Maerki wrote: Hi Patrick, green lights everywhere. I've already ranked it and applied for the mentor position. It turns out that the Wiki in the ASF this time only was about collecting ideas for projects (unlike last year). The real project proposals are those on the Google site. At this moment, only 7 entries (including yours) have requests for the mentor position and therefore have a score of =4. Now we only need Vincent who should be back on Sunday. If he enters his proposal then, everything should be fine. Vincent, ping me when you have entered it so I can go right to the ranking part. On 05.05.2006 18:27:50 Patrick Paul wrote: Hi Jeremias, Any news from GSoC? Were you able to see my application and rank it? How are things going within the ASF? Patrick Jeremias Maerki wrote: It looks like you two need to sign up for the GSoC directly on the website and enter your application there based on the info on the ASF Wiki. This needs to be completed by May 8. After that the applications will be rated by the mentors. To me this was all a bit confusing despite the extensive FAQs. Realized only today that I have to sign up, too, as a mentor. http://code.google.com/ On 28.04.2006 17:24:43 Patrick Paul wrote: Same here, I've added an entry for auto table layout, right after Vincent's. Patrick Vincent Hennebert wrote: Ok, I've added an entry for floats implementation. I'll be off-line from tomorrow until the 6th of May. Just hoping no particular problem will occur during my absence. Anyway, if I don't answer mails that's just normal. Vincent Jeremias Maerki wrote: Yes, please add entries for the two projects. That Wiki page is the first station. 
I'm not sure, yet, who exactly will transfer the proposals to Google (can't remember from last year, either), but the first step is to identify the projects inside the ASF. I'll need to read through the whole mentoring info and stuff during the weekend. You can list me as a mentor on both projects. If I can get help from another FOP committer mentoring, all the better. On 27.04.2006 16:22:44 Patrick Paul wrote: Jeremias, Do you think we should have our two projects posted on the Apache Wiki ? http://wiki.apache.org/general/SummerOfCode2006 Is it better to go through that page, or will our proposals be forwarded directly to you anyway ? Thanks, Patrick Jeremias Maerki Jeremias Maerki
Re: Google Summer of Code
Ok, this should have worked this time. Don't know what happened. Vincent Jeremias Maerki wrote: Vincent, I'm afraid I don't see your application, either. You could simply try again or contact the GSoC support. On 07.05.2006 19:12:26 Vincent Hennebert wrote: Hi Jeremias, Normally my application should have been sent. However I don't see it on my home page so I wonder if it has to be reviewed first or if the submission has failed. The way the webapp works isn't clear to me yet. Do you see my application? Jeremias Maerki
Re: svn commit: r406917 - in /xmlgraphics/fop/trunk: src/java/org/apache/fop/fonts/truetype/TTFFile.java status.xml
<snip/> Also do we know what Foray Fonts does? It prefers OS/2.sTypoAscender over hhea.Ascender just like FOP did before my patch. FWIW: I've just notified Victor of the issue. Just waiting for his comments. BTW, my work on fonts (which already wasn't going very fast) may further suffer from my participation in the SoC. If anyone wants to take the lead... Vincent
Re: Google Summer of Code
Hi Simon, Simon Pepping wrote: I have winters of code and summers of less code. That diminishes my ability to provide assistance. What kind of work and responsibilities would assistance include? What would be the time constraints, that is, how fast does a review need to be done or a question be answered? And in which period is the work done? The work begins right now and ends on August 21. There is a mid-program evaluation on June 26. As far as I'm concerned, I'm planning to put my thoughts about the project on a dedicated Wiki page. I think that just providing high-level comments on them would already be of great help: am I going in the right direction? Am I running into a dead-end? Have I missed something important? In particular, I'll be playing with Knuth's glue/box/penalty model, and as you seem to have good knowledge of it I would be glad to hear from you about that. Anyway, whatever you'll be able to do will be fine. Thank you for your interest! Vincent
Re: Google Summer of Code
Hi Simon, Simon Pepping wrote: I have one small comment on your decomposition of the line breaking algorithm: * defining a somewhat arbitrary formula used to compute the demerit of each break, and which is to be minimized; I find the above second item in this list a bit misplaced. It is part of the definition of the algorithm rather than its actions. You're right. What I wanted to do is extract the three most important aspects which IMO characterize the algorithm. I've rephrased the text to make it clearer. Regarding the last list, I am not sure what you mean by 'a floating sequence of g/b/p items'. A subsequence whose position in the large sequence is floating? Exactly. This subsequence represents before-floats and footnotes. Footnotes are a bit more constrained as they should appear (at least partly) on the same page as the footnote-citation. Thank you, Vincent
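For readers who don't have the formula at hand: the "somewhat arbitrary formula" discussed above is, as far as I recall it from Knuth and Plass's line-breaking paper, the following. The symbols are theirs, not Fop's: for a line ending at break j, r_j is the adjustment ratio, beta_j the badness, pi_j the penalty at the break and alpha_j an extra penalty for two consecutive flagged breaks. Fop's own implementation may well differ in the details.

```latex
\beta_j =
  \begin{cases}
    100\,|r_j|^3 & \text{if } r_j \ge -1,\\
    \infty       & \text{otherwise,}
  \end{cases}
\qquad
\delta_j =
  \begin{cases}
    (1 + \beta_j + \pi_j)^2 + \alpha_j   & \text{if } \pi_j \ge 0,\\
    (1 + \beta_j)^2 - \pi_j^2 + \alpha_j & \text{if } -\infty < \pi_j < 0,\\
    (1 + \beta_j)^2 + \alpha_j           & \text{if } \pi_j = -\infty.
  \end{cases}
```

The algorithm then looks for the sequence of breaks that minimizes the sum of the demerits delta_j over all lines, which is why the formula belongs to the definition of the problem rather than to the steps of the algorithm.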
Re: [GSoC] Wiki page for progress informations
Hi Jeremias, Jeremias Maerki wrote: did you already investigate how footnotes are implemented? Can you say anything about how similar the problem of footnotes is to before-floats? Just so you don't have to start from scratch while there may be something to build upon. After all, the footnotes also contain some logic to move certain parts to a different page than where the anchor is located. I'm certainly planning to look at how footnotes are implemented. There will probably be things to share in this area. My feeling right now is that floats may be easier to deal with as they are not required to appear on the same page as their citation. Another thing that we may need to keep in mind: There was lots of desire from the user community that FOP supports large documents (long-term goal, not necessarily yours). I wrote that a first-fit algorithm could help free memory earlier. Obviously, for complex before-float situations a total-fit approach is probably more interesting as it can come up with more creative solutions. I'm just mentioning it so we keep the bigger picture in mind and since there could be conflicting goals. Actually it is stated in the project's goals that two algorithms be implemented: a quick, memory-friendly one (first/best fit) and a high-quality, slow, memory-consuming one (total fit). It seems that best-fit will be similar to first-fit in terms of processing time and memory consumption, yet better in quality. But this has still to be investigated. Also, some work might be shared with Patrick as the page-breaking algorithm will affect automatic layout of tables, as far as I understand. Vincent
Re: [GSoC] Wiki page for progress informations
Thanks a lot Luca! This will help me find my way in the code. I'll keep your comments in mind for when I better understand the whole issue. Vincent Luca Furini wrote: Jeremias Maerki wrote: did you already investigate how footnotes are implemented? Can you say anything about how similar the problem of footnotes is to before-floats? Just so you don't have to start from scratch while there may be something to build upon. After all, the footnotes also contain some logic to move certain parts to a different page than where the anchor is located. A few quick comments about the footnote implementation: 1) the FootnoteLM returns only the sequence of elements representing the inline part (not the footnote-body part); it just adds to the last (inline) box a reference to the FootnoteBodyLM. 2) the LineLM, after computing the breaks, adds to each (block) box representing a line the references to the FootnoteBodyLM whose citations are in that line 3) during the remainder of the element collection phase, these references are not used (except in the creation of combined element lists, when they should be copied inside the new elements) 4) the PageSequenceLM.PageBreaker.getNextKnuthElements() method, after receiving all the (block) elements, scans them looking for footnote information, gets the elements from the referenced FootnoteBodyLM and puts them in a different list (at the moment a list of lists, but this is sub-optimal), and from the footnote-separator (in a separate list) 5) these lists are looked at in PageBreakingAlgorithm.computeDifference(), where we try to add some footnote content to the normal page content using getFootnoteSplit(), and in computeDemerits(), where some extra demerits are added if we break a footnote or some footnotes are deferred. 
This last point at the moment is performed using many PageBreakingAlgorithm private variables, which is maybe not the best way to do it, as we must be very careful about their initialization and their use, especially when the algorithm restarts. I think that a state object could be used to store these values, and explicitly passed along the methods instead of relying on the class members, but concerning this I'd like to hear the opinions of the other committers ... Insertion of before-floats could be implemented in a similar way, giving precedence to the footnote insertion (as it is affected by stricter constraints). An important difference between a footnote and a before-float is that the latter does not have an inline part, so (if we want to follow the same pattern) we need to either store the reference inside a previously-created box or to add some new elements containing the reference (but we must be sure that these elements cannot be parted from the previous ones, see the constraints in section 6.10.2 in the spec). A crucial point is the demerit function as, if I remember correctly, it greatly affects the computational complexity of the breaking algorithm (there should be a M. Plass paper concerning this). HTH Another thing that we may need to keep in mind: There was lots of desire from the user community that FOP supports large documents (long-term goal, not necessarily yours). I wrote that a first-fit algorithm could help free memory earlier. Obviously, for complex before-float situations a total-fit approach is probably more interesting as it can come up with more creative solutions. I'm just mentioning it so we keep the bigger picture in mind and since there could be conflicting goals. 
A first degree of first-fit algorithm could be achieved quite quickly by having a BreakingAlgorithm interface which is implemented by a TotalFitBA (the existing implementation) and a FirstFitBA which would have a much simpler considerLegalBreak() method that, instead of the complex set of nodes, just keeps in mind a single node. This would surely decrease the memory footprint, but is not (I think) what we really want, as this simplified algorithm would be performed on the whole sequence of elements. In order to start processing the sequence as soon as we receive a few elements we need to do some deeper changes. An idea (I just had it now, so I did not fully consider all its implications). At the moment, the block-level LM collect elements from their children and return just a single sequence (if there are no break conditions); we could have a parameter requesting them to return after they receive each child sub-sequence, and have a canStartComputingBreak() method that returns true if the sequence contains enough elements and we are using a first-fit algorithm, or false otherwise ... Sorry for the long post ... and for the long absence too, but it seems that just after thinking great, now I've really got some time to spend on FOP I receive tons of other things to do ... :-( Regards Luca
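To illustrate the difference Luca describes between a first-fit implementation that keeps a single node and a total-fit one that weighs all feasible breaks, here is a deliberately simplified sketch. It is not Fop's BreakingAlgorithm: elements are reduced to plain lengths with no stretch or shrink, every element boundary is a legal break, the cost of a part is just its squared unused space, and the names `BreakingStrategy`, `FirstFitStrategy` and `TotalFitStrategy` are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical common interface; breaks are the indices of the last element of each part but the final one. */
interface BreakingStrategy {
    List<Integer> findBreaks(int[] lengths, int available);
}

/** First-fit: remembers only the current part, so memory use is constant. */
final class FirstFitStrategy implements BreakingStrategy {
    public List<Integer> findBreaks(int[] lengths, int available) {
        List<Integer> breaks = new ArrayList<Integer>();
        int used = 0; // content placed since the last break (the single "active node")
        for (int i = 0; i < lengths.length; i++) {
            if (used > 0 && used + lengths[i] > available) {
                breaks.add(i - 1); // break just before the element that no longer fits
                used = 0;
            }
            used += lengths[i]; // an oversized element is allowed to overflow alone
        }
        return breaks;
    }
}

/** Total-fit: dynamic programming over all feasible breaks, minimizing the sum of squared slack. */
final class TotalFitStrategy implements BreakingStrategy {
    public List<Integer> findBreaks(int[] lengths, int available) {
        int n = lengths.length;
        long[] best = new long[n + 1]; // best[i]: minimal cost of laying out elements [0, i)
        int[] prev = new int[n + 1];   // prev[i]: index where the last part of that layout starts
        for (int i = 1; i <= n; i++) {
            best[i] = Long.MAX_VALUE;
            int used = 0;
            for (int j = i - 1; j >= 0; j--) {
                used += lengths[j];
                if (used > available && j != i - 1) {
                    break; // the part [j, i) overflows and holds more than one element
                }
                long slack = Math.max(0, available - used);
                long cost = best[j] + slack * slack;
                if (cost < best[i]) {
                    best[i] = cost;
                    prev[i] = j;
                }
            }
        }
        LinkedList<Integer> breaks = new LinkedList<Integer>();
        for (int i = n; prev[i] > 0; i = prev[i]) {
            breaks.addFirst(prev[i] - 1); // walk back through the chosen part starts
        }
        return breaks;
    }
}
```

With seven elements of length 3 and 10 units available, first-fit fills two parts with three elements each and leaves a lone element in the last part, whereas total-fit spreads the elements 3/2/2. The price is that total-fit has to see the whole sequence before it can commit to the first break, which is exactly the memory concern raised above.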
Plass' thesis: Optimal Pagination Techniques...
Hi all, A bit more than one year ago Plass' thesis was cited on this list [1]. Looking at the 24 Page Preview available on the site, it seems that this work may help me to have a better understanding of the issue, and solve some of the problems I raised on the Wiki. I would like confirmation from those (namely Jeremias) who have bought this book that it is worth its price (35$ for an image-only pdf file... a bit expensive, IMO). So? Thanks, Vincent [1] http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-dev/200503.mbox/[EMAIL PROTECTED]
Re: [Xmlgraphics-fop Wiki] Update of GoogleSummerOfCode2006/FloatsImplementationProgress by VincentHennebert
I've looked at all the pages in the section Documents with Relation to the Knuth Approach of DeveloperPages, which provide a good starting point. I don't think there are other pages elsewhere? Vincent Simon Pepping wrote: Vincent, Are you aware that Luca has documented his implementation on the Wiki? Simon On Fri, Jun 09, 2006 at 09:56:44AM +0200, Jeremias Maerki wrote: Vincent, don't hesitate to send patches if you want to clean up and better document the code. I've recently wondered if it made sense to split PageBreakingAlgorithm into two classes, for example, a VerticalBreakingAlgorithm (base class and used by static-content and block-container) and a PageBreakingAlgorithm (which adds footnote and before-float functionality). That may make the code clearer. On 07.06.2006 19:34:56 Apache Wiki wrote: <snip/> + == The Fop Source Code == + Even if it is well explained, the Knuth line-breaking algorithm isn't that easy to understand. ATM I've concentrated on the class layoutmgr.BreakingAlgorithm, which contains the part of the algorithm which is common to page- and line-breaking. It is split in parts which follow pretty closely those described in Digital Typography. It relies on the following skeleton:{{{ <snip/> Jeremias Maerki
Re: DO NOT REPLY [Bug 39777] - [PATCH] GSoC: floats implementation
Hi, --- Additional Comments From [EMAIL PROTECTED] 2006-06-12 12:45 --- (In reply to comment #0) This patch isn't really meant to be applied... Rather to be reviewed by interested parties to check if I'm not wrong. Changelog: * javadocs for the Knuth line- and page-breaking algorithms. Some items are marked with double question marks because I haven't found out yet what their purpose is. I will probably find out eventually, but if anybody has immediate hints they will be welcome. KnuthBlockBox: bpdim seems to be used in concert with the proprietary display-align=fill value Luca implemented. See AbstractBreaker.optimizeLineLength(). If I understand it right it is somehow used to make sure all the pages have more or less the same amount of content (in bpd). OK. Actually this is the natural width (without stretching or shrinking) of the line represented by this box. This field apparently exists because it isn't possible to get the min/opt/max values stored in a MinOptMax object. Otherwise it could be retrieved from the opt of the ipdRange field. It may perhaps be useful to add such methods to MinOptMax? pos and bAux are defined in ListElement/KnuthElement. Hmmm. Does a Position object represent the index of the Knuth element (here KnuthBlockBox) in the sequence managed by the corresponding LM? What does it mean that a box is auxiliary? BreakingAlgorithm: alignment: EN_BEFORE is not used. EN_START is used instead, since the class is used in both ipd and bpd. EN_BEFORE is mapped into EN_START. Actually, alignment uses a slightly different set of values than the FO properties, so reusing the integer constants may not be the best thing, but we're not under Java 5, yet, where we could use enums. bFirst is used for the text-indent property so only the first paragraph of a block is indented. See block_text-indent.xml. You probably mean the first /line/ of a block? 
partOverflowRecovery is used in page breaking to defer an element which would overflow the available BPD to the next page if it's the only element in a part (=line or page). I'm still a bit unsure of what 'part' means in the javadoc of BreakingAlgorithm.isPartOverflowRecoveryActivated. A line/page? A word/block? A Knuth box? * some methods have been marked deprecated because AFAICT they are not called anywhere. If this is agreed I'll remove them in my next patch +1 * bugfix? In the last for loop of the method layoutmgr.PageBreakingAlgorithm.noBreakBetween I think the exit condition should be a strict comparison ('<' instead of '<='). Confirmation? not sure. :-( The code in the for loop checks if the element pointed to by index is a legal breakpoint. If the exit comparison isn't strict, index may reach the value breakIndex, which by definition is a legal break. I guess the purpose of the noBreakBetween method is to check that there is no legal break between the two given breakpoints, /excluded/. The line storedValue = (index == breakIndex) would confirm that. * the javadoc comments for some methods have been removed because they will inherit them from their super-class I think Checkstyle will bark about that. If you do Ctrl-J in Eclipse, you get an automatic @see entry which satisfies Checkstyle. @inheritDoc does not work in every Java version. In fact Checkstyle doesn't complain. It seems to be smart enough to detect that there is a javadoc for the original version of a redefined method. In such cases javadoc copies the definition from the super-class, and that's also what Eclipse does in the tooltip. I may put @see statements, but I think it doesn't really make sense. * some checkstyle fixes HTH Updated patch follows. Thanks, Vincent
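To make the noBreakBetween point concrete, here is a minimal Java sketch. It is not the FOP code: the class name, the boolean array standing in for the Knuth element sequence, and the method signature are all assumptions made for illustration. Only the idea comes from the message above: scan the elements strictly between two breakpoints and report whether any of them is a legal break.

```java
/** Hypothetical stand-in for the check discussed above; not FOP's PageBreakingAlgorithm. */
final class NoBreakBetweenSketch {

    /**
     * Returns true if no element strictly between prevBreakIndex and breakIndex
     * is a legal breakpoint. legalBreak[i] is an assumed flag telling whether
     * element i is a legal break.
     */
    static boolean noBreakBetween(boolean[] legalBreak, int prevBreakIndex, int breakIndex) {
        // Strict comparison: breakIndex itself is a legal break by definition,
        // so it must not be examined by the loop.
        for (int index = prevBreakIndex + 1; index < breakIndex; index++) {
            if (legalBreak[index]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] legal = {true, false, false, true, false, true};
        System.out.println(noBreakBetween(legal, 0, 3)); // only elements 1 and 2 are examined
        System.out.println(noBreakBetween(legal, 0, 5)); // element 3 lies in between
    }
}
```

With a non-strict exit condition the loop would also examine the element at breakIndex, and since that element is a legal break the method could never report "no break between".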
Re: keep...=always and Knuth penalties
FYI: I'm planning to refactor the breaking algorithm in order to implement floats. I'll see what can be done in this area. Let's keep in touch. Vincent Manuel Mall wrote: On Monday 19 June 2006 16:45, Jeremias Maerki wrote: On 18.06.2006 20:57:51 Simon Pepping wrote: On Sun, Jun 18, 2006 at 07:36:45PM +0800, Manuel Mall wrote: snip/ Or should we use a more refined approach where we initially generate an INFINITE penalty, but if the page breaking cannot find a solution we reduce the penalty on some/all of those elements given an INFINITE penalty because of keeps and run the page breaker again? I am in favor of this solution. There are generally two solutions: increase the tolerance, or force a solution. I think FOP already has a force parameter for this purpose. +1. Yes, BreakingAlgorithm has a force parameter which is currently set to true for page breaking. There's also a threshold. We can probably play with that first. See LineLayoutManager.findOptimalBreakPoints(). Yes, there is a force parameter and it seems to be always set to true for page breaking (and false for line breaking). But it doesn't seem to guarantee that breaks will be found, otherwise we shouldn't get the "giving up after 50 retries" message. Does anyone understand how this force parameter is supposed to work? Jeremias Maerki Manuel
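The "retry with reduced penalties" idea from this thread can be sketched as follows. This is only an illustration under loud assumptions: the breaker below is a naive first-fit page filler, not the Knuth total-fit algorithm FOP uses, and the class, the method names and the INFINITE constant are hypothetical stand-ins rather than FOP's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: relax keep penalties and retry when no page break is found. */
final class KeepRelaxationSketch {

    /** Assumed marker value for "break forbidden here". */
    static final int INFINITE = 1000;

    /**
     * Naive first-fit breaker (a stand-in, not Knuth's algorithm). A break after
     * element i is legal if penaltyAfter[i] is below INFINITE or i is the last element.
     * Returns the indices after which pages end, or null if some page cannot be closed.
     */
    static List<Integer> firstFit(int[] heights, int[] penaltyAfter, int pageHeight) {
        List<Integer> breaks = new ArrayList<>();
        int start = 0;
        while (start < heights.length) {
            int used = 0;
            int best = -1;
            for (int i = start; i < heights.length; i++) {
                used += heights[i];
                if (used > pageHeight) {
                    break; // element i no longer fits on this page
                }
                boolean last = (i == heights.length - 1);
                if (last || penaltyAfter[i] < INFINITE) {
                    best = i; // furthest legal break that still fits
                }
            }
            if (best < 0) {
                return null; // no legal break fits: give up on this pass
            }
            breaks.add(best);
            start = best + 1;
        }
        return breaks;
    }

    /** First pass honours INFINITE penalties; the second pass relaxes them. */
    static List<Integer> breakWithRelaxation(int[] heights, int[] penaltyAfter, int pageHeight) {
        List<Integer> result = firstFit(heights, penaltyAfter, pageHeight);
        if (result != null) {
            return result;
        }
        int[] relaxed = penaltyAfter.clone();
        for (int i = 0; i < relaxed.length; i++) {
            if (relaxed[i] >= INFINITE) {
                relaxed[i] = INFINITE - 1; // still very costly, but no longer forbidden
            }
        }
        return firstFit(heights, relaxed, pageHeight);
    }

    public static void main(String[] args) {
        int[] heights = {40, 40, 40};
        int[] penalties = {INFINITE, INFINITE, 0}; // keeps glue all three blocks together
        System.out.println(breakWithRelaxation(heights, penalties, 100));
    }
}
```

In this toy setup three blocks kept together cannot fit on one page, so the first pass fails and only the relaxed pass finds breaks. Whether relaxing all keeps at once or only some of them is preferable is exactly the open question in the thread.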
[GSoC] How the work should progress
Hi all, I'd like to have the team's opinion about how I should proceed. I'm currently at a point where I think I know enough, both from theoretical and code points of view, to start the implementation of floats. By mimicking the handling of footnotes, I think I can have a working implementation rather quickly and easily. However, it wouldn't be very satisfying IMO. Some refactoring wouldn't be useless, and while I'm at it, why not do it completely? I've already spent much time figuring out how the code works. From what I've seen, some areas of the code still look experimental. I think the implementation of floats may be an opportunity to bring it to a more polished level. A refactoring would have several benefits: - it may help sort things out, and even prepare the implementation of a first-fit algorithm (although this might be a bit too unrelated, I'm afraid) - it may help future contributors understand this area of the code more easily and get involved more quickly - it is always better to have a clean design. Moreover, I think it is possible to make the implementation even more object-oriented, which would help share code between the line and page levels. - a refactoring process is more efficient and secure if one has the opportunity to think full-time about it... That's why I would propose to refactor the breaking algorithm. However, to do things properly I would need to understand a bit more of the code than just the breaking stuff. This may take some time, especially if I want to make sure that I don't introduce new errors. The implementation of side-floats may suffer from that. That was not the original intent of the SoC project, but I think this would be a benefit for Fop. WDYT? Vincent
Re: [GSoC] How the work should progress
Chris, Simon, Thanks for your feedback. It seems that I missed Jeremias before he left for Ireland. Too bad that my mail took half an hour to arrive. Anyway, I have work to do on my thesis for a few days. Hopefully Jeremias will be able to look at his mails during the conference. If not, I'll follow Simon's advice and refactor just what is necessary to implement floats. Cheers, Vincent Simon Pepping wrote: On Wed, Jun 21, 2006 at 04:32:29PM +0200, Vincent Hennebert wrote: Hi all, I'd like to have the team's opinion about how I should proceed. I'm currently at a point where I think I know enough, both from theoretical and code points of view, to start the implementation of floats. By mimicking the handling of footnotes, I think I can have a working implementation rather quickly and easily. However, it wouldn't be very satisfying IMO. Some refactoring wouldn't be useless, and while I'm at it, why not do it completely? Be careful. Don't let yourself be sidetracked by other worthy objectives. Stay focused on your target; it will be difficult enough. I've already spent much time figuring out how the code works. From what I've seen, some areas of the code still look experimental. I think the implementation of floats may be an opportunity to bring it to a more polished level. A refactoring would have several benefits: - it may help sort things out, and even prepare the implementation of a first-fit algorithm (although this might be a bit too unrelated, I'm afraid) I would hope that a best-fit algorithm can be inserted. Regards, Simon
Re: [GSoC] How the work should progress
Hi Jeremias, snip/ Please do try to refactor the footnote and before-float stuff out into a separate class to make the whole design clearer. But don't shift your focus too much. Some factoring: +1, total refactoring -0.5, keep focus on your task: +1. ;-) Ok. So definitely I'll just refactor what is necessary to cleanly implement before-floats. That said, some further refactoring might be needed for side-floats... But we'll see at that moment. Thanks, Vincent
PercentBaseContext uselessly inherited
Hi all, I've noticed that many *LayoutManager classes explicitly implement the datatype.PercentBaseContext interface while it is already extended by the LayoutManager interface. Same for the BlockLevel- and InlineLevelLayoutManager interfaces. All those classes or interfaces, which implement or extend LayoutManager, implicitly also implement/extend the PercentBaseContext interface. Thus there is no need that they themselves implement/extend PercentBaseContext. If this is agreed I'll remove the unnecessary extends/implements statements in my next patch. Or have I missed something? Vincent
Re: PercentBaseContext uselessly inherited
Hi all, I've noticed that many *LayoutManager classes explicitly implement the datatype.PercentBaseContext interface while it is already extended by the LayoutManager interface. Same for the BlockLevel- and InlineLevelLayoutManager interfaces. All those classes or interfaces, which implement or extend LayoutManager, implicitly also implement/extend the PercentBaseContext interface. Thus there is no need that they themselves implement/extend PercentBaseContext. If this is agreed I'll remove the unnecessary extends/implements statements in my next patch. Or have I missed something? Ok, I /have/ missed something. I should perhaps have taken a nap before writing that. I let myself get confused by the way the javadoc displays information. For a class it lists all of the implemented interfaces, even those which are only indirectly inherited. Sorry for the noise :-/ Vincent
Re: Error message: Should be first
One of my clients reported to me that he gets a Should be first error message on the log. This happens in (Page)BreakingAlgorithm.removeNode(). I get the impression that the code there is not finished rather than that is a real error condition. I'll try to extend removeNode() so it really removes the disabled node. See the attached demo file (You'll need italian hyphenation available to get the error). I'll try to fix that tomorrow. If Luca or anyone else has any further comments on that, I'd appreciate it. I don't have any error with this file. However, I've had to change the font names because i_helvetica is unknown. I've tried with Helvetica and Helvetica + font-style=italic (as I suppose is what the i_ means) but I still don't get any error. How did you get it? Regarding the should be first error, that's a part of the algorithm I don't completely understand, yet. That said, the removeNode method is called, among other places, in filterActiveNodes. This is only a guess, but if there is a place where the removeNode's precondition isn't respected, that might be here. HTH, Vincent
Necessary conditions to defer footnotes
Hi All, there is something I don't get with the handling of footnotes. When there is not enough room on the current page to place all the footnotes, the algorithm tries to find a place to split them. But there is a condition: it must be possible to defer old footnotes (PageBreakingAlgorithm, l.332). And this is possible only if there is no legal breakpoint between the previous active node and the currently considered breakpoint (checkCanDeferOldFootnotes method). I don't understand this latter condition. And, reading the code, I don't understand if this method's purpose is to determine if it is /allowable/ to defer footnotes (am I authorized to defer footnotes, if any), or if it is /possible/ (are there footnotes to defer). Ok, this is a bit subtle, but understanding that would help me get the intent of the algorithm. What is the relation with before-floats? Well, I'm currently refactoring this part of the code to factor out as much as possible of what is common to floats and footnotes. And this part of the code, currently applied to footnotes, may well also be applied to floats. Hints? Vincent
[GSoC] BreakingAlgorithm: simplify handling of activeLines
Hi All, Good news: before-floats are working. There probably are bugs and place for improvement but I think it is time to submit a first patch, so that you may see what I've done. I'm currently cleaning up and documenting my code, and I think the handling of the activeLines array may be simplified: currently, for a line l, activeLines[2*l] points to the first active node for this line, and activeLines[2*l+1] points to the last node. But the last node is never directly accessed, only by starting at the first one and following the links. There must be a reason for this code but I don't see it. Perhaps this is related to some older code which since was removed? Or have I missed something? However, if it is ok I'll simplify that in my patch. Vincent
Re: [GSoC] BreakingAlgorithm: simplify handling of activeLines
Hi All, Good news: before-floats are working. There probably are bugs and place for improvement but I think it is time to submit a first patch, so that you may see what I've done. I'm currently cleaning up and documenting my code, and I think the handling of the activeLines array may be simplified: currently, for a line l, activeLines[2*l] points to the first active node for this line, and activeLines[2*l+1] points to the last node. But the last node is never directly accessed, only by starting at the first one and following the links. Perhaps I misunderstand your question, but I think the last active node in a line is used when adding yet another active node for that line at the end of the linked list. In BreakingAlgorithm:addNode(): activeLines[headIdx + 1].next = node; Ah yes, I get it now, thanks Finn. In fact this is to have the insertion of a new node in constant time. Grrr, should have found it out by myself. On the other hand, a different data structure of nodes might very well open up different improvement. The current structure of using a linked list for each line, is just the best I could come up with at the time. I think the structure is fine; I was about to propose to just switch to Java Collections, as Simon said he did in his implementation, but I think I'll leave it as is for now. As Simon's work will probably be eventually integrated, this will be an opportunity of refactoring (BTW, your work may help me implement side-floats; I'll have a closer look at it once I'm done with before-floats). Vincent
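The head/tail layout discussed in this thread can be sketched in a few lines of Java. This is a simplified illustration, not FOP's BreakingAlgorithm: the class and the Node type are hypothetical, and the real addNode may do more than append. Only the array layout (activeLines[2*l] for the first node, activeLines[2*l+1] for the last) and the constant-time append come from the messages above.

```java
/** Hypothetical sketch of per-line active-node lists with head and tail pointers. */
final class ActiveLinesSketch {

    /** Minimal node: only the link matters for this illustration. */
    static final class Node {
        final int position;
        Node next;

        Node(int position) {
            this.position = position;
        }
    }

    // For line l: activeLines[2*l] is the first node, activeLines[2*l + 1] the last one.
    private final Node[] activeLines;

    ActiveLinesSketch(int lineCount) {
        activeLines = new Node[2 * lineCount];
    }

    /** Appends a node to the list of the given line in constant time. */
    void addNode(int line, Node node) {
        int headIdx = 2 * line;
        if (activeLines[headIdx] == null) {
            activeLines[headIdx] = node;          // first node for this line
        } else {
            activeLines[headIdx + 1].next = node; // link behind the current last node
        }
        activeLines[headIdx + 1] = node;          // node is the new tail
    }

    /** Counts the nodes of a line by walking from the head, as readers of the list do. */
    int count(int line) {
        int n = 0;
        for (Node node = activeLines[2 * line]; node != null; node = node.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ActiveLinesSketch lines = new ActiveLinesSketch(2);
        lines.addNode(0, new Node(3));
        lines.addNode(0, new Node(7));
        lines.addNode(1, new Node(9));
        System.out.println(lines.count(0) + " " + lines.count(1));
    }
}
```

Without the tail slot, every append would have to walk the list from the head first, which is why the seemingly unused second pointer is there.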
Two days offline
I personally would gladly work, but my brain no longer wants to. Guess I need a break. Vincent
Re: DO NOT REPLY [Bug 39777] - [PATCH] GSoC: floats implementation
http://issues.apache.org/bugzilla/show_bug.cgi?id=39777 --- Additional Comments From [EMAIL PROTECTED] 2006-07-23 19:47 --- I finally had a chance to take a first look. What I've seen so far looks pretty nice. A first simple test behaved as would be expected. Making a before-float too big to fit on a page (although there are break points inside the content) results in an OutOfMemoryError (probably due to an infinite loop). It would be good if you would write a set of test cases for before-floats as your next task. This is to document what works and what doesn't and to give you and us more confidence when doing further changes in the code. Testcases should be ready tomorrow. Finally, would you compile a list of classes you propose to move into the breaking package? The idea itself is worth investigating since the layoutmgr package has already grown rather big again. As a general rule, I would put in the breaking subpackage all of the classes from layoutmgr and its subpackages which are related to the Knuth approach: the algorithm as well as the various Knuth elements. A quick look gave me the following classes: AbstractBreaker, BalancingColumnBreakingAlgorithm, BreakingAlgorithm, BlockKnuthSequence, InlineKnuthSequence, KnuthBlockBox, KnuthBox, KnuthElement, KnuthGlue, KnuthPenalty, KnuthPossPosIter, KnuthSequence, PageBreakingAlgorithm, inline.KnuthInlineBox. This will probably create many access problems (with currently package-private members). But this will be an opportunity to clean up the whole thing a bit, I think. In a second step there are also a number of inner classes which might be extracted and transformed into top-level classes of the new breaking subpackage.
I'm mainly thinking of inline.LineLayoutManager.LineBreakingAlgorithm. I guess there are reasons why they currently are inner classes, but it may be conceptually better anyway to separate them from their surrounding class. This would have to be studied more deeply. As for when to apply the change, we may probably wait for the integration of Simon's work. I created the package just because I had a new class to put in it and thought that I might as well directly put it in the right package. So, WDYT? What we need to decide now is whether to put the changes in a branch until they stabilize or if we put it in Trunk. I'd prefer a branch for now so in case I have time to finish the work on 0.93, we don't have any problems from this end. A branch would be fine I think, as this would allow me to submit more gradual patches. Vincent
Re: DO NOT REPLY [Bug 39777] - [PATCH] GSoC: floats implementation
I managed to reenable one of the disabled test cases because you were fooled by the default values for widows and orphans. Having only 3 lines in a block does not allow any break possibilities with default widows and orphans. 4 lines creates one break possibility in the middle. Indeed, yes. Well, thanks! Vincent
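The arithmetic behind that remark can be checked with a small Java sketch. The class and method names are made up for illustration; the rule it encodes is the usual reading of the widows and orphans properties (at least `orphans` lines must stay before a break and at least `widows` lines after it), with both defaulting to 2 in XSL-FO.

```java
/** Hypothetical helper counting the legal break positions inside a block of lines. */
final class WidowOrphanSketch {

    /**
     * Counts the positions between two lines where the block may be split:
     * at least 'orphans' lines must remain before the break and at least
     * 'widows' lines after it.
     */
    static int breakPossibilities(int lines, int orphans, int widows) {
        int count = 0;
        for (int before = 1; before < lines; before++) {
            int after = lines - before;
            if (before >= orphans && after >= widows) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Default values: orphans = 2, widows = 2.
        System.out.println(breakPossibilities(3, 2, 2));
        System.out.println(breakPossibilities(4, 2, 2));
    }
}
```

With the defaults, a 3-line block offers no break at all (2 + 2 lines are needed), while a 4-line block offers exactly one, in the middle.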
space-start and space-end for block-areas
Hi all, I think there is a problem in the spec regarding the space-start and space-end traits for block-areas. The like-named properties only apply to inline-level formatting objects, so I guess that for block-areas those traits are indirectly-derived from other properties (start-indent and margin-*). The problem is that this is explained nowhere in the spec how to compute those traits, although I guess they should be given the values of the corresponding margin properties. Let me remind the three points of the spec which IMO are involved here: 1. in section 4.4.1, Stacked Block-areas, there is a rule, among others, stating that for a block-area: the start-edge of its allocation-rectangle is parallel to the start-edge of the content-rectangle of R (where R is the closest ancestor reference-area of B), and offset from it inward by a distance equal to the block-area's start-indent plus its start-intrusion-adjustment (as defined below), minus its border-start, padding-start, and space-start value 2. in section 5.3.2, Margin, Space and Indent Properties, there are rules for computing the values of start- and end-indent when margin-* is specified, and vice-versa. 3. in section 5.1.2, Computed Values, there is the following: Specifying a value for one property determines both a computed value for the specified property and a computed value for the corresponding property. So in our situation, if margin-* is not specified then it must be computed from the (possibly inherited) specified value of start-indent. Here I disagree (but I may be wrong) with the IndentInheritance Wiki page, where it is written (e.g., in the first example) that the rules from 5.3.2 are not always triggered. In my opinion they are, otherwise we can't compute the value of the space-start trait (more below). 
Now, let's consider the following problem: - writing-mode is lr-tb, reference-orientation is 0 (most common case in western countries); - we have an fo:block; - we want to compute the offset of the start-edge of the generated block-areas's content-rectangle from the start-edge of the closest ancestor reference-area's content-rectangle. Let's assume that: - xa is the x-coordinate of the start edge of the block-area's allocation rectangle; - xc is the x-coordinate of the start edge of its content rectangle; - the origin of the coordinate system is the start-edge of the content-rectangle of the closest ancestor reference-area; - there is no side-float so we can forget the start-intrusion-adjustment in the formulae. Then we have the following: (1) xa = xc - start-indent (definition of allocation-rectangle in 4.2.3) (2) xa = start-indent - border-start - padding-start - space-start (section 4.4.1) That gives us the offset of the block-area's content rectangle: (3) xc = 2*start-indent - border-start - padding-start - space-start If margin-left is set on the fo:block (and assuming that the inherited value of start-indent is 0), start-indent is computed like that: start-indent = margin-left + padding-left + border-left-width which gives us for xc: xc = 2*margin-left + padding-left + border-left-width - space-start This corresponds to the intuitive understanding provided that space-start is set to margin-left. If margin-left isn't specified but start-indent is, then the definition of the start-indent property (§7.10.7) lets us expect that xc = start-indent So the formula (3) becomes: start-indent = 2*start-indent - border-start - padding-start - space-start which works only if space-start = start-indent - border-start - padding-start Again, this is ok if we give to space-start the value of margin-left computed according to section 5.3.2. 
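For readability, the steps above can be restated as a worked derivation. This only re-expresses the message's own equations under its own assumptions (lr-tb writing mode, no side-float, so border-start = border-left-width and padding-start = padding-left); it does not settle whether combining statements from chapters 4 and 5 of the spec is legitimate.

```latex
% Offsets measured from the start-edge of the ancestor reference-area's content-rectangle.
\begin{align*}
x_a &= x_c - \text{start-indent} && (1)\\
x_a &= \text{start-indent} - \text{border-start} - \text{padding-start} - \text{space-start} && (2)\\
x_c &= 2\,\text{start-indent} - \text{border-start} - \text{padding-start} - \text{space-start} && (3)
\end{align*}

% Case A: margin-left specified, inherited start-indent = 0.
\begin{align*}
\text{start-indent} &= \text{margin-left} + \text{padding-left} + \text{border-left-width}\\
x_c &= 2\,\text{margin-left} + \text{padding-left} + \text{border-left-width} - \text{space-start}\\
    &= \text{margin-left} + \text{padding-left} + \text{border-left-width}
       \quad\text{if } \text{space-start} = \text{margin-left}
\end{align*}

% Case B: start-indent specified, and 7.10.7 read as x_c = start-indent.
\begin{align*}
\text{start-indent} &= 2\,\text{start-indent} - \text{border-start} - \text{padding-start} - \text{space-start}\\
\Longrightarrow\ \text{space-start} &= \text{start-indent} - \text{border-start} - \text{padding-start}
\end{align*}
```

In both cases the result is the same: the content rectangle starts at start-indent exactly when space-start takes the value that section 5.3.2 would compute for margin-left with an inherited start-indent of 0.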
Thus I think we may assume that the space-start and space-end traits for block-areas are given the computed values of the corresponding margin properties. Now there is a problem if the inherited value of start-indent is not 0. Then the value of space-start becomes (3rd formula of section 5.3.2): space-start = start-indent - inherited-start-indent - padding-left - border-left-width so xc = start-indent + inherited-start-indent which breaks the expected interpretation of the start-indent property in §7.10.7. Applied to the following example, taken from the IndentInheritance Wiki page:
<fo:flow>
  <fo:block start-indent="10pt">indented text
    <fo:block>nested block</fo:block>
  </fo:block>
</fo:flow>
For the nested block, start-indent is set to the inherited value of 10pt. margin-left is computed according to the formula of section 5.3.2: margin-left = start-indent - inherited(start-indent) - 0 - 0 = 10pt - 10pt - 0 - 0 = 0 So space-start = 0, and xc = 2*start-indent - 0 - 0 - 0 = 20pt, which is not the expected value. I think the inherited value of start-indent should be removed from the third formula of section 5.3.2. Note that this
Re: space-start and space-end for block-areas
Really want to dig that one out again? :-) He he ;-) I guess yes. I was starting to look at all the intrusion-adjustment and -displace stuff when I stumbled upon this issue. I need to have an absolutely clear understanding of that if I want to implement side-floats correctly. Before I go into details inline below, let me stress that the margin-* properties are defined in XSL-FO for compatibility with CSS. They don't play a direct role in the FO geometry. <rant> And that complicates things because we have to adapt the wording of one spec to another one. And this is all the more difficult as there are always uncertainties in specs. They should be written in some kind of formal language IMHO. </rant> 5.3.2 simply tells us how to map the margin-* properties to start/end-indent. Furthermore, I think you are mixing property evaluation (refinement stage) with the area model (layout stage). 4.4.1 is about the area model and includes stuff like intrusion-adjustment which is absolutely no topic at property handling level (chapter 4 vs. chapter 5). So you can't just mix equations from chapter 4 and 5. Well, I'm aware of the difference between properties and traits, but I was trying to find from which properties the space-start trait may be derived for block-areas, because the space-start property only applies to inline-level formatting objects. snip/ Now, let's consider the following problem: - writing-mode is lr-tb, reference-orientation is 0 (most common case in western countries); - we have an fo:block; - we want to compute the offset of the start-edge of the generated block-areas's content-rectangle from the start-edge of the closest ancestor reference-area's content-rectangle.
Let's assume that: - xa is the x-coordinate of the start edge of the block-area's allocation rectangle; - xc is the x-coordinate of the start edge of its content rectangle; - the origin of the coordinate system is the start-edge of the content-rectangle of the closest ancestor reference-area; - there is no side-float so we can forget the start-intrusion-adjustment in the formulae. Then we have the following: (1) xa = xc - start-indent (definition of allocation-rectangle in 4.2.3) with no intrusion adjustment, also: xc = 0 + start-indent (7.10.7) (inserting xc in (1)) -- xa = 0 7.10.7 makes a statement for the area model. Ok. (2) xa = start-indent - border-start - padding-start - space-start (section 4.4.1) With no intrusion adjustment: (J1) space-start = start-indent - border-start - padding-start (variables are traits here) Ah yes, so this formula comes from the statement in 7.10.7 (BTW, in case of mixed writing modes / reference-orientations this statement is wrong; I thought they introduced the allocation-rectangle just for dealing with that). But then, the formula of 4.4.1 may be (greatly) simplified: xa = start-intrusion-adjustment If it is so complicated, is it because the formula (J1) may not always be true? And that's not the same as 5.3.2: margin-left = start-indent - inherited(start-indent) - padding-left - border-left-width (variables are properties here) More on that below. That gives us the offset of the block-area's content rectangle: (3) xc = 2*start-indent - border-start - padding-start - space-start If margin-left is set on the fo:block (and assuming that the inherited value of start-indent is 0), start-indent is computed like that: start-indent = margin-left + padding-left + border-left-width which gives us for xc: xc = 2*margin-left + padding-left + border-left-width - space-start This corresponds to the intuitive understanding provided that space-start is set to margin-left. 
If margin-left isn't specified but start-indent is, then the definition of the start-indent property (§7.10.7) lets us expect that xc = start-indent So the formula (3) becomes: start-indent = 2*start-indent - border-start - padding-start - space-start which works only if space-start = start-indent - border-start - padding-start Again, this is ok if we give to space-start the value of margin-left computed according to section 5.3.2. Thus I think we may assume that the space-start and space-end traits for block-areas are given the computed values of the corresponding margin properties. Now there is a problem if the inherited value of start-indent is not 0. Then the value of space-start becomes (3rd formula of section 5.3.2): space-start = start-indent - inherited-start-indent - padding-left - border-left-width No, 5.3.2 says: margin-left = start-indent - inherited(start-indent) - padding-left - border-left-width margin-left != space-start !!! space-start depends on the intrusion-adjustment. margin-left does not. I don't think so, as there is a start-intrusion-adjustment trait for that. But I'm ok with the rest. snip/ So, I've the feeling that on this issue the spec is both incomplete (how to compute
Re: space-start and space-end for block-areas
snip/ Ah yes, so this formula comes from the statement in 7.10.7 (BTW, in case of mixed writing modes / reference-orientations this statement is wrong; I don't think so. FOs like block-container create a viewport/reference pair. The viewport-area does the rotation, the reference-area is already in the same rotation as its immediate children. Well, again I may be wrong, but: in section 4.2.3 we have the following: the content-rectangle of an area uses the inline-progression-direction and block-progression-direction of that area; but the border-rectangle, padding-rectangle, and allocation-rectangle use the directions of its parent area. Thus the edges designated for the content-rectangle may not correspond to the same-named edges on the padding-, border-, and allocation-rectangles. So if we want to put, say, some Japanese in an English text, the main flow would be in lr-tb writing-mode, and we would put an fo:block-container with a tb-rl writing-mode. The block-container would generate a viewport-reference pair of areas with the following mapping:
border-rectangle, padding-rectangle, allocation-rectangle:
  before-edge  top
  after-edge   bottom
  start-edge   left
  end-edge     right
content-rectangle:
  before-edge  right
  after-edge   left
  start-edge   top
  end-edge     bottom
Note that in section 4.2.2, it is stated that the start-edge and end-edge of the content-rectangle of [the reference-area] are parallel to the start-edge and end-edge of the content-rectangle of [the viewport-area]. If we set a start-indent for the fo:block-container, this would be the space between the left-edge (start-edge) of the flow's content-rectangle and the left-edge (after-edge) of the block-container's content-rectangle. WDYT? I thought they introduced the allocation-rectangle just for dealing with that). But then, the formula of 4.4.1 may be (greatly) simplified: xa = start-intrusion-adjustment If it is so complicated, is it because the formula (J1) may not always be true? I don't think it's that complicated.
The spec just tries to explain the relationships of the various properties and traits. I don't think all of these statements are meant to be used as literal formulas. I think we rarely, if at all, have to deal with the space-start trait, for example. Right now, we shift in by start-indent and out again by padding-start and border-start. That's all that's necessary to paint everything. Of course, but independently of how this is actually implemented, I'd like to make sure I understand the formulae correctly. snip/ Well, I think I get it now. It's frustrating to spend so much time in just trying to understand a spec... Imagine how I felt in the first half of 2005 while getting up to speed with the mean details of the spec. It's just normal you feel that way. If you look in the archives, you'll see that you're not alone. A feeble consolation, I know. Yes, but still one. This seems to confirm that the problem lies in the spec, not in my brain ;-) Thank you, Jeremias. Vincent PS: There seems to be a problem, then, with the third paragraph of the attached fo file. IIUC it should be placed 1 cm right from the black border. And if I remove the start-indent=0 attribute from the fo:block it should be placed 2 cm right. WDYT? Yes, that's what's happening and what should be happening. I don't see the problem, sorry. Well, with my working copy I get the following results: When start-indent is explicitly set to 0cm for the third paragraph, the text is placed 1 cm /left/ from the black border: http://atvaark.dyndns.org/~vincent/ref-area_start-indent-0cm.pdf When start-indent is unset, the text is placed 2 cm left from the black border: http://atvaark.dyndns.org/~vincent/ref-area_start-indent-none.pdf Same result with Fop 0.92beta. My working copy is up-to-date with the repository and contains no local modification. I wonder where the problem is. May I ask which results others get? Thanks, Vincent
Re: Implementing OpenType font support, how hard?
Hi Bertrand, As I've made some work in this area, I can provide a few additional hints. In fact I'm kind of a bridge between FOray and Fop and am working on adapting FOrayFont to Fop. Currently I'm not doing much because I'm busy with some other work on Fop for the Google Summer of Code, but I should have some time again to work on this from mid-september on (mmmh, is it too late for you?). snip/ 2d) Re-enable kerning, as OpenType fonts are usually of high quality and deserve to be used with automatic kerning. Ok, that should be obsolete. One point about 2) is that Vincent Hennebert and Victor Mote are working on FOrayFont to create a better font library which we'd like to use when it's finished. So this may mean that some of this work would better be done in/for ForayFont. Finishing 2) would then also mean finishing FOrayFont to the degree that it can be used in FOP. I guess that will need further deliberation. (Some quick background on this: I submitted a patch in december 2005 which integrated FOrayFont into Fop; it was not applied because of too severe limitations of FOrayFont. I'm currently working on implementing the missing features; there are still two of them to implement, and then the adaptation may be restarted. Of course the Fop code has quite evolved since december.) I believe there is basic OpenType support in FOrayFont, I can't say more without having a deeper look. AFAICT there are two areas of work: * complete the parsing of OpenType font files; * make sure the API provides access to the most advanced features of OpenType fonts. The FOray project: http://foray.sourceforge.net/ You may be interested in subscribing to the foray-dev list. 
3) Additional steps for OpenType GSUB table support The goal is to enable the smart font features of OpenType, automatic ligatures as mentioned above, language-dependent glyph substitutions (different shapes if a letter is at the beginning of a word for example), automatic decorative swashes at the beginning or end of words, etc. 3a) Decode the GSUB table of the OpenType font (and other tables that might be required to use it) and store its data in the FOP XML font metrics file One goal of FOrayFont is to make the separate metrics file obsolete. The font files will be interpreted directly. This should also simplify the whole system, especially for the user. 3b) Modify the chars-to-metrics mapping to handle things like automatic ligatures, where several chars map to a single glyph Here I think you can profit from my work on kerning to handle special cases. The only problem I see with ligatures is when a word may be hyphenated between two characters for which there is a ligature: if it ends up being hyphenated the separate glyphs should be used, otherwise the ligature glyph should be used. I don't think this can be easily represented in the current Knuth glue/box/penalty model which is used to break lines into paragraphs. I also believe that in German (and perhaps other languages), ligatures are not welcome in some compound words. And I wonder if the ligature mechanism is similar for non-western languages which heavily use them (Arabic, for example). 3c) Implement GSUB table handling, glyph substitutions (or reuse an existing library for this, but the only one that I've found is freetype, haven't found one in Java). 3d) Create test documents to demonstrate this, asking a font provider for a donation of some OpenType fonts to use in FOP tests. That's one possibility. Another one might be the DejaVu fonts which we have found after a LOT of searching for a font with an ASF-compatible license. 
However, I haven't received any official feedback on license compatibility, yet. OTOH, I'm not sure if those fonts will enable you to show off all the features you want to implement. Aren't DejaVu fonts only TrueType fonts? Even this wouldn't be complete, as OpenType allows specific features to be enabled for specific character runs, like use alternate glyph set 2 for this character only. But it would be a good start already ;-) :-) Sure. At this point I'm mostly interested in your opinion on points 1) and 2) above, if these enhancements seem realistic I might be able to work on them in my current project. Point 3) obviously needs more work and might not fit my budget at this point. In general, I'm happy if we get some reinforcements on the font front. 2) shouldn't be a very big task. But I assume the whole FOrayFont thing might make this a little more complicated. AFAIK, OpenType allows different variants of a font in one font file (ex. normal and bold). We've had requests to support those font files. Have you found out during your investigations what would be involved in supporting this and would this be in scope for your work? So far, I've been unable to find out how this is handled. Thanks for any feedback on this! -Bertrand [1] http://xmlgraphics.apache.org/fop/0.20.5/fonts.html#embedding [2] http
start-indent for line-areas
Hi All, Hem, once again :-\ In section 4.5 of the spec it is written that, for a line-area, the start-edge of its allocation-rectangle is offset from the start-edge of the content-rectangle of the nearest ancestor reference-area by the sum of its start-indent and start-intrusion-adjustment. The start- and end-edges of the allocation-rectangle are the same, whichever value the line-stacking-strategy trait takes. A line-area is a block-area, so the start-edge of its allocation-rectangle extends outside the content-rectangle by start-indent. Thus the x-coordinate of the content-rectangle is 2*start-indent + start-intrusion-adjustment?! Obviously this is wrong. I guess it should better... no, I don't guess anything. What have I missed? Thanks, Vincent
Re: start-indent for line-areas
A line-area is a special sort of block-area (4.5, 1st sentence), it does not have any border and padding. Furthermore, 4.4 defines the behaviour of block-areas and makes special comments that many of those features don't apply to block-areas which are line-areas (for example for start/end-indent). Hmmm, reading and re-reading the spec I find nothing about that. Section 4.4 says that a block-area which is not a line-area must be properly stacked. So that holds for a block-area with line-area children. Which leads me to think that the stacking rules of 4.4.1 apply to line-areas. I mean, in the given description B may be a line-area. So, I'm not sure where you got your 2*start-indent from, but I think A line-area being a block-area, x_content-rectangle = x_allocation-rectangle + start-indent. Section 4.5 says that x_allocation-rectangle = start-indent + start-intrusion-adjustment. So x_content-rectangle = 2*start-indent + start-intrusion-adjustment. It may be that the start-indent of a line-area is not equal to the start-indent of its parent block-area. But then I don't know how it is supposed to be computed. It may be that for line-areas, the allocation-rectangle should rather be the border-rectangle (and, then, also the content-rectangle since line-areas have neither border nor padding). The definition of the allocation-rectangle for a line-area in section 4.5 would then be consistent, the line-area's rectangle would coincide (when there is no intrusion) with the parent's content-rectangle in the i-p-d. This would correspond to what you said just below: you may not involve start/end-indent with line-areas. AFAIU, line-areas all extend to the edges of the parent content-rectangle in inline-progression-direction (i.e. start and end) if there's no intrusion. Or perhaps this definition is wrong and the start-edge of the allocation-rectangle should coincide with the start-edge of the ancestor ref-area's content-rectangle (when there is no intrusion). 
Like for other block-areas, in fact. I think I'll go with the second possibility. Of course, I guess the allocation-rectangle does not appear in the code but this is to be sure placements will be rightly computed. Does that help? Yes thanks, Vincent
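The arithmetic behind the 2*start-indent question can be written out as follows. This is only a sketch of the reasoning in this thread; the symbols are shorthand introduced here and do not come from the spec: x_a and x_c are the start-edge offsets of the line-area's allocation- and content-rectangles, measured from the start-edge of the content-rectangle of the nearest ancestor reference-area, s is start-indent and a is start-intrusion-adjustment.

```latex
\begin{align*}
\text{Literal reading:}\quad
  x_a &= s + a            && \text{(section 4.5, line-area)}\\
  x_c &= x_a + s          && \text{(line-area treated as any block-area)}\\
      &= 2s + a           && \text{(the suspicious result)}\\[1ex]
\text{Second possibility:}\quad
  x_a &= a                && \text{(allocation-rectangle starts at the ref-area's content edge, plus intrusion)}\\
  x_c &= x_a + s = s + a  && \text{(start-indent counted only once)}
\end{align*}
```

Under the second reading, and with no intrusion (a = 0), the content of the line starts at start-indent from the reference-area edge, which is what one would expect.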
Re: [Xmlgraphics-fop Wiki] Update of GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats by VincentHennebert
2006/8/11, Apache Wiki: Dear Wiki user, You have subscribed to a wiki page or wiki category on Xmlgraphics-fop Wiki for change notification. The following page has been changed by VincentHennebert: http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats The comment on the change is: Difficulties around side-floats Ok guys, I've been searching for 3 days for a simple, elegant, powerful, effective, lightweight solution to make the total-fit algorithm work with side-floats, but can't seem to find one. This issue is somewhat related to the one regarding differing available ipd in page sequences (solving one might help solve the other one, at least). There is also some similarity with tables, i.e., how to combine several vertical Knuth element lists into just one. Except that in the case of side-floats, we may choose several completely different solutions to place them. And this is a case-by-case decision: sometimes it will be better to defer the float, sometimes to break it, sometimes to compress it... In some situations, a best-fit approach could even produce better results, as we would have the possibility to consider deferring a side-float. But it is well-known that best-fit may be much worse than total-fit regarding before-floats and footnote placements. Looking at how tables are handled might give me some ideas for side-floats, but this would require some time and there isn't much left now. I also have some ideas for improving the handling of before-floats and footnotes, and I'd like to implement them while I have time and it's still fresh in my memory. The implementation I propose on the Wiki page has its limitations but should work in most cases. This might give a basis for further improvements. I'm thinking about running a poll on fop-user to find out what users expect from side-floats, and what use they would make of them. This might help make some design decisions. 
Well, in a word, I'm a bit lost as to what to do now. WDYT? Vincent
Re: [Xmlgraphics-fop Wiki] Update of GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats by VincentHennebert
2006/8/11, Jeremias Maerki: snip/ In some situations, a best-fit approach could even produce better results, as we would have the possibility to consider the differing of a side-float. But it is well-known that best-fit may be much worse than total-fit regarding before-floats and footnote placements. Are you sure that it's much worse? Note that even TeX uses best-fit for page breaking. Oh yes I'm sure ;-) It's a common complaint from users of the LaTeX world that figures are placed in a weird manner. There are plenty of parameters to tweak the output by hand, but this doesn't always give satisfying results. And given that manual intervention isn't really an option in the FO world... Note also that TeX implemented best-fit for page-breaking only because of limited computation resources at that time (you know, those good old 80's). Looking at how tables are handled might give me some ideas for side-floats, but this would require some time and there isn't much left now. I also have some ideas for improving the handling of before-floats and footnotes, and I'd like to implement them while I have time and it's still fresh in my memory. The implementation I propose on the Wiki page has its limitations but should work in most cases. This might give a basis for further improvements. I'm thinking about making a poll on fop-user to know what are their expectations regarding side-floats, and which usage they would make of them. This might help make some design decisions. That should certainly provide some interesting feedback. Well, in one word, I'm a bit lost as for what to do, now. Hmm, yeah, it's a bit difficult. Best-fit vs. total-fit plays into this. Best-fit will most likely replace total-fit because of the additional features we can cover. If that means some drawbacks on things like footnote placement that could be acceptable if the drawbacks are not too great. ATM, I cannot estimate the impact of the change. 
Too bad we don't Well, to me switching to best-fit is out of the question. Total-fit is a killer feature for technical documents with lots of figures and footnotes (the current implementation is already better than Xep...). This would give Fop a big advantage over concurrent implementations. Rather, we might consider implementing both strategies. Ideally there would be an option in the config file ( optimize-for-books vs optimize-for-fancy-layouts or something like that). The generation of Knuth elements would be the same, only the breaking algorithm would differ. But abandoning total-fit is not an option, IMO. Moreover... have much theoretical reference material on using the Knuth approach on page breaking. Most of what exists today around page breaking is by M.F. Plass or worked out by us on our Wiki. ... I'm still hoping to find a solution compatible with the total-fit approach. The glue/box/penalty model is simply not powerful enough to represent tables, side-floats and the like. All we have to do is find another model... I would not consider it a bad move if you concentrated the rest of your time of the GSoC on the before-floats and footnotes, especially if it's unlikely that you can finish side-floats in time and if the switch from total-fit to best-fit hangs in the air. Even without an actual implementation the groundwork for a side-float implementation in form of some very good documentation is a very satisfying result. I would consider the goals of the GSoC project met. It might be good to add some best-fit-specific comments, though. Ok, will do. How does that sound? Well, now I know what to do ;-) I'll first work on footnotes and before-floats improvements. I'll also try to implement the simplified solution for side-floats. Depending on the results I'll write the suitable documentation. Thanks! Vincent
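To make the best-fit vs. total-fit distinction in this thread concrete, here is a toy sketch in Java. It is not FOP's breaking code: the class and method names are invented for illustration, the "elements" are plain item widths separated by one unit of space, a line costs the square of its unused width (the last line is free), and every item is assumed to fit on a line by itself. With no stretch or shrink, best-fit degenerates into filling each line as much as possible.

```java
/** Toy comparison of two breaking strategies over the same element list (hypothetical names). */
final class ToyBreaker {
    private ToyBreaker() { }

    /** Width used by items [from, to), with one unit of space between items. */
    static int used(int[] widths, int from, int to) {
        int sum = to - from - 1; // inter-item spaces
        for (int i = from; i < to; i++) {
            sum += widths[i];
        }
        return sum;
    }

    /** Cost of a line holding items [from, to); Long.MAX_VALUE if it does not fit. */
    static long cost(int[] widths, int from, int to, int lineWidth) {
        int slack = lineWidth - used(widths, from, to);
        if (slack < 0) {
            return Long.MAX_VALUE;
        }
        return to == widths.length ? 0 : (long) slack * slack; // last line is free
    }

    /** Best-fit: decide each line on its own, never revisiting earlier lines. */
    static long bestFit(int[] widths, int lineWidth) {
        long total = 0;
        int from = 0;
        while (from < widths.length) {
            int to = from + 1;
            while (to < widths.length && used(widths, from, to + 1) <= lineWidth) {
                to++; // the fuller line has less slack, so it is the locally best one
            }
            total += cost(widths, from, to, lineWidth);
            from = to;
        }
        return total;
    }

    /** Total-fit: dynamic programming over all breakpoints, minimising the summed cost. */
    static long totalFit(int[] widths, int lineWidth) {
        int n = widths.length;
        long[] best = new long[n + 1]; // best[i] = minimal cost of laying out items [i, n)
        for (int from = n - 1; from >= 0; from--) {
            best[from] = Long.MAX_VALUE;
            for (int to = from + 1; to <= n; to++) {
                long c = cost(widths, from, to, lineWidth);
                if (c == Long.MAX_VALUE) {
                    break; // longer lines cannot fit either
                }
                if (best[to] != Long.MAX_VALUE) {
                    best[from] = Math.min(best[from], c + best[to]);
                }
            }
        }
        return best[0];
    }
}
```

For item widths {3, 2, 2, 5} and a line width of 6, best-fit fills the first line completely and then has to leave the third item alone on a loose line, while total-fit accepts a slightly loose first line to get a better overall result. Both strategies consume the same input, which is the point made above: only the breaking algorithm differs.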
Re: [Xmlgraphics-fop Wiki] Update of GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats by VincentHennebert
Hi Simon, Vincent, This page represents a good piece of work. First some nit picking regarding language: 'A, if X; B, else': change else - otherwise (2x) 'We may choose to either differ a side-float, or ...': change differ - defer Thanks. Changed. Then a comment on the rules: Rule 1. Why does the rule not require not both x = 0 and x + ipd = ipd(ref-area), for both start and end floats, unless the float is wider than the ipd(ref-area)? In other words, why is rule 7 not required for any start and end floats? Hey, you're right. Ok, rule 1 is correct: a start-float may not stick out of the start-edge of the ref-area, period. The constraint on the opposite side is given by rule 7, which actually is badly phrased. More precisely, the formal wording is not equivalent to the loose wording given between parentheses. The loose wording is quite clear and intuitive; the formal wording forgets the case of a float alone: it's not because it is alone that it may stick out of the ref-area. Well, now that you've pointed it out this is pretty obvious... I've reformulated this rule according this time to the loose wording. Tell me if you don't agree. In 'Properties of the model': I do not see that rule 7 is satisfied. A start-float begins at the end-edge of the ref-area and is pulled along this edge (which is like a wall). So by nature it may not stick out of this edge. What the illustration attempts to show is that if previous start-floats occupy too much place, then the new start-float will strike against their after-edges (the start-guide) without being able to go beforer. Finally some thoughts on a possible algorithm: The algorithm should combine pagebreak and linebreak calculations in a single dynamic calculation: Yes, I think we should try to find something like that. 
for each legal pagebreak for each active pagebreak node layout the page and calculate its demerits Laying out the page involves breaking each paragraph on the page into lines; each legal linebreak/active linebreak node combination (that is, each iteration in the two nested loops of the linebreak calculation) is associated with a certain side float layout, and thus the line widths for that case are known and the demerits can be calculated. One issue is that some legal pagebreaks are unknown until paragraphs are laid out (because of widows/orphans, for example). So the for each legal pagebreak is not that simple and might involve some backtracking. It is just a rough idea. I have no clear picture what the linebreak calculation in combination with side floats looks like. I just have the feeling that it should be possible in principle. Obviously, it may Yes. I think the biggest problem is to decide when to defer a float or not. Otherwise, as new floats have no impact on already computed lines the current line-breaking algorithm won't have to be too much reworked. Of course, I'm not speaking of values other than line for the intrusion-displace property. I don't even want to think about the other values... be necessary to break a paragraph into lines several times, each time with a different side float layout. It will be necessary to store earlier linebreak calculations for a paragraph in a clever way, so as to avoid unnecessary recalculations for identical linewidths. Working this out into a realistic algorithm requires much more thinking. Regards, Simon Thanks for your comments, Simon. Vincent
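The remark about storing earlier linebreak calculations "in a clever way" could start from something as simple as the cache sketched below. Everything here is an assumption made for illustration: the class name, the idea of identifying a paragraph by an integer, and above all the simplification that a side-float layout gives a paragraph one single line width (in reality the width may change from line to line). The breaker passed in is a stand-in for the real line-breaking computation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Remembers line-breaking results per (paragraph, line width) pair (hypothetical sketch). */
final class LineBreakCache {
    private final Map<Long, Integer> cache = new HashMap<>();
    private final BiFunction<Integer, Integer, Integer> breaker;

    /** Number of times the underlying breaker actually ran. */
    int computations = 0;

    /** @param breaker maps (paragraph id, line width) to a result, here simply a line count */
    LineBreakCache(BiFunction<Integer, Integer, Integer> breaker) {
        this.breaker = breaker;
    }

    /** Returns the cached result, running the breaker only for unseen combinations. */
    int lineCount(int paragraph, int lineWidth) {
        long key = ((long) paragraph << 32) | (lineWidth & 0xFFFFFFFFL);
        Integer cached = cache.get(key);
        if (cached == null) {
            computations++;
            cached = breaker.apply(paragraph, lineWidth);
            cache.put(key, cached);
        }
        return cached;
    }
}
```

Inside the two nested loops over legal pagebreaks and active pagebreak nodes, each candidate page would ask the cache instead of re-breaking the paragraph, so identical line widths are computed only once.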
Re: [Xmlgraphics-fop Wiki] Update of GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats by VincentHennebert
2006/8/15, Simon Pepping: Hi, One more thing. All your beautiful pictures are on your own web site. Would you mind if we copy the pictures to the home page of one of the committers on people.apache.org, and change the links on the Wiki page? That is a more permanent solution. I fully agree. We may perhaps just wait for the end of the GSoC, as I might create new ones or rework some until then. Vincent
Re: Some comments on improving the algorithm for before-floats
Hi Simon, Ok, I've taken out my LaTeX book again to be sure I understand you. Vincent, Your proposal to improve the algorithm for the placement of footnotes and before-floats sounds fine. A few comments. 'Ideally there would be a configuration setting telling which ratio of the page should be filled with normal content; if this ratio is null then pages only made of out-of-line objects would be allowed.' I think this may be split into several configuration settings: - The minimum amount of normal content on a page. OK. This corresponds to the \textfraction parameter, right? - Whether float pages are allowed. Even when the minimum amount is not zero, the user may set this to true. OK. ...mmmh, found no dedicated LaTeX parameter for that. \floatpagefraction=0? - The minimum amount of float content on a float page before it may be considered feasible. Only relying on the normal demerits calculation for the stretch or shrink may be too restrictive. Moreover, if the figures are made of images, there is likely to be few shrink/stretch. This is also the \floatpagefraction parameter? Actually I don't really understand this parameter. At least, I don't understand its interest: this means that underfull float-only pages are acceptable? This looks weird to me. But as it would be easy to implement, I can do it. Related question: would footnotes be allowed on float-only pages, or only before-floats? This may be useful for books with many many footnotes. But for other books this can look weird. WDYT? Another config parameter? In fact, these are configuration parameters in LaTeX. Regarding the demerits for deferred out-of-line objects, a simple multiplication with the page difference produces a linear relation. This may be too weak, and a squared or steeper relation may be preferable. No. Period. Ok, some explanations ;-) This would break the property of optimal substructure which makes the dynamic programming approach work. 
In his thesis, Plass proved that using a squared function leads to an NP-complete problem. In Pagination Reconsidered, Brüggemann-Klein et al. showed that using a linear function is closer to how a human reader judges the result, is solvable by dynamic programming, and gives satisfying results. So I think we may go with it. Regards, Simon Thank you, Vincent
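One way to see why the linear relation fits dynamic programming is sketched below. The names and the weight are made up for the example; this is not taken from FOP or from the cited papers. With a linear penalty, deferring an object by n pages costs exactly the same as charging a fixed amount at each of the n page breaks it crosses, so a partial layout does not need to remember how long each pending object has already been waiting. A squared penalty cannot be split up that way.

```java
/** Demerits for an out-of-line object placed some pages after its citation (hypothetical sketch). */
final class DeferralDemerits {
    private DeferralDemerits() { }

    /** Linear relation: weight times the page difference. */
    static long linear(long weight, int pagesDeferred) {
        return weight * pagesDeferred;
    }

    /** Squared relation, shown only for comparison. */
    static long squared(long weight, int pagesDeferred) {
        return weight * pagesDeferred * pagesDeferred;
    }

    /** The linear penalty charged one page break at a time, as a page-by-page algorithm would. */
    static long linearPageByPage(long weight, int pagesDeferred) {
        long total = 0;
        for (int page = 0; page < pagesDeferred; page++) {
            total += weight; // same charge at every page break, whatever happened before
        }
        return total;
    }
}
```

For a weight of 100 and a deferral of three pages, the page-by-page charge adds up to the linear value of 300, whereas the squared relation gives 900, which is not the sum of three one-page penalties.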
Re: [Xmlgraphics-fop Wiki] Update of GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats by VincentHennebert
Hi Simon, 2006/8/17, Simon Pepping: Rule 1. Why does the rule not require not both x = 0 and x + ipd = ipd(ref-area), for both start and end floats, unless the float is wider than the ipd(ref-area)? In other words, why is rule 7 not required for any start and end floats? Hey, you're right. Ok, rule 1 is correct: a start-float may not stick out of the start-edge of the ref-area, period. The constraint on the opposite side is given by rule 7, which actually is badly phrased. More precisely, the formal wording is not equivalent to the loose wording given between parentheses. The loose wording is quite clear and intuitive; the formal wording forgets the case of a float alone: it's not because it is alone that it may stick out of the ref-area. Well, now that you've pointed it out this is pretty obvious... I've reformulated this rule according this time to the loose wording. Tell me if you don't agree. Rule 7a is logically correct, but I would say that the rule simply states that a start float should not stick out at the end side, even if it is not the one that is flush with the start side. Then x + ipd = ipd(ref-area) follows even without the condition ipd = ipd(ref-area). Hmmm. If the ipd of a float is greater than ipd(ref-area), then it /is/ allowed to stick out at one side (end-side for start-floats, start-side for end-floats). On the contrary, if ipd = ipd(ref-area), then the float is not allowed to stick out at any side. That's why there is the condition. Don't you agree? Rule 7b: the conclusion does not follow. The argument should be that an end float should not stick out at the start side, so that x =0. Same here: x must be = 0 unless ipd ipd(ref-area). Re-thinking of that, I think the normative wording of rule 7 actually is correct, even if it doesn't say exactly the same thing as the non-normative one; when coupled with rule 9, it becomes equivalent. I think I'm going crazy. In 'Properties of the model': I do not see that rule 7 is satisfied. 
A start-float begins at the end-edge of the ref-area and is pulled along this edge (which is like a wall). So by nature it may not stick out of this edge. What the illustration attempts to show is that if previous start-floats occupy too much place, then the new start-float will strike against their after-edges (the start-guide) without being able to go beforer. I see now that the rules are satisfied. To show that it is only necessary to point out that it is satisfied by the initial position, and is not violated by subsequent movements. Whether it is stopped by the start guide is not relevant in this argument. Well the illustration was making sense when rule 7 was written the previous way. Now it could well be removed... unless I rewrite rule 7 as previously. You say nothing about the end floats. The argument is of course the same. Will add a word. One issue is that some legal pagebreaks are unknown until paragraphs are laid out (because of widows/orphans, for example). So the for each legal pagebreak is not that simple and might involve some backtracking. Yes, there is a problem there. The solution could be as follows: When the legal pagebreak is in a paragraph, it is also the considered legal linebreak. It is tested whether this linebreak could end the last line of the page. And deactivate the node if it turns out that this linebreak corresponds to (e.g.) the next-to-last line of the paragraph? Hmmm, that could work. I'll think about that. Thanks, Vincent
Re: [GSoC] Quick news
Ok, I'll need a couple of additional days to finish this work. Between a research paper and the actual code there is quite a gap... I had a hard time trying to find a proper design, dealing with special cases while factoring out the common code. In particular, deciding how to handle too-short/too-long nodes and the recovery mechanism wasn't easy. I'll be offline tonight and tomorrow, but I hope to have the patch ready for next Sunday or Monday. Cheers, Vincent Hi all, The GSoC is over :-( I wanted to submit a patch containing a full implementation of before-floats before the end of the GSoC, but it turns out that this was more difficult than expected. Currently there are plenty of bugs, not a line of Javadoc, no comments, much cleanup to do, etc. But right now I'm unable to do anything but go to bed. Perhaps more in a few hours... Cheers, Vincent
Re: [GT2006] Registration is open! Cocoon GetTogether 2006 (Oct 2-4,Amsterdam)
He he, you can count me in guys. I'm looking forward to meeting you in real life. I should be there for both days. Are you planning to participate in the evening events? Or do we organize our own? Vincent 2006/8/22, Jeremias Maerki [EMAIL PROTECTED]: I for both days. Arriving late Monday morning and leaving again Wednesday late morning. On 22.08.2006 20:42:02 Simon Pepping wrote: I registered for Monday. Regards, Simon On Mon, Aug 14, 2006 at 11:20:59AM +0200, Arje Cahn wrote: Hi FOP folks, If you'd like to attend the Cocoon hackaton for FOP-hacking, please use the normal registration form on www.cocoongt.org, deselect the Cocoon GetTogether checkbox and select both Hackaton boxes. Then, scroll down a little and from the Knowledge level list, select Working on a different project, here for the hackaton :). The hackaton-only fee is 25 euros per day (shouldn't be too bad I hope...!). Hope to see you guys in October! Regards, Arjé -- Simon Pepping home page: http://www.leverkruid.eu Jeremias Maerki
Re: FOP Poster
2006/8/28, Jeremias Maerki: Gang, I've finally finished (more or less anyway) the poster I plan to put up at OpenExpo on 2006-09-20. I'd appreciate if someone could take a quick peek and tell me if it's looking too ugly or if there are any spelling mistakes. The logos may seem a bit dark on screen, but they look fine in print. http://jeremias-maerki.ch/download/fop/fop-poster.pdf BTW, the poster is done entirely with FOP and Batik. :-)
Congratulations, very nice work. A few comments and suggestions:
- in section Output Formats: not sure, but it might be preferable to put a space between the dots and the or in ...or your format
- in section Foreign XML support: missing parenthesis on the first item
- you might want to put the very same sentence ...or your own format in both sections (Output Formats and Foreign XML Support). This might be more eye-catching
- it is best judged on the final print output, but perhaps the section titles in bold?
- finally, but that's nitpicking: you might want to replace the upright apostrophes and quotation marks by their true typographic versions (U+2019 for the apostrophe, U+201C, U+201D for the quotation marks). For a professional look...
Vincent
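For the last suggestion, the replacement could be done on the source text before it goes into the FO file. The helper below is a naive sketch with an invented name: it turns every upright apostrophe into U+2019 and simply alternates between U+201C and U+201D for double quotes, so it assumes quotation marks come in well-formed pairs and does not handle nested or unbalanced quotes.

```java
/** Replaces upright quotes by typographic ones (naive, hypothetical helper). */
final class TypographicQuotes {
    private TypographicQuotes() { }

    static String curl(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean insideQuote = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\'') {
                out.append('\u2019'); // right single quotation mark, used as apostrophe
            } else if (c == '"') {
                out.append(insideQuote ? '\u201D' : '\u201C'); // closing or opening mark
                insideQuote = !insideQuote;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```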
Re: [GSoC] Quick news
Thanks for your support, guys. I've made some progress since last week, but there are still some bugs well hidden here and there, and unexpected behaviors (but also some improvements, phew). For now I'll take a little break and spend some time with my family. I should get back to work in one week. I'll remain reachable, though. Cheers, Vincent
Re: svn commit: r442282 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/trunk/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/extensions/ src/java/org/apache/fop/fo/extensions/
2006/9/11, Jeremias Maerki: As mentioned last month, I just changed the property names a little bit. If anyone finds these inappropriate, please yell. This is not a big deal, but the names bother me a bit. Reading widow-content-limit makes me believe that that corresponds to the maximum authorized amount for widow content, which sounds a bit strange. What about min-widow-content? But I wouldn't mind if you keep the current names. Vincent
Re: [Xmlgraphics-fop Wiki] Update of PageLayout/PageAndLineBreaking by SimonPepping
Hi Simon, I finally took the time to read and digest your Wiki page. This is an interesting reading. A few comments: According to that representation paragraphs with inline text have legal linebreak points. I consider those legal linebreak points also as legal pagebreak points. In addition, there are legal pagebreak points between the vertical elements such as paragraphs and blocks. One issue that will have to be addressed is that widow or keep-with-previous settings may invalidate some previously believed legal breakpoints. In such cases, active nodes which contain those breakpoints in their chains will have to be deactivated; if they were the chosen best nodes, some other nodes will somehow have to be retrieved to replace them. I hope this won't be a too great difficulty. Within the inner loop, we consider the page and paragraph layout between the active and the current pagebreak point. If the active breakpoint is within a paragraph, we calculate the best line breaks from that breakpoint to the end of the paragraph. For all complete Unless the current breakpoint lies in the same paragraph. In page independent linebreaking, for each feasible breakpoint the best node is retained, which represents the best layout of the paragraph up to that point, and which, due to the dynamic principle, is part of the best layout of the whole paragraph if that layout uses this breakpoint. If line numbers matter, the best node for each line number is retained. In page dependent linebreaking, even that is not enough. We must retain the best node for each vertical offset on the page, because that is the quantity that influences page breaking. This Good point. This led me to the following thoughts: Currently the iteration over the active nodes is broken into two loops: one loop for iterating over the line numbers, one for iterating over the active nodes associated to each line number. Why? 
Because if line widths aren't the same they have an influence on the computation of best line breaks. Because when considering a given legal breakpoint, we must know the width of the line it would end in order to be able to compute the shrinking/stretching for that line. In fact we make a distinction between line numbers because they determine the /context/ in which linebreaks are computed. If all the lines had the same widths such a distinction wouldn't be necessary. The merging of line- and page-breakings generalizes this problem of differing contexts. This time, not only the line number counts, but also the page number, the offset from the beginning of the page, the out-of-lines to be placed, etc. I think the greatest challenge will be to identify all the elements which determine that context, and to be able to compare two contexts and say if they are equivalent or not. Considering the case 1 you describe on the wiki page, there are only two different contexts: page number even or odd. In this case the offset from the beginning of the page doesn't count. In other more complex documents this may be much more complicated. When the linewidths depend on the page number, we need to remember the best pagebreak node for each feasible pagebreak point for each page number. Otherwise, we only need to remember the best pagebreak node for each feasible pagebreak point. Note that the latter condition is true in the presence of out-of-line elements, because those are related to the content of the page, not the the page number. Small typo: not the the page number Optimization opportunity 1: We may need to reuse many times the best layout from a breakpoint to the end of the paragraph, and the best layout from the start of the paragraph to a breakpoint. Therefore we need to store a reference to the best end node for either case in the active node. 
If we wish to take into account the different possible heights of the part of the paragraph, we need to store references to a set of best end nodes. Especially for long page sequences, a page breakpoint may be feasible both for an odd and an even page. In that case, we need to store different end points for each, due to the different line widths on odd and even pages. This optimization is certainly true for the start of the paragraph. There will be many page layouts on which the whole paragraph fits. So, this might be handled automatically by the dynamic algorithm if we were able to identify the different contexts. Optimization opportunity 2: Do we need to consider each active node? Or can we already determine that some active nodes will never give a better layout than others? Suppose that a paragraph has two feasible breakpoints A and B, which have an equal number of lines, or even the same height, before and after the page breakpoint. Suppose that B has a higher amount of demerits than A. Can we then conclude that B will never be part of the best layout, because a better layout can always be achieved with A? Yes, we can. Same here, we would just have to detect that linebreak
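The idea of keeping one best node per breakpoint and per context, discussed throughout this message, can be sketched as follows. The class names are invented, and the context is reduced to the simplest case mentioned above (page number odd or even); a real context would also have to cover things like the offset on the page or pending out-of-line objects, and deciding what belongs in it is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** What makes two candidate nodes comparable: same breakpoint, same page parity (hypothetical). */
final class BreakContext {
    final int breakpoint;
    final boolean oddPage;

    BreakContext(int breakpoint, boolean oddPage) {
        this.breakpoint = breakpoint;
        this.oddPage = oddPage;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BreakContext)) {
            return false;
        }
        BreakContext that = (BreakContext) other;
        return breakpoint == that.breakpoint && oddPage == that.oddPage;
    }

    @Override
    public int hashCode() {
        return Objects.hash(breakpoint, oddPage);
    }
}

/** Keeps only the lowest demerits seen for each context. */
final class BestNodes {
    private final Map<BreakContext, Long> best = new HashMap<>();

    /** Returns true if the candidate is kept, false if an equivalent node is at least as good. */
    boolean offer(int breakpoint, int pageNumber, long demerits) {
        BreakContext key = new BreakContext(breakpoint, pageNumber % 2 == 1);
        Long current = best.get(key);
        if (current != null && current <= demerits) {
            return false; // dominated: can never be part of a better layout
        }
        best.put(key, demerits);
        return true;
    }

    int size() {
        return best.size();
    }
}
```

A node ending at the same breakpoint on page 1 and another on page 3 compete with each other, since both pages are odd, while a node on page 2 is kept separately because its lines were broken for a different width.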
Re: svn commit: r446682 - in /xmlgraphics/fop/trunk/src/documentation/content/xdocs: 0.92/upgrading.xml trunk/upgrading.xml
Author: jeremias Date: Fri Sep 15 11:53:15 2006 New Revision: 446682 URL: http://svn.apache.org/viewvc?view=rev&rev=446682 Log: mention that the config file format has changed. <snip/> + If you are using a configuration file, you have rebuild it in the new format. The format A small typo: you have /to/ rebuild it... <snip/> Vincent
Re: FOP embed truetype font into postscript file [was in fop-users]
(Switching to the fop-dev list, as this discussion is becoming more and more code-related. I suggest you subscribe to this list if you haven't already.)

Nguyen, Thang wrote:
Now I'm drowning in the PostScript specification :). Could you tell me how or where I can find documents on the way FOP uses font metric files and embedded fonts with PDF output, and on the way FOP currently deals with font metric files for PostScript output? I hope that someone can help me; it would be a lot easier than looking at the source code.

There is no particular documentation about how fonts are handled by Fop, apart from the page explaining how to configure custom fonts [1]. So you'll have to look at the source code. Unluckily for you, this area of the code is undergoing some heavy refactoring, as the current font library of Fop will soon be replaced with another one.

That said, it doesn't prevent you from starting to study how the PS renderer should be extended to support TrueType fonts. For any font-related issue the PS renderer will rely on the aXSL API [2]. So what would need to be done is to check that this API provides all the information necessary for embedding a TrueType font in the rendered PS file. If so, then this is only a matter of writing the necessary aXSL method calls within the PS renderer and putting the needed PostScript glue around the font information. If the API doesn't provide the necessary information, then it will have to be extended, but we'll see that in a second step.

So I suggest you start by having a look at the aXSL API, the code of the PS renderer and the specifications of TrueType and PostScript. If you have further questions then, just ask.

Vincent

[1] http://xmlgraphics.apache.org/fop/trunk/fonts.html
[2] http://www.axsl.org/font-r/

thanks, Thang.
-Original Message-
From: Vincent Hennebert
Sent: Thursday, October 05, 2006 5:06 PM
To: fop-users@xmlgraphics.apache.org
Subject: Re: FOP embed truetype font into postscript file

Nguyen, Thang wrote:
Seems that a lot of work lies ahead :-) Do you know anything about GhostScript (http://www.ghostscript.com/awki)? I just looked at it yesterday and I hope that I can find something useful.

Ghostscript is a tool that can convert PostScript or PDF files into image formats (PNG, JPEG), render them on screen, print them on non-PostScript printers, re-work them (extract pages, n-up printing...), etc. While this is a very useful tool it won't be of interest for that task, except for viewing the generated PS files. It is perhaps capable of extracting useful information from PostScript files, but I'm not sure.

HTH,
Vincent

-Original Message-
From: Jeremias Maerki
Sent: Thursday, October 05, 2006 4:33 AM
To: fop-users@xmlgraphics.apache.org
Subject: Re: FOP embed truetype font into postscript file

I don't have much time right now, so I can only give you the most important stuff. I can give you more on the weekend, if you need more.

The most important part is having the PostScript and TrueType specifications around:
PS spec: http://www.adobe.com/products/postscript/pdfs/PLRM.pdf
Other PS-related docs: http://partners.adobe.com/public/developer/ps/index_specs.html
OpenType 1.4 spec (also applies to TrueType): http://partners.adobe.com/public/developer/opentype/index_spec.html http://www.microsoft.com/typography/otspec/default.htm
Other font-related docs: http://partners.adobe.com/public/developer/opentype/index.html

I'd search for "Type 42 font" in the PS language reference to start with. However, if you want to have all glyphs of a TrueType font available you have to look in the direction of CID-keyed fonts.
I don't know anything about how CID handling is done in PostScript, only for PDF, so for this part you're pretty much on your own for now. It's probably easiest if you try to find/produce a PostScript file from a different application that can already generate the right code for handling TrueType fonts, so you get an idea of how this is done. You might also find some helpful information on the web.

Another important hint is that some of the PostScript code is not in FOP itself but in XML Graphics Commons:
http://svn.apache.org/repos/asf/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/ps/

Most font-specific code, however, is still in FOP because we haven't factored out the font library yet. So the rest of the code will be here:
http://svn.apache.org/repos/asf/xmlgraphics/fop/trunk/src/java/org/apache/fop/render/ps/

Good luck!

On 04.10.2006 02:48:52 Nguyen, Thang wrote:
It's great. Could you guide me where to start?
Thang.

-Original Message-
From: Jeremias Maerki
Sent: Tuesday, October 03, 2006 2:49 PM
To: fop-users@xmlgraphics.apache.org
Subject: Re: FOP embed truetype font into postscript file
Re: XSL-FO 2.0 workshop in Heidelberg next week
Jeremias Maerki wrote:
If anyone has any requirements for XSL-FO 2.0 which I should bring up at the workshop in Heidelberg next week, please let me know. Deadline 2006-10-16 so I have time to prepare.

Jörg's comments just reminded me of something I think is missing in the current spec: enable the compact box scheme specified in CSS2. If an inline box is short enough to fit in the margin of the following block box, it is put in the margin; otherwise, it is transformed into a block box to be put before the following block box. That makes it possible to mimic the DT/DD items of HTML:

term      the definition of the term term. The definition of the term
          term. The definition of the term term. The definition of the
          term term. The definition of the term term.
Another term too long to fit in the margin
          the definition of the too long term. The definition of the
          too long term. The definition of the too long term. The
          definition of the too long term.

Unless I'm wrong, I don't think it is currently possible to do that in XSL-FO.

Thanks,
Vincent
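The placement rule described above can be sketched as follows. This is only an illustration of the decision; the function name `place_compact` and the width units are hypothetical, and nothing here corresponds to an existing XSL-FO property.

```python
def place_compact(term_width, margin_width):
    """Decide where a compact box goes relative to the following block.

    Returns "margin" if the term fits in the margin of the following block
    box, otherwise "block" (the term becomes a block box placed before it).
    """
    if term_width <= margin_width:
        return "margin"
    return "block"


# Widths in arbitrary units, chosen only for the example.
short_term = place_compact(term_width=30, margin_width=60)
long_term = place_compact(term_width=240, margin_width=60)
```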
Re: svn commit: r462814 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/pdf/PDFToUnicodeCMap.java
Grmblbmlbbllbll. Forgot to change my Eclipse settings. Sorry, won't happen again :-\

Vincent

Author: jeremias
Date: Wed Oct 11 07:30:34 2006
New Revision: 462814
URL: http://svn.apache.org/viewvc?view=rev&rev=462814
Log: Tabs again. :-)
Re: [VOTE:RESULT] Vincent Hennebert as new committer
Many thanks to all of you for your support. I'm glad to join the XML Graphics team. I think this is a team of really nice people; it has always been great to work with you as a contributor and it will be a pleasure to work now as a committer.

At the GetTogether I felt what belonging to the Apache community really means. To me this is one of the greatest aspects of open-source development, and I'm proud to now be a member of such a community.

Thanks again, I'll do my best to deserve my new status!

Vincent

Jeremias Maerki wrote:
Ok, high time to wrap this up. We have:
10 +1
1 +0
no other votes
(6 of 8 PMC members have voted)

Vincent is now an ASF committer with write access for FOP and XML Graphics Commons. Congratulations, Vincent, and welcome! I'll follow up with the administrative stuff in a second.

On 11.10.2006 10:48:29 Jeremias Maerki wrote:
Simon and I were able to meet Vincent Hennebert in person in Amsterdam last week. He has already made an impression with his excellent work for the GSoC. He's a very nice and intelligent guy, eager to learn and to work on FOP. Another guy who's not afraid to jump into the depths of the layout engine. Since the GSoC he has found a job at Anyware (http://www.anyware-tech.com), a French company which does a lot of Cocoon stuff. There, he has the opportunity to work around 50% on FOP, sponsored by the same customer that enabled me to invest so much time into FOP over the last two years.

I'd like to propose Vincent as an XML Graphics committer (FOP working set). I believe he will be a good addition to our work force.

Jeremias Maerki - Apache XML Graphics Project
URL: http://xmlgraphics.apache.org/
FOrayFont integration in question
Hi all,

Sorry for the long post, but I think this is an important one. I would like to have your feelings about the FOrayFont integration. Since I started to work on that (in July 2005), things have evolved quite a bit and I'm starting to doubt that integrating FOrayFont really is a good thing for Fop. I've already discussed this whole issue with some of you, but I think it is worth summarizing the points and making everyone aware of them, because I have the feeling that whatever decision we make, it will be a difficult one.

First, some progress information about the integration: the PDF renderer now works with FOrayFont, and seems to run well. The other renderers are still to be adapted. There shouldn't be too much work for the Java2D-based ones, a bit more for the PostScript renderer, and also for PCL and AFP (I can't evaluate how much there is to do for those as I know nothing about those formats). I estimate about 5 days of work to get something that compiles. There should be no loss of features; there is a known problem with the PostScript renderer (no way to know which fonts are used for a given page, so we have to embed all of the configured fonts in the header), but Jeremias is working on a two-pass system thanks to which this problem should be solved soon.

For those who are not familiar with the FOrayFont architecture, here's a quick presentation: there is a separate project called aXSL [1] (also maintained by Victor) whose purpose is to define a standard API for several modules related to XSL-FO. The one we're interested in is aXSL-font, but there are also modules for dealing with graphics, manipulating the FO tree, the area tree, etc. The goal is to have standard interfaces shared by XSL-FO implementations. Provided, of course, that more than one implementation actually implements aXSL. So FOrayFont is a particular implementation of aXSL-font.
If Fop were using FOrayFont there would actually be almost only aXSL calls in the code.

[1] http://www.axsl.org/

Now, let me enumerate the pros and cons of the adaptation of FOrayFont to Fop.

Cons:
- After Bertrand's recent work on OTF support the existing font library is not far from being as feature-complete as FOrayFont:
  - ToUnicode support is now available;
  - it seems easy to remove the XML metrics generation step (actually Jeremias told me he had already done it on his working copy)
- the old font support would have to be kept for use by Batik (PS and PDF transcoders), as the Batik people have strong feelings against external dependencies
- FOrayFont introduces a new font-config file which would disturb users (although I think it is better and more flexible than the current one)
- FOrayFont is mainly a one-man show and it's not very good for Fop to have such a dependency. And as this is primarily Victor's baby we can't just come in and ask for write access to the code or whatever. We must first show that our point of view is compatible with Victor's.
- However, it seems like we have difficulties understanding each other: each time I propose a change on the dev list, it triggers a lengthy discussion where we both try to explain our own point of view and understand the other one, without even finally succeeding, I think. There is some cultural gap plus a foreign-language issue that hinders communication.
- As a consequence, proposing changes on the aXSL/FOray side to better suit our needs will require twice as much time and energy as doing them on our own side.
- And given that the API isn't perfect yet, I'm a bit afraid of going that route. One major missing feature, for example, is the ability to cache information about fonts and retrieve it later; this is necessary for the XML area tree output or the CachedRenderPagesModel.
  There is simply no means in the API to get a font's identifier, in order to retrieve it later without having to re-launch the whole resolution process.
- during the past year, growing technical disagreements have appeared; if we keep working together we might end up with something that satisfies neither of us, because of the many compromises we would have to make. That ranges from programming practices to API design decisions.
- As far as I know, FOray has never been used in production yet, and it may be unstable. There are currently not many testcases and, well, it's already not much fun to write testcases for one's own code, so if I have to write testcases for others...

Now for the pros:
- It would be unfortunate to break the last bridge between Victor and Fop.
- I've myself already done quite a bit of work on FOrayFont, which would be basically lost.
- Despite existing problems Victor brought quite a number of improvements to the font library, which would have to be re-done. And he started from the 0.20.5 code, like we would if we were to go our own way (tell me if I'm wrong, but I don't think the font code changed
Re: svn commit: r474387 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fonts/ src/java/org/apache/fop/fonts/truetype/ src/java/org/apache/fop/fonts/type1/
Jeremias Maerki wrote:
Another of my travel projects checked in: I wanted to know how easy it is to load fonts without the XML metrics file. As you can see from the amount of code, it was rather easy. Makes me wonder why we didn't do it earlier. :-)

Nice work, Jeremias. Well, definitely an additional point for FOPFont.

Please note that the existing functionality is still fully there. The only change is that you can now simply omit the metrics-url attribute in the font configuration and the font will still load. What you lose in this case is the ability to manually tweak the XML metrics file or to specify WinAnsi encoding for TrueType fonts. Furthermore, you currently don't have control over TrueType Collections. The WinAnsi feature should not be necessary anymore now that we have ToUnicode CMaps. The Collection feature is easily added again through an additional attribute in the font configuration.

... and I don't see why one would need to tweak the metrics of a font?

I'd be grateful if anyone could test what happens if you load a Type 1 font on a Unix where the PFM file has an uppercase extension, e.g. FUTURA.PFM. I have to construct the URI to the PFM manually from the PFB and use a lowercase extension (.pfm). Maybe we have to improve that to check with upper- and lowercase to account for case-sensitive file systems.

This fails. IMO the safest solution is to require that the name of the PFM file be specified in the config file. All the more so since it should also be possible to give an AFM file instead of a PFM one (BTW, Fop can't read AFM files, can it? Because FOrayFont can...).

If this is ok for you I'll implement these changes. Oh, or perhaps wait until a final decision about FOrayFont is made.

Vincent
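For reference, the "check with upper- and lowercase" idea mentioned above could look roughly like the sketch below. It is written in Python for brevity and is not Fop's loader; the helper names `pfm_candidates` and `find_pfm` are hypothetical. It does not replace the safer option of naming the metrics file explicitly in the config file.

```python
import tempfile
from pathlib import Path


def pfm_candidates(pfb_path):
    """Return the PFM paths to try for a PFB file: lowercase first, then uppercase."""
    pfb = Path(pfb_path)
    return [pfb.with_suffix(".pfm"), pfb.with_suffix(".PFM")]


def find_pfm(pfb_path):
    """Return the first existing candidate, or None if no metrics file is found."""
    for candidate in pfm_candidates(pfb_path):
        if candidate.is_file():
            return candidate
    return None


# Small demonstration with an uppercase extension, as in the FUTURA.PFM case.
with tempfile.TemporaryDirectory() as tmp:
    pfb = Path(tmp) / "FUTURA.PFB"
    pfb.touch()
    (Path(tmp) / "FUTURA.PFM").touch()
    found = find_pfm(pfb)
    found_name = found.name if found else None
```

On a case-insensitive file system the lowercase candidate already matches, so the order of the candidates only matters on case-sensitive ones.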
Status of the collapsing border model
Hi all, Just to let you know that I'd like to finish the implementation of the collapsing border model. I've started to look at the wiki pages, the code and the mail archives but if you have any hint about what are the remaining problems to solve, where to look at in particular, etc., I'm all ears ;-) Thanks, Vincent
Questions regarding the table layout code
Hi guys,

As you may have noticed I have started to work on the table layout code. For now I have just made some small improvements, mainly added javadoc comments and renamed variables to names I believe are more explicit. Please tell me if there are changes or javadoc comments you don't agree with.

Now I have a first bunch of questions for those who are familiar with that part of the code. Here they are:

- in TableContentLayoutManager.getKnuthElementsForRowIterator: when a new row-group is fetched, its possible break-before seems to be taken into account only if the current Knuth element list ends with a penalty item. I suspect this is a bug, but would like to have confirmation.
- in TableRowIterator.java:
  - when the end of a table-part (table-header, -footer, -body) is reached and there are pending spans, a new EffRow is created to contain the remaining spans. Is that really desirable? When there are no explicit table-row elements in the table I agree with that behavior. But when table-rows are explicit and a cell in the last row must span over several rows, I would bet this is an error in the input FO file (the 1.1 recommendation states that spans over several table parts are forbidden). Wouldn't it be better to raise an error in such a case?
  - if there are several table-bodies, LAST_IN_PART will be set only on the last row of the last table-body. Is that behavior really intended? I would say no, as AFAIU this flag is used for border resolution.
- IIUC, a PrimaryGridUnit is meant to be the before-start (top-left) grid unit of a spanned cell, while GridUnit is for the other grid units (and thus only appears in spanned cells). Is that a design choice, or just a side-effect? I'd like to add an explanation of that in the javadoc of PrimaryGridUnit.
- I seem to have seen somewhere that normalizing tables when building the FO tree would ease life; that is, table-row FO objects would be created for tables which don't contain any (and rely on the starts-row/ends-row properties). Apparently that's not done in the current code. Just to know, is it due to a lack of time, or a design decision? With my current understanding of the whole issue it seems to me that it would indeed ease things at the layout stage, but I may have missed something.
- it seems that the getNextKnuthElements method is meant to return each time a forced break is encountered. Is that a design requirement? Can I add that point to the method's javadoc?

That's it for now. But get ready for a second whole bunch later ;-)

Thanks,
Vincent
Re: Problems with display-align
Chris Bowditch wrote:
Vincent Hennebert wrote:
Hi Bradley,

Bradley Harrington wrote:
Hello, I don't know if this is a known problem or not, however the display-align attribute always puts the text at the top of the cell for me. I have tried this with both trunk and 0.92 beta. It does work with other XSL parsers I've tried. Here is the sample code:

It's not implemented. And because of the Knuth approach it doesn't seem entirely trivial to implement it. Good news is that I'm currently working on tables, so I may find a way to do it. We'll have to eventually, anyway. Stay tuned.

Strange. I thought this worked. I just ran a simple test and it appears to work for me. PDF attached.

Hmmm, by looking at the code I had the feeling it wasn't implemented, but I may well be wrong. I'll have another look. More later.

Vincent
Re: FOrayFont integration in question
Hi All, So, not many opinions on this it seems. Thanks to Bertrand and Jeremias for their comments. I'll need to have a closer look at the current font library. As I was supposed to replace it with FOrayFont I have never studied it in detail yet. Then I'll see if it is best to keep it or to switch to a fork of FOrayFont. Although right now I've the feeling the former solution is preferable. My first two goals are to polish the removal of the XML metrics generation step (mainly add an optional parameter in the config file for specifying the name of the PFM file), and add support for AFM metrics files. Then... we'll see. Cheers, Vincent
Re: DO NOT REPLY [Bug 41019] - Left-align oddness with long, unbreakable strings following
Hi Luca,

Luca Furini wrote:
<snip/>
1) TextLM breaks the text even when a / or a - is found, handling them as hyphenation points with the usual sequence of glue + penalty + glue elements. The LineLM tries, in the first instance, to avoid using hyphenation points, so the penalty is not taken into account. But this has the side effect of using the first glue element as a feasible break (if the penalty were a feasible break too, it would surely be a better one, thus preventing the glue from actually being chosen).

I don't follow you: IIUC the glue-penalty-glue triplet is generated only the second time, when the first breaking doesn't give acceptable results? What do you mean by "the penalty is not taken into account"? Also, I don't see why the penalty would be preferred over the glue, as it has a positive penalty value.

This is probably the smaller of the problems, and can be solved by just adding an infinite penalty before the first glue element.

This seems to be a good idea, anyway.

But maybe we want to prevent this breaking from happening, as we can now use zero-width spaces to explicitly insert breaking positions?

Good point. I'd say yes for '/'. This would add a burden on the user, who would have to modify the FO generation step to add ZWSP for URLs or filenames; but we must also take into account cases where the user does /not/ want the word to be split at '/' characters. For hyphens, I would keep the current behavior, as this is the most expected one IMO. And it can also be prevented by adding a non-breaking zero-width space.

2) The presence of an inline object larger than the available width makes the algorithm deactivate all the active nodes and then restart with a second-hand node, as no line can be built that does not overflow. The restarting node was chosen, in BreakingAlgorithm.findBreakingPoints(), between lastTooShort and lastTooLong, neither of them being a good breaking point. There is a lastDeactivated node chosen among the deactivated nodes but it was not used.
A deactivated node was previously an active one, so it is surely better than a node which failed to qualify; replacing either lastTooShort or lastTooLong (according to the adjustment) with lastDeactivated leads to a better set of breaks. However, this is not enough. The attached file small.20.pdf shows the result after fixing these first two problems.

3) At the moment, the LineLM can call findBreakingPoints() up to three times, the last one with a maximum adjustment ratio equal to 20. I came to the conclusion that this is really TOO much. I tried stopping after the second call (with max ratio = 5) and the result is much better (see attached file small.5.pdf).

Yes, 20 is probably too much. We perhaps also need to differentiate the case where no acceptable line-breaking can be found because of a box too long to even fit alone on one line. In such a case even a very high max ratio won't help.

A high maximum adjustment ratio means that the algorithm is allowed to stretch spaces a lot in order to find a set of breaks which is *globally* better; this means that it can choose some not-so-beautiful breaks in order to build a set spanning over a larger portion of the paragraph. In our example: there can be a break just before the long url (a line ending after "Consider:") only if we use an enormous adjustment ratio. With a smaller, more appropriate threshold, "Consider:" can no longer end a line, so the algorithm will restart from a previous point.

In conclusion: the first two items are easily fixed, and I'm going to commit the changes in the afternoon (if there are no objections); concerning the question of the automatic break at /- characters, I'll probably leave the code unchanged for the moment, until we decide what is best. Concerning point #3, I'm going to have a closer look at the restarting mechanism ...

Yes, the current mechanism doesn't seem to be good enough, but I'm wondering if we can find a better one.
Currently a too-short/too-long node replaces another one if it has fewer demerits. The number of lines/pages handled so far isn't taken into account. So it is likely that a too-short/too-long node ending an earlier line/page will be preferred over a node going further in the Knuth sequence. Why should that be the case? In fact the main problem, I think, is to find the right heuristic to select too-short/too-long nodes, in order to end up with the most acceptable result. Easy to say...

Also, may I suggest you look at the Temp_Floats branch, and perhaps even work on it instead of trunk? I've made quite heavy changes to the breaking code that might be difficult to merge back into the trunk if there are also changes there.

Cheers,
Vincent
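To illustrate the discussion of the maximum adjustment ratio in point 3, here is a minimal sketch of the textbook Knuth-Plass adjustment ratio. It is not the code of BreakingAlgorithm; the function names and all the widths are hypothetical and chosen only for the example.

```python
import math


def adjustment_ratio(natural, stretch, shrink, available):
    """Textbook adjustment ratio of a line.

    natural is the natural width of the line content, stretch and shrink are
    the total stretchability and shrinkability of its glue, and available is
    the line width to fill.
    """
    if natural == available:
        return 0.0
    if natural < available:
        return (available - natural) / stretch if stretch > 0 else math.inf
    return (available - natural) / shrink if shrink > 0 else -math.inf


def is_feasible(ratio, max_ratio):
    """A line is acceptable if it shrinks by at most 1 and stretches by at most max_ratio."""
    return -1.0 <= ratio <= max_ratio


# A very short line, like the one ending after "Consider:" in the example.
short_line = adjustment_ratio(natural=100, stretch=15, shrink=5, available=300)
# A single unbreakable box wider than the line: there is no glue to adjust.
too_wide = adjustment_ratio(natural=400, stretch=0, shrink=0, available=300)
```

With these numbers the short line is accepted when the maximum ratio is 20 but rejected when it is 5, while the oversized box is rejected whatever the maximum ratio, which is the case that would need to be handled separately.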
Re: DO NOT REPLY [Bug 41019] - Left-align oddness with long, unbreakable strings following
Hi guys,

J.Pietschmann wrote:
Simon Pepping wrote:
Would this be a good moment to make these features of the breaking algorithm user configurable, like they are in TeX? This allows people to play with the various possibilities without having to modify the code. This can be combined with parameters for configuring the handling of before-floats. We might want to have a coherent set of parameters here.

I was thinking about creating extension parameters in the fox: namespace, as IMO those are things that have to be set independently for each FO file, rather than in the Fop config file. I'll try to work on that soon.

Probably, if this can be combined with implementing UAX14.

This may be the time to look at Simon's generalized Knuth elements for linebreaking. I wanted to but haven't had the time yet, and I'm still missing some knowledge regarding UAX14.

Damn, so many things to do in so little time. Not to mention releasing 0.93...

Vincent