Re: Font variant SmallCaps Was: Re: (Chris) Re: Traits
On 26.11.2003 21:32:30 J.Pietschmann wrote:
Victor Mote wrote: Yes, this can get ugly. If anybody knows of a way to find the physical font file from an awt Font object, please speak up. Currently (as of 1.4.1) you can create an awt.Font from an InputStream, but you can't get back whatever physical representation the font has from the awt.Font object.
Javadoc tells me this works since 1.3 (Font.createFont method). Anyway, it's the opposite of what Victor wanted.
I thought of the following approach:
  interface fop.font.Font {
    public InputStream getFontInputStream();
    public awt.Font getAWTFont();
    // duplicate AWT methods, with certain stuff like metrics
    // replaced with FOP objects
  }
  class fop.font.AWTFont implements fop.font.Font {
    // delegate to AWT font
    // encapsulate awt.FontMetrics in FOP font metric
    public InputStream getFontInputStream() { return null; }
  }
  class fop.font.Type1Font implements fop.font.Font {
    public awt.Font getAWTFont() { return null; }
    // use FOP type 1 font reader
  }
  class fop.font.TrueTypeFont implements fop.font.Font {
    public awt.Font getAWTFont() { return new Font(new FileInputStream...); }
I think you meant return Font.createFont(Font.TRUETYPE_FONT, new FileInputStream...
    // use FOP TTF reader or delegate to the AWT font.
  }
  class fop.font.PDFBuiltinFont implements fop.font.Font {
    public InputStream getFontInputStream() { return null; }
    public awt.Font getAWTFont() { return null; }
    // return generated classes for metrics etc.
  }
This means users can use AWT fonts for creating PDF, but they can't embed them.
This may cause the resulting PDF to fail, but so what. -- Support questions
And there's still the question if we can produce font metric information for the target formats (there's PCL and PostScript and..., too) that result in the desired output.
We *could* try to use the TTF reader to search through the fonts in the Windows font directory (or XFonts) in order to find the file for an AWT font.
Yeah, but it will be so slow.
You'd need some persistent cache to overcome that. Seriously, I don't think working with AWT's Font will do us any good. The differences between JDKs are too great. We need java.awt.Font objects for the Java2D-related renderers, but producing/getting these objects is the renderer's job, just as it's the PDFRenderer's job to produce PDF font objects for serialization. IMO it's better to have full control over what happens. Jeremias Maerki
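The interface proposed in the quoted message can be sketched in compilable Java roughly as follows. This is a minimal illustration, not FOP code: the names FopFont, TrueTypeFopFont, AwtFopFont, BuiltinFopFont and the canEmbed() helper are hypothetical, and the TrueType data is assumed to be held in memory as a byte array. The AWT side uses java.awt.Font.createFont(Font.TRUETYPE_FONT, InputStream), as in the correction quoted above; that call returns a 1pt plain font, so a renderer would still call deriveFont for the size it needs.

```java
import java.awt.FontFormatException;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Renderer-neutral font handle (hypothetical name; not FOP's actual API). */
interface FopFont {
    /** Font program for embedding, or null if this kind of font cannot supply one. */
    InputStream getFontInputStream();

    /** Font for the Java2D/AWT renderer, or null if none can be created. */
    java.awt.Font getAWTFont();

    /** Embedding is only possible when the raw font program is reachable. */
    default boolean canEmbed() {
        return getFontInputStream() != null;
    }
}

/** TrueType font held in memory; usable for both PDF embedding and AWT. */
final class TrueTypeFopFont implements FopFont {
    private final byte[] data;

    TrueTypeFopFont(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public InputStream getFontInputStream() {
        return new ByteArrayInputStream(data);
    }

    @Override
    public java.awt.Font getAWTFont() {
        try {
            // createFont yields a 1pt PLAIN font; callers use deriveFont(size) as needed.
            return java.awt.Font.createFont(java.awt.Font.TRUETYPE_FONT, getFontInputStream());
        } catch (FontFormatException | IOException e) {
            throw new IllegalStateException("not a usable TrueType font", e);
        }
    }
}

/** Wraps a font obtained from AWT: there is no way back to the physical file. */
final class AwtFopFont implements FopFont {
    private final java.awt.Font font;

    AwtFopFont(java.awt.Font font) {
        this.font = font;
    }

    @Override
    public InputStream getFontInputStream() {
        return null;
    }

    @Override
    public java.awt.Font getAWTFont() {
        return font;
    }
}

/** A PDF base font: metrics only, neither embeddable nor available to AWT here. */
final class BuiltinFopFont implements FopFont {
    @Override
    public InputStream getFontInputStream() {
        return null;
    }

    @Override
    public java.awt.Font getAWTFont() {
        return null;
    }
}
```

A PDF renderer built on this would embed only fonts whose canEmbed() returns true and refuse or warn otherwise.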
Re: [VOTE] Properties API
J.Pietschmann wrote: Victor Mote wrote: This is a good question. The answer to the first part is that it should return an int, representing the number of millipoints. When it cannot be resolved, it should return an int constant TBD_LAYOUT (or whatever), which is equal to -32,987 (or whatever). So, the Area Tree or Layout needs to then perhaps query another get method to determine how it should compute the value from its Area Tree context, or, as I mentioned in a recent (within the past hour or so) posting in response to Glen, either 1) passing context data to get, or 2) making get look at the area tree context before returning the value. What about font-size=12pt+2%+0.8*from-parent(height div 32) ? Good question. Make it font-size=12pt+2%+0.8*(from-parent(height) div 32) though. Even nastier is font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32) because in markers and in static-content, we have to keep track of where *all* property specifications occur in the ancestry FO tree to resolve it. In general, the functions will be resolvable as the FO tree is built. The tree is static, in the sense that the tree relationships are maintained in spite of any to-ing and fro-ing with the Area Tree. Markers are an exception, and because marker properties are resolved in the context of the static-content into which they are eventually placed, all the information required for from-nearest-specified() must be available in the static-content FO subtrees. Because this is not required in the fo:flows, a good deal of property storage efficiency is realizable. This is why I was talking some time ago about a PropertyValue type which is an RPN-style expression, which can be rapidly resolved without recourse to the general parser. 
Without it, we have to carry at least some expressions around in the raw, after having first parsed them in order to determine that we can't resolve them, and then throw them to the parser again whenever a) we have sufficient context, or b) the page is re-laid. The idea of performing a full parse on a given expression more than once makes me nauseous. The approach I am thinking about with such expressions is to associate the expression, and therefore the FO node, with the *area* which will provide the context for the resolution of the percentage component(s) of the expression. (It may be enough to use the parent area of the areas that the node generates, and to work back to the appropriate reference area or other dimension when the parent dimensions are resolved.) When the dimensions of such an area are resolved, the list of attached FO property expressions can also be resolved. Exactly how to do this I am not yet sure, but I am thinking of something along the lines of a co-routine, implemented by message-passing queues, similar to the existing structure of the parser/FOtree-builder interaction. Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
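One way to read the "RPN-style PropertyValue" idea is sketched below in Java, as an assumption about what is described rather than the actual alt.design code: the expression is compiled once into a postfix token list and then evaluated on a small value stack whenever a context is available, with no call back into the parser. RpnExpression, ResolutionContext and the millipoint conventions are hypothetical, and fromParent only stands in for the real from-parent() function.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical context supplied once layout knows the relevant dimensions. */
interface ResolutionContext {
    /** Millipoints that 100% refers to for the property being computed. */
    double percentBase();

    /** Computed value of a property on the parent FO, in millipoints. */
    double fromParent(String property);
}

/** Property expression compiled once to postfix (RPN) form; evaluation needs no parser. */
final class RpnExpression {
    private enum Op { LENGTH, PERCENT, NUMBER, FROM_PARENT, ADD, MUL, DIV }

    private final List<Op> ops = new ArrayList<>();
    private final List<Object> args = new ArrayList<>();

    private RpnExpression emit(Op op, Object arg) {
        ops.add(op);
        args.add(arg);
        return this;
    }

    RpnExpression length(double millipoints)  { return emit(Op.LENGTH, millipoints); }
    RpnExpression percent(double percent)     { return emit(Op.PERCENT, percent); }
    RpnExpression number(double value)        { return emit(Op.NUMBER, value); }
    RpnExpression fromParent(String property) { return emit(Op.FROM_PARENT, property); }
    RpnExpression add() { return emit(Op.ADD, null); }
    RpnExpression mul() { return emit(Op.MUL, null); }
    RpnExpression div() { return emit(Op.DIV, null); }

    /** Runs the token list on a value stack; may be called again with a new context. */
    int evaluate(ResolutionContext ctx) {
        Deque<Double> stack = new ArrayDeque<>();
        for (int i = 0; i < ops.size(); i++) {
            Object arg = args.get(i);
            switch (ops.get(i)) {
                case LENGTH:
                case NUMBER:
                    stack.push((Double) arg);
                    break;
                case PERCENT:
                    stack.push(ctx.percentBase() * (Double) arg / 100.0);
                    break;
                case FROM_PARENT:
                    stack.push(ctx.fromParent((String) arg));
                    break;
                default: {
                    // binary operator: right operand is on top of the stack
                    double right = stack.pop();
                    double left = stack.pop();
                    Op op = ops.get(i);
                    stack.push(op == Op.ADD ? left + right : op == Op.MUL ? left * right : left / right);
                }
            }
        }
        return (int) Math.round(stack.pop());
    }
}
```

For font-size=12pt+2%+0.8*(from-parent(height) div 32) the token list is 12pt 2% + 0.8 from-parent(height) 32 div * +. With a percentage base of 10000 millipoints and a parent height of 40000 millipoints it evaluates to 13200; with a parent height of 64000 it evaluates to 13800, without reparsing.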
Re: Font variant SmallCaps Was: Re: (Chris) Re: Traits
Jeremias Maerki wrote: Anyway, it's the opposite of what Victor wanted.
Yeah.
I think you meant return Font.createFont(Font.TRUETYPE_FONT, new FileInputStream...
Oops, yes.
This means users can use AWT fonts for creating PDF, but they can't embed them. This may cause the resulting PDF to fail, but so what. -- Support questions
It depends. If users are still required to declare used fonts explicitly, as well as whether they should be embedded, and FOP bails out if told to embed an AWT font, it shouldn't be much of a problem.
And there's still the question if we can produce font metric information for the target formats (there's PCL and PostScript and..., too) that result in the desired output.
The idea was to query the renderer for fonts, or get a renderer-specific font manager, and use the abstract interface to get mainly character metrics, various other font measures and perhaps font attributes like sans-serif or so.
Seriously, I don't think working with AWT's Font will do us any good. The differences between JDKs are too great.
Hmhm. I still think
- We must be able to use AWT fonts in the AWT renderer.
- We must be able to use the default PDF fonts for the PDF renderer.
- We should be able to use TTF for both AWT and PDF.
- We should not rely on the default PDF fonts in the AWT renderer (doh!)
- I'd like to support using TTF with generated and possibly hand-corrected metrics as well as using TTF directly.
The possibility to use AWT fonts for non-embedded fonts in PDF is a bonus. The possibility to use AWT fonts in PDF even for embedding would be just another bonus.
J.Pietschmann
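The "renderer-specific font manager" idea might look like the following Java sketch. All names are hypothetical; glyph widths are assumed to be in 1/1000 em, the convention used by AFM metrics files, and sizes in millipoints. Layout would only ever see FontMetricsSource, whatever the renderer uses underneath.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical renderer-neutral view of one font's metrics. */
interface FontMetricsSource {
    /** Advance width of c in 1/1000 em, as in AFM files. */
    int getCharWidth(char c);

    boolean isSansSerif();
}

/** Hypothetical per-renderer font manager that layout queries by family name. */
interface RendererFontManager {
    /** Returns metrics for the family, or null if this renderer has no such font. */
    FontMetricsSource lookup(String family);
}

final class TextMeasurer {
    private TextMeasurer() {}

    /** Width of text in millipoints: summed glyph widths scaled by the font size. */
    static long stringWidth(FontMetricsSource metrics, String text, int fontSizeMillipoints) {
        long units = 0;
        for (int i = 0; i < text.length(); i++) {
            units += metrics.getCharWidth(text.charAt(i));
        }
        return units * fontSizeMillipoints / 1000;
    }
}

/** Toy manager for illustration: a single monospaced font with Courier-like widths. */
final class ToyFontManager implements RendererFontManager {
    private final Map<String, FontMetricsSource> fonts = new HashMap<>();

    ToyFontManager() {
        fonts.put("monospace", new FontMetricsSource() {
            @Override public int getCharWidth(char c) { return 600; }
            @Override public boolean isSansSerif() { return false; }
        });
    }

    @Override
    public FontMetricsSource lookup(String family) {
        return fonts.get(family);
    }
}
```

With the toy manager, the width of "abc" at 12pt (12000 millipoints) is 3 x 600 x 12000 / 1000 = 21600 millipoints, and lookup("futura") returns null.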
Re: Font variant SmallCaps
Peter B. West wrote: Although not mandated in XSL-FO, CSS2 offers a number of methods of font matching, only some of which preserve metrics. The FO User Agent is free to make implementation-specific decisions about this, I assume. My main interest here is in whether we want to try to separate out the font handling so that we try to guarantee identical layout on any renderer, or whether we state up front that such universality is *not* on offer. I don't think the TXT renderer will in general render to the same layout as others :-) Nitpicks aside, fonts may be renderer-specific. If different renderers use an identical font (e.g. a user-configured TTF) chances are that the layout is the same, provided bitmap images are rendered identically. If different renderers use different fonts, which may have different metrics even if they are the same family, the layout is likely to be different too. I can't see how to avoid this. BTW fonts aren't the only considerations, others are color and the discretization of coordinates (e.g. bitmap vs. vector format). I would assume that the most useful response to the above situation is to issue a warning and do one's best to match the font. The user has access to a number of mechanisms for narrowing font choice, and in the worst case we use a fall-back font. I'd say if the user says font-family=futura, sans-serif, any he'll get a warning that a fallback was used in case there is no futura or not even a sans-serif font. If the user says font-family=futura and there is no futura font, FOP should terminate. After all, the user hopefully thought about it. J.Pietschmann
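The fallback policy proposed here (warn when a later alternative from the user's list is used, stop when the list is exhausted) could be sketched in Java as below. FontSelector, the generic-family map and the handling of "any" as a catch-all entry are assumptions for illustration, modelled on the example in the message.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical font-family resolution: fallback earns a warning, exhaustion is an error. */
final class FontSelector {
    private final Set<String> installed;               // concrete family names available
    private final Map<String, String> genericFamilies; // e.g. "sans-serif" -> "Helvetica"
    private final String anyFont;                      // used for the catch-all entry "any"

    FontSelector(Set<String> installed, Map<String, String> genericFamilies, String anyFont) {
        this.installed = installed;
        this.genericFamilies = genericFamilies;
        this.anyFont = anyFont;
    }

    /** Returns the first family in the list that resolves to an installed font. */
    String select(List<String> familyList, Consumer<String> warn) {
        for (int i = 0; i < familyList.size(); i++) {
            String resolved = resolve(familyList.get(i));
            if (resolved != null) {
                if (i > 0) {
                    warn.accept("font-family '" + familyList.get(0)
                            + "' not available, falling back to '" + resolved + "'");
                }
                return resolved;
            }
        }
        // The user listed no usable alternative: stop instead of guessing.
        throw new IllegalArgumentException("no font available for font-family " + familyList);
    }

    private String resolve(String name) {
        if (installed.contains(name)) {
            return name;
        }
        if ("any".equals(name)) {
            return anyFont;
        }
        String generic = genericFamilies.get(name);
        return generic != null && installed.contains(generic) ? generic : null;
    }
}
```

With Helvetica and Times installed and sans-serif mapped to Helvetica, the list futura, sans-serif, any yields Helvetica plus one warning, while futura alone raises an error.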
Re: Font variant SmallCaps Was: Re: (Chris) Re: Traits
On 27.11.2003 17:30:18 J.Pietschmann wrote: Jeremias Maerki wrote: <snip/>
This means users can use AWT fonts for creating PDF, but they can't embed them. This may cause the resulting PDF to fail, but so what. -- Support questions
It depends. If users are still required to declare used fonts explicitly, as well as whether they should be embedded, and FOP bails out if told to embed an AWT font, it shouldn't be much of a problem.
Font configuration should/will become easier. For TrueType and Type1 fonts this should just be a matter of specifying a list of directories in which to look for fonts. A cache is needed to speed up the inventory on startup.
And there's still the question if we can produce font metric information for the target formats (there's PCL and PostScript and..., too) that result in the desired output.
The idea was to query the renderer for fonts, or get a renderer-specific font manager, and use the abstract interface to get mainly character metrics, various other font measures and perhaps font attributes like sans-serif or so.
My idea is still different: having several font sources, and the renderers merely announce which font sources they support. That leaves the option open for later to enable multiple renderers simultaneously. Example:
FontSource A: TrueType fonts
FontSource B: Type 1 fonts
FontSource C: AWT fonts
FontSource D: Base14 fonts
FontSource E: PCL base fonts (just a guess)
PDF renderer supports: A, B, D (maybe C)
PostScript renderer supports: A, B, D (maybe C)
Java2D/AWT renderer supports: A and C
PCL renderer: Probably A, E.
Seriously, I don't think working with AWT's Font will do us any good. The differences between JDKs are too great.
Hmhm. I still think
- We must be able to use AWT fonts in the AWT renderer.
- We must be able to use the default PDF fonts for the PDF renderer.
- We should be able to use TTF for both AWT and PDF.
- We should not rely on the default PDF fonts in the AWT renderer (doh!)
- I'd like to support using TTF with generated and possibly hand-corrected metrics as well as using TTF directly.
The possibility to use AWT fonts for non-embedded fonts in PDF is a bonus. The possibility to use AWT fonts in PDF even for embedding would be just another bonus.
+1 to all 7 points. Should be doable with my approach.
Jeremias Maerki
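The directory-based font configuration with a startup cache could be sketched like this in Java. It is an illustration under assumptions: the cache format (lastModified|family per absolute path, stored with java.util.Properties), the *.ttf/*.pfb filter and the familyReader function are all hypothetical. In FOP the family name would come from its TrueType or Type 1 reader rather than from a passed-in function.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.function.Function;

/** Font inventory over configured directories; files are only opened when new or changed. */
final class FontDirectoryCache {
    // key: absolute path; value: lastModifiedMillis + "|" + family name
    private final Properties cache = new Properties();
    private final Function<Path, String> familyReader; // stands in for a real TTF/Type 1 reader
    private int filesRead; // number of font files actually opened

    FontDirectoryCache(Function<Path, String> familyReader) {
        this.familyReader = familyReader;
    }

    int filesRead() {
        return filesRead;
    }

    void load(Path cacheFile) throws IOException {
        if (Files.exists(cacheFile)) {
            try (InputStream in = Files.newInputStream(cacheFile)) {
                cache.load(in);
            }
        }
    }

    void save(Path cacheFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(cacheFile)) {
            cache.store(out, "font inventory");
        }
    }

    /** Maps family name to font file for every *.ttf and *.pfb file in the directories. */
    Map<String, Path> scan(Iterable<Path> directories) throws IOException {
        Map<String, Path> byFamily = new TreeMap<>();
        for (Path dir : directories) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.{ttf,pfb}")) {
                for (Path file : files) {
                    String key = file.toAbsolutePath().toString();
                    String stamp = Long.toString(Files.getLastModifiedTime(file).toMillis());
                    String entry = cache.getProperty(key);
                    String family;
                    if (entry != null && entry.startsWith(stamp + "|")) {
                        family = entry.substring(stamp.length() + 1); // unchanged: cache hit
                    } else {
                        family = familyReader.apply(file);            // new or changed: read it
                        filesRead++;
                        cache.setProperty(key, stamp + "|" + family);
                    }
                    byFamily.put(family, file);
                }
            }
        }
        return byFamily;
    }
}
```

A second run that loads the saved cache opens no font files unless one was added or modified since the cache was written.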
Re: Font variant SmallCaps
J.Pietschmann wrote: Peter B. West wrote: Although not mandated in XSL-FO, CSS2 offers a number of methods of font matching, only some of which preserve metrics. The FO User Agent is free to make implementation-specific decisions about this, I assume. My main interest here is in whether we want to try to separate out the font handling so that we try to guarantee identical layout on any renderer, or whether we state up front that such universality is *not* on offer. I don't think the TXT renderer will in general render to the same layout as others :-) Nitpicks aside, fonts may be renderer-specific. If different renderers use an identical font (e.g. a user-configured TTF) chances are that the layout is the same, provided bitmap images are rendered identically. If different renderers use different fonts, which may have different metrics even if they are the same family, the layout is likely to be different too. I can't see how to avoid this. This was my original perception. (I hadn't even thought about the different rendering of images.) BTW fonts aren't the only considerations, others are color and the discretization of coordinates (e.g. bitmap vs. vector format). All of which leads me to the question of what, exactly, we are trying to isolate and amalgamate in the font system. We can't get away from renderer dependencies, so the target renderer is going to have to be accommodated before any atomic elements are introduced to the Area Tree. As I said, I haven't been closely following the fonts discussion, but I'm confused as to where it fits in the scheme of things, and what it is trying to achieve. ... I'd say if the user says font-family=futura, sans-serif, any he'll get a warning that a fallback was used in case there is no futura or not even a sans-serif font. If the user says font-family=futura and there is no futura font, FOP should terminate. After all, the user hopefully thought about it. Makes sense. Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
Re: [VOTE] Properties API
Peter B. West wrote: Good question. Make it font-size=12pt+2%+0.8*(from-parent(height) div 32) though. Oh well, the perils of staying awake late... This is why I was talking some time ago about a PropertyValue type which is an RPN-style expression, which can be rapidly resolved without recourse to the general parser. I'm not quite sure what the intent of this sentence is. The maintenance branch uses a recursive descent parser to parse the property expressions as defined in the spec in section 5.9 (complications due to shorthand lists and font lists aside). The parser generates a tree of property expression objects. If the property value is queried, the expression is evaluated, and the various subexpressions pull necessary context values from their associated FOs. I vaguely remember that PropertyList and PropertyManager work somewhat differently, and this is for example the reason why percentages still don't work for table columns and why it is not possible to define a leader length as 2cm+30% (you can have either 2cm or 30%), even though the expression is parsed correctly. Note that while expressions may be *evaluated* repeatedly, they are *parsed* exactly once. Without it, we have to carry at least some expressions around in the raw, after having first parsed them in order to determine that we can't resolve them, and then throw them to the parser again whenever a) we have sufficient context, or b) the page is re-laid. We don't have to carry the expression as string and reparse every time, parsing into an expression tree and evaluating as necessary works just fine. A possible concern could be memory waste, for example if people write start-indent=2cm + 0.8*( 10.4cm div 2) this would create a tree
  sum
  + length 2cm
  + mul
    + number 0.8
    + div
      + length 10.4cm
      + number 2
or 7 objects. Compilers use constant folding, i.e. using arithmetic laws for rearranging and possibly combining terms in the expression.
A valid length expression would be canonicalized into a sum of an absolute length measure, a percentage and unresolvable functions. Whether building a constant folding mechanism is worthwhile is quite another matter. I didn't see complicated expressions all that often, and optimizing the parsed tree may well cost more time than is saved later. Anyway, the folding mechanism will detect a lot of invalid expressions early during parsing, which may be an advantage. The idea of performing a full parse on a given expression more than once makes me nauseous. Just don't do it. The approach I am thinking about with such expressions is to associate the expression, and therefore the FO node, with the *area* which will provide the context for the resolution of the percentage component(s) of the expression. (It may be enough to use the parent area of the areas that the node generates, and to work back to the appropriate reference area or other dimension when the parent dimensions are resolved.) When the dimensions of such an area are resolved, the list of attached FO property expressions can also be resolved. Exactly how to do this I am not yet sure, but I am thinking of something along the lines of a co-routine, implemented by message-passing queues, similar to the existing structure of the parser/FOtree-builder interaction. Sorry, I think this is overcomplicated. J.Pietschmann
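Constant folding into the canonical "absolute length plus percentage" form might look like the Java sketch below. The classes Expr, Num, Linear and Bin are hypothetical, not the maintenance branch's property classes; only numbers, absolute lengths and percentages are modelled, so the unresolvable function terms mentioned above are left out.

```java
/** Node of a hypothetical length-expression tree (not FOP's actual classes). */
abstract class Expr {
    /** Objects held by this subtree: the memory cost discussed above. */
    abstract int nodeCount();

    /** Returns an equivalent tree with constant parts combined. */
    abstract Expr fold();
}

/** Plain number, e.g. 0.8 or 2. */
final class Num extends Expr {
    final double value;
    Num(double value) { this.value = value; }
    int nodeCount() { return 1; }
    Expr fold() { return this; }
}

/** Canonical length: an absolute part plus a percentage of a base known only at layout. */
final class Linear extends Expr {
    static final double MPT_PER_CM = 72000.0 / 2.54; // 1in = 2.54cm = 72000 millipoints

    final double abs; // millipoints
    final double pct; // percent of the (not yet known) base

    Linear(double abs, double pct) { this.abs = abs; this.pct = pct; }
    static Linear cm(double cm) { return new Linear(cm * MPT_PER_CM, 0); }
    static Linear percent(double p) { return new Linear(0, p); }

    int nodeCount() { return 1; }
    Expr fold() { return this; }
    Linear plus(Linear o) { return new Linear(abs + o.abs, pct + o.pct); }
    Linear times(double k) { return new Linear(abs * k, pct * k); }

    /** Final value once the percentage base is available. */
    double resolve(double percentBase) { return abs + pct * percentBase / 100.0; }
}

/** Binary operator: '+', '*' or '/' (standing for "div"). */
final class Bin extends Expr {
    final char op;
    final Expr left, right;

    Bin(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }

    int nodeCount() { return 1 + left.nodeCount() + right.nodeCount(); }

    // This sketch only has Num and Linear leaves, so every fold ends in one of them
    // or throws; a real tree would keep unresolvable function calls as a Bin subtree.
    Expr fold() {
        Expr a = left.fold(), b = right.fold();
        if (a instanceof Num && b instanceof Num) {
            double x = ((Num) a).value, y = ((Num) b).value;
            return new Num(op == '+' ? x + y : op == '*' ? x * y : x / y);
        }
        if (a instanceof Linear && b instanceof Linear) {
            if (op != '+') throw new IllegalArgumentException("length " + op + " length is not a length");
            return ((Linear) a).plus((Linear) b);
        }
        // Exactly one operand is a number, the other a length.
        if (op == '+') throw new IllegalArgumentException("cannot add a number and a length");
        if (op == '*' && b instanceof Linear) return ((Linear) b).times(((Num) a).value);
        if (op == '*') return ((Linear) a).times(((Num) b).value);
        if (a instanceof Linear) return ((Linear) a).times(1.0 / ((Num) b).value);
        throw new IllegalArgumentException("cannot divide a number by a length");
    }
}
```

Under these assumptions, 2cm + 0.8*(10.4cm div 2) goes from 7 nodes to a single Linear of 6.16cm; 2cm+30% folds to one node that is resolved once the percentage base is known; and 2cm + 1 is rejected while folding, which illustrates the early detection of invalid expressions.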
RE: [VOTE] Properties API
Peter B. West wrote: What about font-size=12pt+2%+0.8*from-parent(height div 32) ?
Good question. Make it font-size=12pt+2%+0.8*(from-parent(height) div 32) though. Even nastier is font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32) because in markers and in static-content, we have to keep track of where *all* property specifications occur in the ancestry FO tree to resolve it. In general, the functions will be resolvable as the FO tree is built. The tree is static, in the sense that the tree relationships are maintained in spite of any to-ing and fro-ing with the Area Tree.
Correct. Neither of the examples given has anything to do with the interface proposed, because all of the computations are done on the FOTree side of the house.
Markers are an exception, and because marker properties are resolved in the context of the static-content into which they are eventually placed, all the information required for from-nearest-specified() must be available in the static-content FO subtrees.
Yes, this is the real issue. Since an fo:marker's content can be used in more than one place, this requires that its contents be grafted into the tree where needed. I think the only trick here is to pass the static content context back to the get method so that it knows how to get the information it needs. Sec 6.11.4 says that fo:retrieve-marker is (conceptually) replaced by the children of the fo:marker that it retrieves. The most general way that I can think of to implement this is to force the passage of a parent fop.fo.flow.RetrieveMarker in the get method's signature. This tells the get method: One of your ancestors is an fo:marker object, and, for purposes of this get, consider that ancestor grafted into the tree at this fo:retrieve-marker's location. Of course, if there is no ancestor fo:marker, pass a null. Now, this raises another issue. FONode has a getParent() method. This method may need to be expanded to include this concept.
Any child could then ask for its parent either with null (go up the tree through fo:marker, i.e. the way the input specifies, and the way it works now), or with a grafting point specified, so that if a grafting point is specified, it will go up the tree in that direction instead. In fact, it may be good to create a GraftingPoint interface that RetrieveMarker implements, in case there are additional similar items now or in the future.
  class Marker {
    ...
    FONode getParent(GraftingPoint gp) {
      if (gp == null) {
        return this.parent;
      }
      return gp.getParent(null);
    }
    ...
  }
So, let's use font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32) as an example. Let's assume an FOTree fragment that looks like this:
  fo:marker
    fo:block
      fo:inline
For both the block and the inline, the get will need to research its ancestry to resolve the expression. If we pass the grafting point to the get, and the get directly or indirectly uses the getParent(GraftingPoint gp) method to find that ancestry, it seems to me that everybody has everything they need. The key insight for me here is that *none* of this is actually dependent on the Area Tree at all, that what we are really doing is grafting. I had originally thought that some Area Tree information would need to be passed in, but I really think the above is much more elegant, and more clearly follows the concepts that are in play. Of course, I rely on the rest of you guys to tell me if I have missed something (a real possibility).
Because this is not required in the fo:flows, a good deal of property storage efficiency is realizable. This is why I was talking some time ago about a PropertyValue type which is an RPN-style expression, which can be rapidly resolved without recourse to the general parser.
Without it, we have to carry at least some expressions around in the raw, after having first parsed them in order to determine that we can't resolve them, and then throw them to the parser again whenever a) we have sufficient context, or b) the page is re-laid. The idea of performing a full parse on a given expression more than once makes me nauseous. Again, this is an implementation detail, and doesn't affect the interface. However, on the implementation side, it seems that the tradeoff will be between doing a full parse each time, or creating lots of objects. John Austin's inquiry about the huge number of objects created is what got me started down this line of thinking. I suppose that the best way would be to have your cake and eat it too -- store integers where possible, and create objects where not possible, and teach everything how to tell the difference. (Here is a half-baked idea that I don't want to even think about pursuing for a while -- PropertyStrategy. With the API I have proposed, one could conceivably store the Properties one of several ways, and have the user select which one they want based on performance needs). The approach I am thinking about
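The grafting-point idea can be exercised with a small, self-contained Java model. FONode, Marker, RetrieveMarker and GraftingPoint follow the names in the post, but their fields and the fromNearestSpecified method are assumptions; font-size is used instead of height to keep the example plausible, and the fallback to an initial value is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: a node at which another subtree is conceptually grafted. */
interface GraftingPoint {
    FONode getParent(GraftingPoint gp);
}

/** Minimal FO node; names follow the post, the details are illustrative. */
class FONode {
    final String name;
    final FONode parent;
    /** Properties explicitly specified on this FO (values in millipoints). */
    final Map<String, Integer> specified = new HashMap<>();

    FONode(String name, FONode parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Ordinary nodes ignore the grafting point and return their input-tree parent. */
    public FONode getParent(GraftingPoint gp) {
        return parent;
    }

    /** from-nearest-specified(prop): value on the nearest ancestor that specifies prop. */
    Integer fromNearestSpecified(String prop, GraftingPoint gp) {
        for (FONode n = getParent(gp); n != null; n = n.getParent(gp)) {
            Integer value = n.specified.get(prop);
            if (value != null) {
                return value;
            }
        }
        return null; // a real implementation would fall back to the initial value
    }
}

/** fo:marker: given a grafting point, its parent becomes the retrieve-marker's parent. */
final class Marker extends FONode {
    Marker(FONode parent) {
        super("fo:marker", parent);
    }

    @Override
    public FONode getParent(GraftingPoint gp) {
        return gp == null ? parent : gp.getParent(null);
    }
}

/** fo:retrieve-marker is where the marker's children are grafted in. */
final class RetrieveMarker extends FONode implements GraftingPoint {
    RetrieveMarker(FONode parent) {
        super("fo:retrieve-marker", parent);
    }
}
```

For an fo:inline inside an fo:block inside an fo:marker in the flow, asking with null finds the flow's font-size, while passing the fo:retrieve-marker in fo:static-content as grafting point finds the static-content's value. No Area Tree information is involved.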
Re: [VOTE] Properties API
J.Pietschmann wrote: Peter B. West wrote: This is why I was talking some time ago about a PropertyValue type which is an RPN-style expression, which can be rapidly resolved without recourse to the general parser. I'm not quite sure what the intent of this sentence is. The maintenance branch uses a recursive descent parser to parse the property expressions as defined in the spec in section 5.9 (complications due to shorthand lists and font lists aside). The parser generates a tree of property expression objects. If the property value is queried, the expression is evaluated, and the various subexpressions pull necessary context values from their associated FOs. I vaguely remember that PropertyList and PropertyManager work somewhat differently, and this is for example the reason why percentages still don't work for table columns and why it is not possible to define a leader length as 2cm+30% (you can have either 2cm or 30%), even though the expression is parsed correctly. Note that while expressions may be *evaluated* repeatedly, they are *parsed* exactly once. Without it, we have to carry at least some expressions around in the raw, after having first parsed them in order to determine that we can't resolve them, and then throw them to the parser again whenever a) we have sufficient context, or b) the page is re-laid. We don't have to carry the expression as string and reparse every time, parsing into an expression tree and evaluating as necessary works just fine. A possible concern could be memory waste, for example if people write start-indent=2cm + 0.8*( 10.4cm div 2) this would create a tree
  sum
  + length 2cm
  + mul
    + number 0.8
    + div
      + length 10.4cm
      + number 2
or 7 objects.
I missed the import of this when I built the alt.design parser on top of the maintenance branch parser code. This is what I am trying to achieve. If it already exists, so much the better. Where does repeated evaluation of the parse tree occur? Is there a parse tree object? Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
RE: Font variant SmallCaps Was: Re: (Chris) Re: Traits
Jeremias Maerki wrote: And there's still the question if we can produce font metric information for the target formats (there's PCL and PostScript and..., too) that result in the desired output. The idea was to query the renderer for fonts, or get a renderer-specific font manager, and use the abstract interface to get mainly character metrics, various other font measures and perhaps font attributes like sans-serif or so. My idea is still different: having several font sources, and the renderers merely announce which font sources they support. That leaves the option open for later to enable multiple renderers simultaneously. Yes, this latter idea is closer to my view as well. Victor Mote
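The "renderers announce the font sources they support" model could be sketched in Java as follows. The enum mirrors sources A to E from Jeremias's example; FontSourceAware and usableByAll are hypothetical, and intersecting the announced sets is only an assumption about how several renderers might share one layout.

```java
import java.util.EnumSet;
import java.util.Set;

/** The font sources from the example list (A to E). */
enum FontSource { TRUETYPE, TYPE1, AWT, BASE14, PCL_BASE }

/** Hypothetical: each renderer announces which font sources it can consume. */
interface FontSourceAware {
    Set<FontSource> supportedFontSources();
}

final class FontSources {
    private FontSources() {}

    /**
     * Sources every active renderer supports. Restricting font lookup to this set
     * is one way a single layout could feed several renderers at once.
     */
    static Set<FontSource> usableByAll(Iterable<? extends FontSourceAware> renderers) {
        Set<FontSource> result = EnumSet.allOf(FontSource.class);
        for (FontSourceAware renderer : renderers) {
            result.retainAll(renderer.supportedFontSources());
        }
        return result;
    }
}
```

For a PDF renderer announcing TrueType, Type 1 and Base14 and a Java2D renderer announcing TrueType and AWT, only TrueType remains usable by both.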
RE: Font variant SmallCaps
Peter B. West wrote: BTW fonts aren't the only considerations, others are color and the discretization of coordinates (e.g. bitmap vs. vector format). All of which leads me to the question of what, exactly, we are trying to isolate and amalgamate in the font system. We can't get away from renderer dependencies, so the target renderer is going to have to be accommodated before any atomic elements are introduced to the Area Tree. As I said, I haven't been closely following the fonts discussion, but I'm confused as to where it fits in the scheme of things, and what it is trying to achieve. I think what we are trying to do is to separate Font details from the Layout and Render processes as much as possible. Renderer dependencies need to drive font selection, but by the time the Layout or Render actually start to do their work, those issues should already be resolved. Your inquiries here have actually caused me to rethink Fonts a bit. My view of FOP's overall design (I apologize for bringing this up again, but it is necessary to explain the point) is that we have a Session (Driver) which can have multiple Documents, each of which can have multiple RenderContexts, each of which can have multiple Renderers. (The RenderContext class exists only in my mind right now, as I have failed to get support for it -- however, much of the discussion in this thread revolves around that concept). Document controls the FOTree build, RenderContext controls the Layout/Area Tree, and the Renderers simply render the Area Tree. Now, my plan has been to have FOTree resolve to a Font object during parsing. However, it is clear to me that actual Font resolution has to go with the RenderContext concept. So, probably the FOTree needs to store more raw information (perhaps in an FOFont object) that the RenderContext can resolve into a Font during layout. The performance impact should be minimal, but I think the extra layer of abstraction is important. Victor Mote
RE: [VOTE] Properties API
On Thu, 2003-11-27 at 13:58, Victor Mote wrote: ... Again, this is an implementation detail, and doesn't affect the interface. However, on the implementation side, it seems that the tradeoff will be between doing a full parse each time, or creating lots of objects. John Austin's inquiry about the huge number of objects created is what got me started down this line of thinking. I am critical of what I perceive to be a pathological growth of objects (and search times). If those problems are corrected, there are plenty of resources left to do a few extra parses. How often will you encounter expressions this complex? Rarely. If they become common (and someone will do that!), we can call THAT a pathological development and <smirk>blame the victim</smirk>. I suppose that the best way would be to have your cake and eat it too -- store integers where possible, and create objects where not possible, and teach everything how to tell the difference. (Here is a half-baked idea that I don't want to even think about pursuing for a while -- PropertyStrategy. With the API I have proposed, one could conceivably store the Properties one of several ways, and have the user select which one they want based on performance needs). As Peter knows, I have been reading the code. I shall attempt the XSL-FO Spec soon. I understand the spec defines the behavior of the program in terms of fully parsed/expanded trees. This implies that objects must exist even if they will never be used after the parser moves past their end-points. Optimization anyone? What I infer of the Tree structures in your discussion and Peter's code suggests to me that FOP creates a DOM-ish view of the document in one or more trees. This is a mismatch with the SAX parser that is in there somewhere. And just to say something completely ludicrous, because someone will take it seriously ... You could convert those expressions to a Java class, compile, load and invoke it with Reflection ... -- John Austin [EMAIL PROTECTED]
RE: [VOTE] Properties API
Glen Mazza wrote: Sent: Wednesday, November 26, 2003 3:17 PM To: [EMAIL PROTECTED] Subject: RE: [VOTE] Properties API
--- Victor Mote [EMAIL PROTECTED] wrote: The current implementation might look like this:
  public class FObj {
    public int getMaxWidth() {
      //WARNING -- unchecked or tested!!
      return properties.get("max-width").getLength().getValue();
    }
  }
A subclass that doesn't use max-width might override with:
  public int getMaxWidth() {
    return FObj.INVALID_PROPERTY;
  }
Not to be a pain here--but this just occurred to me--if we are going to do this, it may be cleaner to rely on enumeration constants--that way, we probably can avoid all these return FObj.INVALID_PROPERTY overrides in the various FOs for each unsupported property. This is what I'm thinking (pseudocode here): In FObj (similar to your code above):
  public int getProperty(int propVal) {
    if (validateValidProperty(propVal) == false) {
      return FObj.INVALID_PROPERTY;
    }
    return properties.get(propVal).getLength().getValue();
  }
Then we may just need a single validateValidProperty() method in each FO, that would check if propVal is an accepted property for that FO. Each FO's validateValidProperty() would not need to degenerate into a huge list of comparisons like this:
  if (prop_val == PROPVAL1 || prop_val == PROPVAL2 || ... on and on and on) return true; else return false;
because I think we can simplify the properties supported by an FO into an integer array of 1's (supported) and 0's (not) so the validate() function for any FO would look like this:
  boolean validateValidProperty(int propVal) {
    return (supportedProps[propVal] == 1);
  }
(Come to think of it, we can probably keep validateValidProperty() in the FObj base class alone as well!)
IOW (I assume) use a 2-dimensional array, the first dimension representing the Object, the second dimension representing the list of possible Properties.
Then, finally, we can perhaps expand this array of 1's and 0's to include a 2--supported by the spec but not yet by FOP, i.e., FObj.NOT_YET_SUPPORTED, and other codes as needed. Comments? This would again be one of the implementation details that I am trying to hide, but yes, it makes sense to me, at least as one of several possibilities. Victor Mote
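Glen's table-driven check, with Victor's two-dimensional array, could look like this in Java. It is a sketch with assumptions: the sentinel values, the byte codes (0 unsupported, 1 supported, 2 "supported by the spec but not yet by FOP") and the property and FO ids are illustrative, and the table entries are invented rather than taken from the spec or from FOP.

```java
/**
 * Hypothetical per-FO property support table: one [fo][property] array
 * replaces per-FO overrides and long comparison chains.
 */
final class PropertySupport {
    static final int INVALID_PROPERTY = Integer.MIN_VALUE;      // not allowed on this FO
    static final int NOT_YET_SUPPORTED = Integer.MIN_VALUE + 1; // allowed by the spec, not by FOP yet

    private static final byte NO = 0, YES = 1, LATER = 2;

    // Illustrative ids; a real table would be generated from the spec.
    static final int PR_FONT_SIZE = 0, PR_COLUMN_COUNT = 1, PR_MAX_WIDTH = 2;
    static final int FO_BLOCK = 0, FO_REGION_BODY = 1;

    // Invented entries, for illustration only.
    private static final byte[][] TABLE = {
        /* fo:block       */ { YES, NO, NO },
        /* fo:region-body */ { NO, LATER, NO },
    };

    private PropertySupport() {}

    /** The single check that can live in the FObj base class. */
    static boolean validateValidProperty(int fo, int prop) {
        return TABLE[fo][prop] == YES;
    }

    /** Returns the stored value, or a sentinel saying why there is none. */
    static int getProperty(int fo, int prop, int[] storedValues) {
        switch (TABLE[fo][prop]) {
            case YES:
                return storedValues[prop];
            case LATER:
                return NOT_YET_SUPPORTED;
            default:
                return INVALID_PROPERTY;
        }
    }
}
```

Adding a new FO or property then means adding a row or column to the table rather than overriding accessors in each FO class.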
RE: [VOTE] Properties API
John Austin wrote: I am critical of what I perceive to be a pathological growth of objects (and search times). If those problems are corrected, there are plenty of resources left to do a few extra parses. How often will you encounter expressions this complex? Rarely. If they become common (and someone will do that!), we can call THAT a pathological development and <smirk>blame the victim</smirk>. I tend to agree with all of this, at least in terms of which end of the spectrum we should favor. I suppose that the best way would be to have your cake and eat it too -- store integers where possible, and create objects where not possible, and teach everything how to tell the difference. (Here is a half-baked idea that I don't want to even think about pursuing for a while -- PropertyStrategy. With the API I have proposed, one could conceivably store the Properties one of several ways, and have the user select which one they want based on performance needs). As Peter knows, I have been reading the code. I shall attempt the XSL-FO Spec soon. I understand the spec defines the behavior of the program in terms of fully parsed/expanded trees. This implies that objects must exist even if they will never be used after the parser moves past their end-points. Optimization anyone? This doesn't sound right to me. I think you may have misunderstood something, but you'll need to be more specific for me to tell. What I infer of the Tree structures in your discussion and Peter's code suggests to me that FOP creates a DOM-ish view of the document in one or more trees. This is a mismatch with the SAX parser that is in there somewhere. FOP's design on the SAX/DOM issue was a difficult issue for me to grasp, and when I did, I documented it here: http://xml.apache.org/fop/design/parsing.html#input There is no mismatch at all. The input we work with is a tree. Therefore a tree-like structure is absolutely necessary to represent it.
I am pretty sure from past discussions with Peter that he employs a tree-like structure as part of this pull-parsing. So, if it is important for FOP to use a tree-like structure to represent its input, the only issue is whether to use DOM or some home-grown structure. Since a home-grown structure is much lighter and more flexible for our needs (AFAIK, adding business logic to a DOM would be impossible), the only question is what standard way should the home-grown structure be built. SAX provides a much lighter-weight way of building our home-grown structure than anything else that I have seen. Now, if you can figure out how to digest an FO document without building a tree that represents a page-sequence object, I hope you'll share it with the rest of us. That could be a breakthrough indeed. Victor Mote
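To make the SAX point concrete, here is a minimal sketch that uses only the JDK's javax.xml.parsers and org.xml.sax APIs. A DefaultHandler pushes a node on startElement and pops on endElement, which builds a home-grown tree without any DOM. LiteNode and LiteTreeBuilder are hypothetical stand-ins for FOP's own FO tree classes, which carry far more state.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical lightweight node; only a name, a parent and children. */
class LiteNode {
    final String name;
    final LiteNode parent;
    final List<LiteNode> children = new ArrayList<>();

    LiteNode(String name, LiteNode parent) {
        this.name = name;
        this.parent = parent;
    }
}

/** Builds the home-grown tree from SAX events: push on start, pop on end. */
class LiteTreeBuilder extends DefaultHandler {
    private LiteNode current;
    private LiteNode root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        LiteNode node = new LiteNode(localName, current);
        if (current == null) {
            root = node;
        } else {
            current.children.add(node);
        }
        current = node;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        current = current.parent;              // climb back up when the element closes
    }

    static LiteNode build(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);       // so localName is "block", not ""
        LiteTreeBuilder handler = new LiteTreeBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.root;
    }
}
```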
Re: [VOTE] Properties API
Peter B. West wrote: Where does repeated evaluation of the parse tree occur? Is there a parse tree object? The necessary classes are somewhat distributed across the packages. Some necessary classes are in fop.datatypes, the common property superclasses are in fop.fo, with fop.fo.Property being the top of the hierarchy, there's something in fop.fo.expr, and the concrete implementations along with their makers (for parsing) are of course generated. The tree isn't built from classes deriving from a single class, like the FO tree. Instead, the fop.fo.Property class is used for both unevaluated and evaluated properties and for computing functions, while length expressions are built from Length subclasses (the only expressions of nontrivial complexity; everything else which is not a shorthand or text-decoration is either a single token or a function call). The relevant class is LinearCombinationLength. Unfortunately, I can't find the expression parser, but I recall I've seen it. The seemingly clever but ultimately misguided attempt to press all property handling, including shorthands, font-family lists and text-decoration, into a unified framework has led to a number of kludges which may make reengineering the whole stuff a bit difficult. Regards J.Pietschmann
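As an illustration of why a linear combination is a convenient shape for length expressions, the sketch below keeps 12pt+2% as an absolute part plus a percentage part. The percentage is resolved only once its base is known. LinearLengthSketch and its methods are hypothetical and are not the actual LinearCombinationLength API. Units are millipoints.

```java
/**
 * Hypothetical sketch: a length kept as (absolute millipoints, percentage) until
 * the percentage base is known. Not FOP's LinearCombinationLength API.
 */
final class LinearLengthSketch {
    final int absoluteMpt;   // sum of all absolute terms, in millipoints
    final double percent;    // sum of all percentage terms, e.g. 2.0 for "2%"

    LinearLengthSketch(int absoluteMpt, double percent) {
        this.absoluteMpt = absoluteMpt;
        this.percent = percent;
    }

    /** Adding two combinations adds their coefficients term by term. */
    LinearLengthSketch plus(LinearLengthSketch other) {
        return new LinearLengthSketch(absoluteMpt + other.absoluteMpt, percent + other.percent);
    }

    /** Scaling by a number scales every term. */
    LinearLengthSketch times(double factor) {
        return new LinearLengthSketch((int) Math.round(absoluteMpt * factor), percent * factor);
    }

    /** Resolves once the percentage base (in millipoints) is available. */
    int resolve(int baseMpt) {
        return absoluteMpt + (int) Math.round(baseMpt * percent / 100.0);
    }
}
```

For example, 12pt+2% is `new LinearLengthSketch(12000, 0).plus(new LinearLengthSketch(0, 2.0))`. Against a 10pt (10000 mpt) base, it resolves to 12200 mpt.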
RE: [VOTE] Properties API
On Thu, 2003-11-27 at 14:57, Victor Mote wrote: John Austin wrote: I am critical Now, if you can figure out how to digest an FO document without building a tree that represents a page-sequence object, I hope you'll share it with the rest of us. That could be a breakthrough indeed. I am just thinking of ensuring that objects disappear after the page they are on has been printed. At the point that 0.20.5 prints: [INFO] [1] The related objects from Page 1 should ... join the choir invisible ... They don't appear to, which is why the memory use of FOP increases in proportion to document length. You only need to retain the useful parts of the page-sequence object. Stuff that has been 'printed' isn't useful. Victor Mote -- John Austin [EMAIL PROTECTED]
Re: [VOTE] Properties API
John Austin wrote: I am critical of what I perceive to be a pathological growth of objects (and search times). If those problems are corrected, there are plenty of resources left to do a few extra parses. How often will you encounter expressions this complex? Rarely. A complex expression tree *will* *not* have any influence on search times in the inheritance lattice of a property, unless there are functions referencing other properties, of course. And more often than not, there will be exactly one object in the tree, representing the only token parsed from the property value, which is hardly a pathological growth of objects. This implies that objects must exist even if they will never be used after the parser moves past their end-points. Optimization anyone? Be careful, proper layout needs backtracking. J.Pietschmann
Re: [VOTE] Properties API
Victor Mote wrote: John Austin wrote: What I infer of the Tree structures in your discussion and Peter's code suggests to me that FOP creates a DOM-ish view of the document in one or more trees. This is a mis-match with the SAX parser that is in there somewhere. FOP's design on the SAX/DOM issue was a difficult issue for me to grasp, and when I did, I documented it here: http://xml.apache.org/fop/design/parsing.html#input There is no mismatch at all. The input we work with is a tree. Therefore a tree-like structure is absolutely necessary to represent it. I am pretty sure from past discussions with Peter that he employs a tree-like structure as part of this pull-parsing. So, if it is important for FOP to use a tree-like structure to represent its input, the only issue is whether to use DOM or some home-grown structure. Since a home-grown structure is much lighter and more flexible for our needs (AFAIK, adding business logic to a DOM would be impossible), the only question is what standard way should the home-grown structure be built. SAX provides a much lighter-weight way of building our home-grown structure than anything else that I have seen. Even the dreaded pull-parser uses SAX. Now, if you can figure out how to digest an FO document without building a tree that represents a page-sequence object, I hope you'll share it with the rest of us. That could be a breakthrough indeed. The problem is the same one that FOP has always struggled with - how to 1) discard subtrees in a timely manner, and 2) serialize subtrees for efficient caching and retrieval when they will be needed at some time in the future. Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
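Point 2) can be approached in a deliberately naive way with plain Java serialization. A finished subtree is written to a byte array, the live objects are dropped, and the subtree is restored when it is needed again. SubtreeNode and SubtreeCache are hypothetical; this is not how FOP caches anything today.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical subtree node; children only, no parent back-pointer. */
class SubtreeNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<SubtreeNode> children = new ArrayList<>();

    SubtreeNode(String name) {
        this.name = name;
    }
}

class SubtreeCache {
    /** Serializes a subtree to bytes; the caller can then drop its live reference. */
    static byte[] park(SubtreeNode subtree) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(subtree);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the subtree when it is needed again (e.g. for re-layout). */
    static SubtreeNode restore(byte[] parked) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(parked))) {
            return (SubtreeNode) in.readObject();
        }
    }
}
```

Java serialization follows every non-transient reference. A parent back-pointer in such a node would therefore drag the whole tree into the stream, so it would have to be transient and re-linked on restore.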
Re: [VOTE] Properties API
Victor Mote wrote: Peter B. West wrote: What about font-size=12pt+2%+0.8*from-parent(height div 32) ? Good question. Make it font-size=12pt+2%+0.8*(from-parent(height) div 32) though. Even nastier is font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32) because in markers and in static-content, we have to keep track of where *all* property specifications occur in the ancestry FO tree to resolve it. In general, the functions will be resolvable as the FO tree is built. The tree is static, in the sense that the tree relationships Correct. Neither of the examples given has anything to do with the interface proposed, because all of the computations are done on the FOTree side of the house. 2% of what? Of a reference area. Of what actually gets laid out on a page. If a single flow object gets laid out over more than one page, that reference may vary, but nothing changes in the FO Tree. It makes no sense to second-guess the Area tree within the FO tree. It's within the Area tree that all of these flow objects begin to take on concrete dimensions. are maintained in spite of any to-ing and fro-ing with the Area Tree. Markers are an exception, and because marker properties are resolved in the context of the static-content into which they are eventually placed, all the information required for from-nearest-specified() must be available in the static-content FO subtrees. Yes, this is the real issue. Only one of the real issues, I'm afraid. Since an fo:marker's content can be used in more than one place, this requires that its contents be grafted into the tree where needed. I think the only trick here is to pass the static content context back to the get method so that it knows how to get the information it needs. Sec 6.11.4 says that fo:retrieve-marker is (conceptually) replaced by the children of the fo:marker that it retrieves.
The most general way that I can think of to implement this is to force the passage of a parent fop.fo.flow.RetrieveMarker in the get method's signature. This tells the get method: One of your ancestors is an fo:marker object, and, for purposes of this get, consider that ancestor grafted into the tree at this fo:retrieve-marker's location. Of course, if there is no ancestor fo:marker, pass a null. Now, this raises another issue. FONode has a getParent() method. This method may need to be expanded to include this concept. Any child could then ask for its parent either with null (go up the tree through fo:marker, i.e. the way the input specifies, and the way it works now), or with a grafting point specified, so that if a grafting point is specified, it will go up the tree in that direction instead. In fact, it may be good to create a GraftingPoint interface that RetrieveMarker implements, in case there are additional similar items now or in the future.

    class Marker {
        ...
        FONode getParent(GraftingPoint gp) {
            if (gp == null) {
                return this.parent;
            }
            return gp.getParent(null);
        }
        ...
    }

So, let's use:

    font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32)

as an example. Let's assume an FOTree fragment that looks like this:

    fo:marker
      fo:block
        fo:inline

For both the block and the inline, the get will need to research its ancestry to resolve the expression. If we pass the grafting point to the get, and the get directly or indirectly uses the getParent(GraftingPoint gp) method to find that ancestry, it seems to me that everybody has everything they need. The key insight for me here is that *none* of this is actually dependent on the Area Tree at all, that what we are really doing is grafting. Not so. Grafting, OK. But you can't resolve the expressions without the areas. I had originally thought that some Area Tree information would need to be passed in, but I really think the above is much more elegant, and more clearly follows the concepts that are in play.
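The GraftingPoint idea can be exercised with a small, self-contained model. SketchNode, SketchMarker and SketchRetrieveMarker are simplified hypothetical stand-ins for FONode, Marker and RetrieveMarker. fromNearestSpecifiedHeight() only imitates the ancestry walk that from-nearest-specified(height) would need, and it does not address the area-dependent cases Peter raises. getParent() is public because interface methods are public, and the retrieve-marker satisfies GraftingPoint with the getParent() it inherits.

```java
/** Hypothetical: anything that can stand in for a marker's parent (e.g. fo:retrieve-marker). */
interface GraftingPoint {
    SketchNode getParent(GraftingPoint gp);
}

/** Simplified stand-in for FONode. */
class SketchNode {
    final String name;
    final SketchNode parent;
    final Double height;   // specified "height" in points, or null if not specified here

    SketchNode(String name, SketchNode parent, Double height) {
        this.name = name;
        this.parent = parent;
        this.height = height;
    }

    /** Ordinary nodes just follow the input tree; gp only matters to markers. */
    public SketchNode getParent(GraftingPoint gp) {
        return parent;
    }

    /** Walks the ancestry (through the grafting point, if any) for the nearest specified height. */
    Double fromNearestSpecifiedHeight(GraftingPoint gp) {
        for (SketchNode n = getParent(gp); n != null; n = n.getParent(gp)) {
            if (n.height != null) {
                return n.height;
            }
        }
        return null;
    }
}

/** Stand-in for fo:marker: with a grafting point, its parent is the retrieve-marker's parent. */
class SketchMarker extends SketchNode {
    SketchMarker(SketchNode parent) {
        super("marker", parent, null);
    }

    @Override
    public SketchNode getParent(GraftingPoint gp) {
        return gp == null ? parent : gp.getParent(null);
    }
}

/** Stand-in for fo:retrieve-marker; its inherited getParent() satisfies GraftingPoint. */
class SketchRetrieveMarker extends SketchNode implements GraftingPoint {
    SketchRetrieveMarker(SketchNode parent) {
        super("retrieve-marker", parent, null);
    }
}
```

With `gp == null`, the inline finds the height specified above the fo:marker in the flow. With the retrieve-marker passed in, the walk continues from the retrieve-marker's parent in the static-content.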
Of course, I rely on the rest of you guys to tell me if I have missed something (a real possibility). Because this is not required in the fo:flows, a good deal of property storage efficiency is realizable. This is why I was talking some time ago about a PropertyValue type which is an RPN-style expression, which can be rapidly resolved without recourse to the general parser. Without it, we have to carry at least some expressions around in the raw, after having first parsed them in order to determine that we can't resolve them, and then throw them to the parser again whenever a) we have sufficient context, or b) the page is re-laid. The idea of performing a full parse on a given expression more than once makes me nauseous. Again, this is an implementation detail, and doesn't affect the interface. However, on the implementation side, it seems that the tradeoff will be between doing a full parse each time, or creating lots of objects. John Austin's inquiry about the huge number of objects created is what got me started down this line of thinking. I suppose that the
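Here is a minimal sketch of an RPN-style PropertyValue, assuming a made-up token spelling. The expression is parsed once into postfix tokens. Each later resolution, whether for a new context or a re-laid page, only walks the token list with a stack. RpnPropertyValue, the token format and the resolve() parameters are all hypothetical, and percentages are resolved against whatever base the caller supplies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical RPN-style property value. Tokens: "12" (points), "2%" (percent of a base),
 * "from-parent:height" (function call), and the operators "+", "-", "*", "div".
 */
final class RpnPropertyValue {
    private final List<String> postfix;

    RpnPropertyValue(List<String> postfix) {
        this.postfix = new ArrayList<>(postfix);   // parsed once, reused on every resolve
    }

    /**
     * @param percentBase base (in points) that percentages refer to in this context
     * @param functions   resolves (name, argument) to a non-null value in points
     */
    double resolve(double percentBase, BiFunction<String, String, Double> functions) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : postfix) {
            switch (tok) {
                case "+": case "-": case "*": case "div": {
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(apply(tok, left, right));
                    break;
                }
                default:
                    if (tok.endsWith("%")) {
                        double pct = Double.parseDouble(tok.substring(0, tok.length() - 1));
                        stack.push(percentBase * pct / 100.0);
                    } else if (tok.contains(":")) {
                        String[] call = tok.split(":", 2);
                        stack.push(functions.apply(call[0], call[1]));
                    } else {
                        stack.push(Double.parseDouble(tok));
                    }
            }
        }
        return stack.pop();
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right;         // "div"
        }
    }
}
```

For 12pt+2%+0.8*(from-parent(height) div 32), the postfix form is `12 2% + 0.8 from-parent:height 32 div * +`. With a 10pt base and a parent height of 160pt, it resolves to 16.2 (12 + 0.2 + 0.8 * 5), up to floating-point rounding.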
Re: Font variant SmallCaps Was: Re: (Chris) Re: Traits
Jeremias Maerki wrote: Font configuration should/will become easier. For TrueType and Type1 fonts this should just be a matter of specifying a list of directories in which to look for fonts. A cache is needed to speed up the inventory on startup. Hmhm. Not bad. My idea is still different: Having several font sources and the renderers merely announce which font sources they support. That leaves the option open for later to enable multiple renderers simultaneously. I don't think enabling multiple simultaneous renderers is worth the trouble. If there are fonts producing a different metric for the same font, all you share is parsing the FO tree and some property refining. You have to provide for the case that generated areas can't be reused across renderers and, if there should be any advantage, also for cases where it is possible to share areas. This looks ... complex. Example:

FontSource A: TrueType fonts
FontSource B: Type 1 fonts
FontSource C: AWT fonts
FontSource D: Base14 fonts
FontSource E: PCL base fonts (just a guess)

PDF renderer supports: A, B, D (maybe C)
PostScript renderer supports: A, B, D (maybe C)
Java2D/AWT renderer supports: A and C
PCL renderer: probably A, E

You need
- a class managing the fonts
- inquire the supported font types from the renderer
- match the renderer font types with the managed fonts

I still think it would be easier to get a renderer specific font manager from the renderer, or get the fonts directly from the renderer. The font managers or the renderers can share code for general configuration, caching, font file management etc. by subclassing a common class or by delegation (especially in the second case). J.Pietschmann
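The example above maps onto a set-membership check. In this sketch the FontSource enum mirrors sources A to E. RendererSketch, FontEntry, FontManagerSketch and ExampleRenderers are hypothetical names, not existing FOP classes. The sketch only shows the "renderers announce supported sources" variant, not the renderer-specific font manager J.Pietschmann prefers.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Sources A to E from the example. */
enum FontSource { TRUETYPE, TYPE1, AWT, BASE14, PCL_BASE }

/** Hypothetical: a renderer only announces which font sources it can use. */
interface RendererSketch {
    Set<FontSource> supportedFontSources();
}

final class FontEntry {
    final String family;
    final FontSource source;

    FontEntry(String family, FontSource source) {
        this.family = family;
        this.source = source;
    }
}

final class FontManagerSketch {
    /** Keeps only the managed fonts whose source the renderer supports, in original order. */
    static List<FontEntry> usableFonts(List<FontEntry> all, RendererSketch renderer) {
        Set<FontSource> supported = renderer.supportedFontSources();
        List<FontEntry> result = new ArrayList<>();
        for (FontEntry font : all) {
            if (supported.contains(font.source)) {
                result.add(font);
            }
        }
        return result;
    }
}

/** Two of the renderers from the example ("maybe C" left out for PDF). */
final class ExampleRenderers {
    static final RendererSketch PDF =
            () -> EnumSet.of(FontSource.TRUETYPE, FontSource.TYPE1, FontSource.BASE14);
    static final RendererSketch JAVA2D =
            () -> EnumSet.of(FontSource.TRUETYPE, FontSource.AWT);
}
```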
RE: [VOTE] Properties API
Peter B. West wrote: What about font-size=12pt+2%+0.8*from-parent(height div 32) ? Good question. Make it font-size=12pt+2%+0.8*(from-parent(height) div 32) though. Even nastier is font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32) because in markers and in static-content, we have to keep track of where *all* property specifications occur in the ancestry FO tree to resolve it. In general, the functions will be resolvable as the FO tree is built. The tree is static, in the sense that the tree relationships Correct. Neither of the examples given has anything to do with the interface proposed, because all of the computations are done on the FOTree side of the house. 2% of what? Of a reference area. Of what actually gets laid out on a page. If a single flow object gets laid out over more than one page, that reference may vary, but nothing changes in the FO Tree. It makes no sense to second-guess the Area tree within the FO tree. It's within the Area tree that all of these flow objects begin to take on concrete dimensions. Sec. 7.8.4 indicates that font-size percentages apply to the parent element's font size, which would be from the FOTree, not from areas. However, I fear that in the general case you may be right. The relative column-width problem in tables may fall into this category. If so, then the solution is to pass the relevant Area object to the get method so that it can see more of the Area's context. Any Area can (or should) be able to see not only its Area Tree ancestry, but its FOTree ancestry as well. are maintained in spite of any to-ing and fro-ing with the Area Tree. Markers are an exception, and because marker properties are resolved in the context of the static-content into which they are eventually placed, all the information required for from-nearest-specified() must be available in the static-content FO subtrees. Yes, this is the real issue. Only one of the real issues, I'm afraid. OK, what are the others?
Since an fo:marker's content can be used in more than one place, this requires that its contents be grafted into the tree where needed. I think the only trick here is to pass the static content context back to the get method so that it knows how to get the information it needs. Sec 6.11.4 says that fo:retrieve-marker is (conceptually) replaced by the children of the fo:marker that it retrieves. The most general way that I can think of to implement this is to force the passage of a parent fop.fo.flow.RetrieveMarker in the get method's signature. This tells the get method: One of your ancestors is an fo:marker object, and, for purposes of this get, consider that ancestor grafted into the tree at this fo:retrieve-marker's location. Of course, if there is no ancestor fo:marker, pass a null. Now, this raises another issue. FONode has a getParent() method. This method may need to be expanded to include this concept. Any child could then ask for its parent either with null (go up the tree through fo:marker, i.e. the way the input specifies, and the way it works now), or with a grafting point specified, so that if a grafting point is specified, it will go up the tree in that direction instead. In fact, it may be good to create a GraftingPoint interface that RetrieveMarker implements, in case there are additional similar items now or in the future.

    class Marker {
        ...
        FONode getParent(GraftingPoint gp) {
            if (gp == null) {
                return this.parent;
            }
            return gp.getParent(null);
        }
        ...
    }

So, let's use:

    font-size=12pt+2%+0.8*(from-nearest-specified(height) div 32)

as an example. Let's assume an FOTree fragment that looks like this:

    fo:marker
      fo:block
        fo:inline

For both the block and the inline, the get will need to research its ancestry to resolve the expression. If we pass the grafting point to the get, and the get directly or indirectly uses the getParent(GraftingPoint gp) method to find that ancestry, it seems to me that everybody has everything they need.
The key insight for me here is that *none* of this is actually dependent on the Area Tree at all, that what we are really doing is grafting. Not so. Grafting, OK. But you can't resolve the expressions without the areas. OK. You may be right. See above. I had originally thought that some Area Tree information would need to be passed in, but I really think the above is much more elegant, and more clearly follows the concepts that are in play. Of course, I rely on the rest of you guys to tell me if I have missed something (a real possibility). Because this is not required in the fo:flows, a good deal of property storage efficiency is realizable. This is why I was talking some time ago about a PropertyValue type which is an RPN-style expression, which can be rapidly resolved without recourse to the general
DO NOT REPLY [Bug 25059] New: - [PATCH] LineLayoutManager: first word in line too long
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25059. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25059
[PATCH] LineLayoutManager: first word in line too long

Summary: [PATCH] LineLayoutManager: first word in line too long
Product: Fop
Version: 1.0dev
Platform: PC
OS/Version: Linux
Status: NEW
Severity: Normal
Priority: Other
Component: page-master/layout
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

This patch addresses the case where the first word in the line is too long (prevBP == null), both when this is the first word in this LM (prev == null) and when it is not (prev != null). The attached test file demonstrates the problem and the result of the patch. Without the patch several error messages appear. With the patch they no longer appear, and almost correct output is obtained. The remaining irregularity in the output is addressed by my preceding patch.

1. If prevBP == null, get text for hyphenation back from prev (last breakposs in vecInlineBreaks).
2. If prevBP == null, reset to prev.
These two formulations are conservative. If prevBP were always equal to prev, the argument could simply be prev. However, apart from the case prevBP == null at the start of a line, there is at least one case in which prevBP is not the same as prev. Viz., if a bp is within the line length (the else branch of if (bpDim.min > availIPD.max)) but !bBreakOK, bp is added to vecInlineBreaks, but prevBP is not set equal to it; in the next iteration prevBP != prev. I do not know whether this is intentional or an error.
3. Add the possibility to reset to a given breakposs. reset() always uses prevBP to reset to; because getNextBreakPoss keeps prevBP equal to null when it starts a line, we also need to be able to reset to a given breakposs. If we overload reset, we get complaints about an ambiguous reference reset(null). Note that reset(Position) is defined by AbstractLayoutManager, which does a different job. This method also allows for its argument to be null. The method reset() is reformulated in terms of the new method.
4. Allow for the case prev == null.
5. Allow for the case prev == null.
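The ambiguity in point 3 is ordinary Java overload resolution. With reset(Position) inherited and a new reset(BreakPoss) added, a call reset(null) matches both. Because neither parameter type is a subtype of the other, javac reports "reference to reset is ambiguous". The sketch below uses empty stand-in classes and a hypothetical method name, resetTo(); the name used in the actual patch is not given here. The lastReset field exists only to make the behavior observable.

```java
/** Empty stand-ins for FOP's Position and BreakPoss. */
class Position {}
class BreakPoss {}

abstract class AbstractLMSketch {
    String lastReset = "";                     // records what happened, for illustration only

    /** Existing method with a different job, as in AbstractLayoutManager. */
    void reset(Position pos) {
        lastReset = "position";
    }
}

class LineLMSketch extends AbstractLMSketch {
    BreakPoss prevBP;

    // void reset(BreakPoss bp) { ... }  // would make reset(null) ambiguous with reset(Position)

    /** Hypothetical new method: reset to a given breakposs; null is allowed. */
    void resetTo(BreakPoss bp) {
        lastReset = (bp == null) ? "start" : "breakposs";
        prevBP = bp;
    }

    /** The old no-argument reset, reformulated in terms of the new method. */
    void reset() {
        resetTo(prevBP);
    }
}
```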
DO NOT REPLY [Bug 25059] - [PATCH] LineLayoutManager: first word in line too long
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25059 [PATCH] LineLayoutManager: first word in line too long --- Additional Comments From [EMAIL PROTECTED] 2003-11-27 20:58 --- Created an attachment (id=9323) The patch file
DO NOT REPLY [Bug 25059] - [PATCH] LineLayoutManager: first word in line too long
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25059 [PATCH] LineLayoutManager: first word in line too long --- Additional Comments From [EMAIL PROTECTED] 2003-11-27 20:59 --- Created an attachment (id=9324) Example fo file
RE: [VOTE] Properties API
--- Victor Mote [EMAIL PROTECTED] wrote: validateValidProperty(int propVal) { return (supportedProps[propVal] == 1); } (Come to think of it, we can probably keep validateValidProperty() in the FObj base class alone as well!) IOW (I assume) use a 2-dimensional array, the first dimension representing the Object, the second dimension representing the list of possible Properties. No--I was thinking just overriding the (static?) member variable array supportedProps in each FObj subclass. That way we can keep validateValidProperty() in just the parent class. Minor point, though, from what I was thinking. Glen
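One Java detail matters for the "(static?)" question. Fields, whether static or not, are hidden rather than overridden. A base-class validateValidProperty() that reads a static array directly therefore always sees the base class's array, never the subclass's. Below is a minimal sketch of the pitfall and of one workaround, an overridable accessor. FObjBase, BlockSketch and the array contents are hypothetical.

```java
/** Hypothetical base class; the array is indexed by property id. */
abstract class FObjBase {
    static final int[] SUPPORTED_PROPS = {0, 0, 0};   // base default: nothing supported

    /** Pitfall: this always reads FObjBase.SUPPORTED_PROPS, even for subclasses. */
    boolean validateByField(int propVal) {
        return SUPPORTED_PROPS[propVal] == 1;
    }

    /** Each subclass hands back its own static table. */
    protected abstract int[] supportedProps();

    /** Stays in the base class only, yet sees the subclass's table. */
    boolean validateValidProperty(int propVal) {
        return supportedProps()[propVal] == 1;
    }
}

class BlockSketch extends FObjBase {
    static final int[] SUPPORTED_PROPS = {1, 0, 1};   // hides, does not override, the base field

    @Override
    protected int[] supportedProps() {
        return SUPPORTED_PROPS;                        // resolves to BlockSketch's array
    }
}
```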
DO NOT REPLY [Bug 19851] - Error while opening the distribution file
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19851 Error while opening the distribution file

[EMAIL PROTECTED] changed:

What       |Removed  |Added
Status     |REOPENED |RESOLVED
Resolution |         |INVALID

--- Additional Comments From [EMAIL PROTECTED] 2003-11-27 23:39 --- fop-0.20.5rc2-bin.tar.gz isn't there anymore (and wasn't broken anyway). Besides, FOP 0.20.5 is also available as ZIP.
Wiki (was Re: cvs commit: xml-fop/src/documentation/content/xdocs book.xml)
Victor Mote wrote: Modified: src/documentation/content/xdocs book.xml Log: Added link to the Wiki Chris: FYI, there is a link to the Wiki on the dev tab, the thinking being that the Wiki would be used primarily for development and design issues that are probably not of general interest to the users. If you think it is needed on the user tab too, that is OK. Yes, I wouldn't limit the Wiki to development issues. IMHO it should also be an easy way for users to contribute to the documentation. The Cocoon Wiki is a nice example. Christian