RE: Unnesting properties and makers.
-Original Message- From: John Austin [mailto:[EMAIL PROTECTED]

snip /

> So I copied that program and ran it on my RH 9 system.

Hmmm... so you copied it with or without the cp-error ;)

> Got the following results. I am just quoting the results here:
>
> [EMAIL PROTECTED] foptest]$ java -classpath . x
> false method call 998
> true method call 1001
> false instanceof 3008
> true instanceof 4119
>
> [EMAIL PROTECTED] foptest]$ java -server -classpath . x
> false method call 1
> true method call 0
> false instanceof 0
> true instanceof 4822
>
> [EMAIL PROTECTED] foptest]$ java -server x
> false method call 1
> true method call 0
> false instanceof 0
> true instanceof 4784

My guess is: with. Never mind, just need to swap the results then...

> H.

You also wondering what to conclude, ay? (apart from -server being a real booster)

Cheers, Andreas
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote:

> Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface:
>
>     public interface Property {
>         public Length getLength();
>         public Space getSpace();
>         ...
>     }

[Glen Mazza]

> Finn, just so I understand more here--what is the set of methods that this interface would have? (You don't have to give me a full enumeration if it's huge--but let me know how you determine them.) How many of them are there--10 of them or 20 or 30 or ???

This is the full set, exactly the same as the set that now exists in Property as null methods.

    public Length getLength();
    public ColorType getColorType();
    public CondLength getCondLength();
    public LengthRange getLengthRange();
    public LengthPair getLengthPair();
    public Space getSpace();
    public Keep getKeep();
    public int getEnum();
    public char getCharacter();
    public Vector getList();
    public Number getNumber();
    public Numeric getNumeric();
    public String getNCname();
    public Object getObject();
    public String getString();

The names of the returned compound property values would change according to the new naming rule that we decide on.

regards, finn
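The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions: the interface is cut down to three of the fifteen getters, `Length` and `Space` are stand-ins rather than the real `datatypes` classes, and the `millipoints` and `optimum` fields are hypothetical.

```java
public class PropertySketch {
    /** Cut-down version of the proposed interface (three of the fifteen getters). */
    interface Property {
        Length getLength();
        Space getSpace();
        String getString();
    }

    /** Supplies the default "null" answer for every getter. */
    static abstract class AbstractProperty implements Property {
        public Length getLength() { return null; }
        public Space getSpace() { return null; }
        public String getString() { return null; }
    }

    /** A datatype that is its own property, so no wrapper object is needed. */
    static class Length extends AbstractProperty {
        final int millipoints; // hypothetical field, for illustration only
        Length(int millipoints) { this.millipoints = millipoints; }
        @Override public Length getLength() { return this; }
    }

    static class Space extends AbstractProperty {
        final Length optimum; // hypothetical field, for illustration only
        Space(Length optimum) { this.optimum = optimum; }
        @Override public Space getSpace() { return this; }
    }

    public static void main(String[] args) {
        Property p = new Length(12000);
        System.out.println("is a length: " + (p.getLength() != null));
        System.out.println("is a space:  " + (p.getSpace() != null));
    }
}
```

Each concrete datatype overrides only the getter for its own type, so callers can hold everything as a `Property` and ask for the value they expect.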
Re: Unnesting properties and makers.
[Glen Mazza]

> I now understand what you're saying, and like the simplification you're suggesting. The current naming however, is probably preferable--the word Property figures quite highly in the spec! Do you have a problem remaining with it?

Not at all, it is just that I thought it would be confusing to rename the 'datatypes' classes to XXXProperty as they would conflict with the old XXXProperty classes, but it is only a problem when you compare before vs. after. If the change is done, the resulting XXXProperty classes will be completely consistent.

> For those (*)'ed datatypes, can't we get rid of the datatype instead by rolling that datatype into the equivalently named Property? In turn, have *those* Properties extend AbstractProperty as you suggest. Actually, I guess I'm just saying the same thing you're suggesting, except to use --Property instead of --Type for everything.

Indeed. Which package should the resulting rolled datatype/property be placed in? My feeling says fop.datatypes (and the nested makers should be unnested and placed in fop.fo.properties). But that is a separate suggestion which does not have to be dealt with initially.

> Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list.
>
> That would still be the case. Everything stored in the PropertyList implements the Property interface.
>
> But only a few of them would extend AbstractProperty, correct--or would you plan on having all do so?

All of the properties would extend AbstractProperty. That way the properties get the default 'null' implementation of all the get<type> methods. The only hard requirement is that all the properties implement the Property interface.

> Except that the code above should IMHO use
>
>     if (prop.getLength() != null)
>
> to test for a length type instead of using instanceof.
>
> Well, instanceof is slower I believe, but better self-commenting. If you switch to this type of conditional for speed, just add a short comment of its purpose--here, to determine if we are working with an EnumProperty or a LengthProperty. (Another option, BTW, if you think it will cut down on buggy programming, is to have the classes implementing this Property interface supply unsupported interface methods a la Peter's Read-Only BitSet[1], i.e., throw exceptions. We can revisit this topic later if code errors are becoming a problem.)

In most cases an NPE is thrown immediately after the call to get<type>, but an exception thrown from within the get<type> method could indeed carry more information about the cause of failure. I still like the null return and null test better than the alternatives tho.

regards, finn
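For comparison, here is a small sketch of the two options discussed above: the null-returning getter with a null test, and a getter that throws for unsupported types. The class names `LengthProperty` and `EnumProperty` come from the thread; everything else (`StrictEnumProperty`, `describe`, the `millipoints` field, the exception message) is hypothetical and only meant to show the shape of the code.

```java
public class BaselineShiftSketch {
    static class Length {
        final int millipoints; // hypothetical unit, for illustration only
        Length(int millipoints) { this.millipoints = millipoints; }
    }

    /** Null-returning defaults, as in the AbstractProperty proposal. */
    static abstract class AbstractProperty {
        public Length getLength() { return null; }
        public int getEnum() { return 0; }
    }

    static class LengthProperty extends AbstractProperty {
        private final Length length;
        LengthProperty(Length length) { this.length = length; }
        @Override public Length getLength() { return length; }
    }

    static class EnumProperty extends AbstractProperty {
        private final int value;
        EnumProperty(int value) { this.value = value; }
        @Override public int getEnum() { return value; }
    }

    /** Alternative: the unsupported getter fails loudly and says why. */
    static class StrictEnumProperty extends EnumProperty {
        StrictEnumProperty(int value) { super(value); }
        @Override public Length getLength() {
            throw new UnsupportedOperationException("enum property has no length value");
        }
    }

    static String describe(AbstractProperty prop) {
        // Null test instead of instanceof: a non-null Length means the
        // property is length-valued; otherwise it is treated as an enum.
        Length len = prop.getLength();
        if (len != null) {
            return "length " + len.millipoints;
        }
        return "enum " + prop.getEnum();
    }

    public static void main(String[] args) {
        System.out.println(describe(new LengthProperty(new Length(3000))));
        System.out.println(describe(new EnumProperty(7)));
    }
}
```

The null test keeps calling code short, while the throwing variant trades that for a more informative failure than a later NullPointerException; the two cannot be mixed freely, since `describe` would propagate the exception for a `StrictEnumProperty`.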
Re: Unnesting properties and makers.
> Each of the <type>Type classes also implements the get<type> methods from Property so the layout must do exactly the same as it does now to extract the right value:
>
>     propertyList.get(PR_INLINE_PROGRESSION_DIMENSION).
>         getLengthRange().getOptimum().getLength();

[Andreas L. Delmelle]

> Hmmm... coming back to my recent question about the use of/access to the background-color property: I somehow would feel much for further extending the way the Common*Properties are handled. IIC, the calls like the above should only happen in the background via the propMgr of the FObj, and not become part of the public API.

I dunno. The spec clearly lists which properties apply to an element:

file:///d:/java/REC-xsl/slice6.html#fo_external-graphic

so it makes sense to find the same list of assignments in the layout managers.

> As a concrete example, in Layout, I would rather see something like:
>
>     private AreaDimensionProps adimProps;
>     ...
>     protected void initProperties(PropertyManager propMgr) {
>         adimProps = propMgr.getAreaDimensionProps();
>         ...
>     }

Yeah, if it makes sense to add more groups of properties together (and it seems that such an ipd,bpd pair makes sense) I don't see a problem adding that.

>     ...
>     Length ipd = aProps.ipd;

Yes, except that it is a LengthRange property.

> (maybe the latter can become more abstract
>
>     PropertyValue ipd = aProps.ipd;
> )

My gut feeling says no. Unless the property in question can take non-LengthRange values (which ipd cannot). The layoutmanagers should resolve the property value as far as they can as early as they can IMHO.

regards, finn
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote:

> > however, is probably preferable--the word Property figures quite highly in the spec! Do you have a problem remaining with it?
>
> Not at all, it is just that I thought it would be

Good--we can stick with Property then.

> Indeed. Which package should the resulting rolled datatype/property be placed in? My feeling says fop.datatypes (and the nested makers should be unnested and placed in fop.fo.properties). But that is a separate suggestion which does not have to be dealt with initially.

Yes, it doesn't matter right now--do what you think is best, we can rearrange them later if needed. Unnesting is fine--I particularly liked the new PropertyMaker class.

One issue--before I forget--in the FOPropertyMapping, for the colors, we have a huge set of

    genericColor.addKeyword(blue, #);
    genericColor.addKeyword(red, #);
    etc... etc...

I just noticed, however, that the datatypes.ColorType class already has color types predefined within it. Do we really need to have both? I think we can get rid of one or the other, correct?

> I still like the null return and null test better than the alternatives tho.

OK. Sounds good. The patch looks fine to me.

Thanks, Glen
RE: Unnesting properties and makers.
-Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] [Andreas L. Delmelle] Hmmm... coming back to my recent question about the use of/access to the background-color property: I somehow would feel much for further extending the way the Common*Properties are handled. IIC, the calls like the above should only happen in the background via the propMgr of the FObj, and not become part of the public API. I dunno. The spec clearly list which properties that apply for a element: file:///d:/java/REC-xsl/slice6.html#fo_external-graphic (Off-topic: Finn, I don't think I have access to your d:-drive ;) ) so it makes sense to find the same list of assignments in the layout managers. Indeed it does, but I don't think Layout needs them all, neither does it need them in their initial 'states' (don't really know what other word to use for this...). For instance: fo:block background-color=inherited-property-value(color) ... The layout manager doesn't need _this_ value of the property, it needs the actual ColorType (so I guess I basically agree with your comment about the more abstract version). snip / Yeah, if it make sense to add more groups of properties together (and it seems that such a ipd,bpd pair make sense) I don't see a problem adding that. I just think this will lead to an API that's a bit clearer, cleaner and so, in the long run, easier to manage and maintain. I don't really know whether the Common*Properties were separated out because they are, well, common, and it's more efficient for them to be treated as a bundle. Maybe it was originally the intention of creating property groups along the groups in which they are divided in the spec (see http://xml.apache.org/fop/compliance.html)? AFAICT the basic framework is already present to tie the 'propertyList.get(...)'-calls all together in the PropertyManager. 
If it is decided at a later point that something needs to be added/modified WRT Properties, this could avoid having to modify numerous corresponding propertyList.get()-calls in all related FObj's / LM's / Areas. ( Referring to the string-int conversion, and the hours Glen has spent to trace the calls and replace the constant-names... ) ... Length ipd = aProps.ipd; Yes, except that it is a LengthRange property. Ouch! My mistake :) Cheers, Andreas
Re: Unnesting properties and makers.
Glen Mazza wrote: Well, instanceof is slower I believe, but better self-commenting. Instanceof is exactly as fast as a simple function call after warm-up. J.Pietschmann
Re: Unnesting properties and makers.
Glen Mazza wrote:

> Well, instanceof is slower I believe, but better self-commenting.

[J.Pietschmann]

> Instanceof is exactly as fast as a simple function call after warm-up.

That is not what I remembered, so I made a small test program and ran it with 3 different versions of the JDK:

[/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
false method call 160
true method call 170
false instanceof 581
true instanceof 581

[/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
false method call 16614
true method call 881
false instanceof 1162
true instanceof 2083

[/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
false method call 581
true method call 661
false instanceof 2153
true instanceof 2734

I really don't know what to conclude from this test, but at least I'm glad I didn't mention performance as the reason why I prefer the get<type> way of testing for subclasses.

I'm surprised by the slow performance of calling non-overridden methods in jdk1.3.1. I don't have any explanation for that.

regards, finn

    import java.io.*;
    import java.net.*;

    public class x {
        public static final int ITERS = 1;

        public static void main(String[] args) throws Exception {
            Prop prop = new Prop();
            Prop stringprop = new StringProp();

            // Warm up the JIT.
            testCall(prop);
            testInstanceOf(prop);

            long now;

            now = System.currentTimeMillis();
            testCall(prop);
            System.out.println("false method call " + (System.currentTimeMillis() - now));

            now = System.currentTimeMillis();
            testCall(stringprop);
            System.out.println("true method call " + (System.currentTimeMillis() - now));

            now = System.currentTimeMillis();
            testInstanceOf(prop);
            System.out.println("false instanceof " + (System.currentTimeMillis() - now));

            now = System.currentTimeMillis();
            testInstanceOf(stringprop);
            System.out.println("true instanceof " + (System.currentTimeMillis() - now));
        }

        public static void testInstanceOf(Prop prop) {
            for (int i = ITERS; i >= 0; i--) {
                boolean x = prop.getString() != null;
            }
        }

        public static void testCall(Prop prop) {
            for (int i = ITERS; i >= 0; i--) {
                boolean x = prop instanceof StringProp;
            }
        }

        public static class Prop {
            public String getString() {
                return null;
            }
        }

        public static class StringProp extends Prop {
            String value = "a string";

            public String getString() {
                return value;
            }
        }
    }
RE: Unnesting properties and makers.
-Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED]

snip /

>     public static void testInstanceOf(Prop prop) {
>         for (int i = ITERS; i >= 0; i--) {
>             boolean x = prop.getString() != null;
>         }
>     }
>
>     public static void testCall(Prop prop) {
>         for (int i = ITERS; i >= 0; i--) {
>             boolean x = prop instanceof StringProp;
>         }
>     }

I'd swap either the method names or the contained expressions to get dependable results (typo? Don't know if it's exactly the same code you ran to get the test-results... or am I missing the point? --happens all too often, I'm afraid.)

Cheers, Andreas
Re: Unnesting properties and makers.
[Andreas L. Delmelle]

snip /

>     public static void testInstanceOf(Prop prop) {
>         for (int i = ITERS; i >= 0; i--) {
>             boolean x = prop.getString() != null;
>         }
>     }
>
>     public static void testCall(Prop prop) {
>         for (int i = ITERS; i >= 0; i--) {
>             boolean x = prop instanceof StringProp;
>         }
>     }
>
> I'd swap either the method names or the contained expressions to get dependable results (typo?

Yeah, an embarrassing copy-paste bug. Thanks for catching it. The result is then:

[/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
false method call 581
true method call 581
false instanceof 160
true instanceof 170

[/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
false method call 1272
true method call 2304
false instanceof 17945
true instanceof 912

[/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
false method call 2154
true method call 2754
false instanceof 590
true instanceof 651

regards, finn
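For anyone wanting to repeat the measurement, a corrected variant of the test program might look like the sketch below. It is not the code that produced the numbers above: the iteration count is an assumed value (the original constant is not recoverable from the post), the loops return a hit count so the JIT cannot simply discard them, and `System.nanoTime()` replaces `currentTimeMillis()`. A dedicated harness such as JMH would give more trustworthy figures than any hand-rolled loop.

```java
public class InstanceofBench {
    static class Prop {
        public String getString() { return null; }
    }

    static class StringProp extends Prop {
        private final String value = "a string";
        @Override public String getString() { return value; }
    }

    /** Counts how often the getter-based null test succeeds. */
    static int countCall(Prop prop, int iters) {
        int hits = 0;
        for (int i = 0; i < iters; i++) {
            if (prop.getString() != null) hits++;
        }
        return hits;
    }

    /** Counts how often the instanceof test succeeds. */
    static int countInstanceOf(Prop prop, int iters) {
        int hits = 0;
        for (int i = 0; i < iters; i++) {
            if (prop instanceof StringProp) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        final int iters = 10_000_000; // assumed value, not the original ITERS
        Prop prop = new Prop();
        Prop stringProp = new StringProp();

        // Warm up both code paths with both receiver types.
        int sink = countCall(prop, iters) + countCall(stringProp, iters)
                 + countInstanceOf(prop, iters) + countInstanceOf(stringProp, iters);

        long t = System.nanoTime();
        sink += countCall(prop, iters);
        System.out.println("false method call " + (System.nanoTime() - t) / 1_000_000 + " ms");

        t = System.nanoTime();
        sink += countCall(stringProp, iters);
        System.out.println("true method call " + (System.nanoTime() - t) / 1_000_000 + " ms");

        t = System.nanoTime();
        sink += countInstanceOf(prop, iters);
        System.out.println("false instanceof " + (System.nanoTime() - t) / 1_000_000 + " ms");

        t = System.nanoTime();
        sink += countInstanceOf(stringProp, iters);
        System.out.println("true instanceof " + (System.nanoTime() - t) / 1_000_000 + " ms");

        // Print the accumulated count so the work stays observable.
        System.out.println("(hits: " + sink + ")");
    }
}
```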
Re: Unnesting properties and makers.
file:///d:/java/REC-xsl/slice6.html#fo_external-graphic [Andreas L. Delmelle] (Off-topic: Finn, I don't think I have access to your d:-drive ;) ) I hope not :-0 . Sorry about that. Yeah, if it make sense to add more groups of properties together (and it seems that such a ipd,bpd pair make sense) I don't see a problem adding that. I just think this will lead to an API that's a bit clearer, cleaner and so, in the long run, easier to manage and maintain. I don't really know whether the Common*Properties were separated out because they are, well, common, and it's more efficient for them to be treated as a bundle. Maybe it was originally the intention of creating property groups along the groups in which they are divided in the spec (see http://xml.apache.org/fop/compliance.html)? I don't know what the original intention was either but from the no-longer-used setup() methods in the flow objects like fo.flow.Block it looks like somebody once wanted the list of properties from the spec to be represented in the code. But that should not prevent us from doing it differently. regards, finn
Re: Unnesting properties and makers.
Finn Bock wrote: Instanceof is exactly as fast as a simple function call after warm-up. That is not what I remembered, [Snip] I'm surprised. I made some measurements with a JDK 1.3.0, with ~50 warm-up cycles to give HotSpot something to optimize, and vaguely remembered instanceof was slightly faster (~1%) than a foo(){return true;}. It may have something to do with the test setup. I wouldn't rule out I tested in a class without inheritance :-) J.Pietschmann
RE: Unnesting properties and makers.
-Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] The result is then: [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x false method call 581 true method call 581 false instanceof 160 true instanceof 170 [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x false method call 1272 true method call 2304 false instanceof 17945 true instanceof 912 [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x false method call 2154 true method call 2754 false instanceof 590 true instanceof 651 Very, very interesting... Java's OO-optimization at its best (except for 1.3)! After all, it shouldn't be *that* surprising that an accessor-method-call generates more overhead than a test for class-membership (but what if the class in question is not yet loaded at time? Not that this should occur a lot...) Cheers, Andreas
RE: Unnesting properties and makers.
On Mon, 2004-01-26 at 17:45, Andreas L. Delmelle wrote: -Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] The result is then: [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x false method call 581 true method call 581 false instanceof 160 true instanceof 170 [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x false method call 1272 true method call 2304 false instanceof 17945 true instanceof 912 [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x false method call 2154 true method call 2754 false instanceof 590 true instanceof 651 Very, very interesting... When did the choice of JVM (java -client | java -server) appear ? Wasn't it 1.3 ? -- John Austin [EMAIL PROTECTED]
Re: Unnesting properties and makers.
J.Pietschmann wrote: Glen Mazza wrote: Well, instanceof is slower I believe, but better self-commenting. Instanceof is exactly as fast as a simple function call after warm-up. That's very useful to know. instanceof has had a very bad press. Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
RE: Unnesting properties and makers.
On Mon, 2004-01-26 at 17:45, Andreas L. Delmelle wrote: -Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] The result is then: [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x false method call 581 true method call 581 false instanceof 160 true instanceof 170 [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x false method call 1272 true method call 2304 false instanceof 17945 true instanceof 912 [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x false method call 2154 true method call 2754 false instanceof 590 true instanceof 651 Very, very interesting... Java's OO-optimization at its best (except for 1.3)! After all, it shouldn't be *that* surprising that an accessor-method-call generates more overhead than a test for class-membership (but what if the class in question is not yet loaded at time? Not that this should occur a lot...) So I copied that program and ran it on my RH 9 system. Got the following results. I am just quoting the results here: Note that the default JVM is -client or HotSpot ... [EMAIL PROTECTED] foptest]$ java -classpath . x false method call 998 true method call 1001 false instanceof 3008 true instanceof 4119 [EMAIL PROTECTED] foptest]$ java -server -classpath . x false method call 1 true method call 0 false instanceof 0 true instanceof 4822 [EMAIL PROTECTED] foptest]$ java -server x false method call 1 true method call 0 false instanceof 0 true instanceof 4784 java version 1.4.2 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28) Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode) H. -- John Austin [EMAIL PROTECTED]
Re: Unnesting properties and makers.
Finn Bock wrote:

> [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
> false method call 581
> true method call 581
> false instanceof 160
> true instanceof 170
>
> [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
> false method call 1272
> true method call 2304
> false instanceof 17945
> true instanceof 912
>
> [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
> false method call 2154
> true method call 2754
> false instanceof 590
> true instanceof 651

These appear to be running on the same system. It's good news for instanceof, but what startles me is the performance of 1.2.2 relative to 1.4.2_02 (and, of course, 1.3.1_03).

Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
Re: Unnesting properties and makers.
Andreas L. Delmelle wrote:

> Does anybody know what <space> means for line-height???
>
> Know? I guess not. But judging from the spec...

Ah well, I overlooked this

    XSL adds the following value with the following meaning:
    <space>
    Specifies the minimum, optimum, and maximum values, the conditionality and precedence of the 'line-height' that is used in determining the half-leading.

> Perhaps this is just a way of saying that 'line-height' can be 'shorthanded' line-height="min opt max cond prec"??

Uh no, it's more ugly: line-height is actually meant to be a compound property, like space-before. I.e. it is possible to write

    <fo:block line-height.optimum="12.5pt" line-height.maximum="13pt" ...>

The precedence and conditionality govern the combination of the half-leading with space-before and space-after at the beginning and the end of the block, I think. I see why they thought this is necessary, but this kind of spec makes it unnecessarily hard to follow.

J.Pietschmann
RE: Unnesting properties and makers.
-Original Message- From: J.Pietschmann [mailto:[EMAIL PROTECTED]

snip /

> Ah well, I overlooked this

And it's easy to overlook. The spec-layout is quite misleading, putting this XSL-addition in the place it is now... If you're reading diagonally, it looks more like an insignificant note.

snip /

> Uh no, it's more ugly: line-height is actually meant to be a compound property, like space-before. I.e. it is possible to write
>
>     <fo:block line-height.optimum="12.5pt" line-height.maximum="13pt" ...>

Yup, suspected _something_ like this. I wanted to add the little phrasing: 'to make the party complete' ;)

> The precedence and conditionality govern the combination of the half-leading with space-before and space-after at the beginning and the end of the block, I think.

Sounds like the correct interpretation, only that it's expressed more generally 'above the first ... or after the last ... placed in a reference area' --comes down to the same thing, in this case.

> I see why they thought this is necessary, but this kind of spec makes it unnecessarily hard to follow.

Hmmm.. I do agree that first making it look like line-height is a simple property, and then adding a little extension to the definition, making it exactly the opposite --that's definitely not the way to go. The definition should be revised here, if you ask me...

Bottom line is that line-height is supposed to be treated as a compound property, for which the subfields are defaulted to values according to the definition in the spec when it is used as a simple property.

Cheers, Andreas
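The bottom line above, a simple value expanding into a compound one whose components can then be overridden individually, could be modelled along these lines. Treat it as a sketch only: `SpaceValue`, `fromSimple` and the `with...` helpers are hypothetical names, setting minimum, optimum and maximum to the same value is an assumption, and the conditionality and precedence defaults are placeholders that would have to be checked against the spec.

```java
public class LineHeightSketch {
    /** Hypothetical immutable holder for the five components of a space value. */
    static final class SpaceValue {
        final double minimum;   // points
        final double optimum;   // points
        final double maximum;   // points
        final String conditionality;
        final String precedence;

        SpaceValue(double minimum, double optimum, double maximum,
                   String conditionality, String precedence) {
            this.minimum = minimum;
            this.optimum = optimum;
            this.maximum = maximum;
            this.conditionality = conditionality;
            this.precedence = precedence;
        }

        SpaceValue withOptimum(double pt) {
            return new SpaceValue(minimum, pt, maximum, conditionality, precedence);
        }

        SpaceValue withMaximum(double pt) {
            return new SpaceValue(minimum, optimum, pt, conditionality, precedence);
        }
    }

    /** Expands a simple value such as line-height="12pt" into a compound value. */
    static SpaceValue fromSimple(double pt) {
        // ASSUMPTION: all three lengths start out equal. The two strings are
        // PLACEHOLDERS, not values taken from the spec.
        return new SpaceValue(pt, pt, pt, "placeholder-conditionality", "placeholder-precedence");
    }

    public static void main(String[] args) {
        // Simple value first, then the component overrides from the example:
        // line-height.optimum="12.5pt" line-height.maximum="13pt"
        SpaceValue lh = fromSimple(12.0).withOptimum(12.5).withMaximum(13.0);
        System.out.println(lh.minimum + " / " + lh.optimum + " / " + lh.maximum);
    }
}
```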
RE: Unnesting properties and makers.
-Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] snip / Each of the typeType classes also implements the gettype methods from Property so the layout must do exactly the same as it does now to extract the right value: propertyList.get(PR_INLINE_PROGRESSION_DIMENSION). getLengthRange().getOptimum().getLength(); Hmmm... coming back to my recent question about the use of/access to the background-color property: I somehow would feel much for further extending the way the Common*Properties are handled. IIC, the calls like the above should only happen in the background via the propMgr of the FObj, and not become part of the public API. As a concrete example, in Layout, I would rather see something like: private AreaDimensionProps adimProps; ... protected void initProperties(PropertyManager propMgr) { adimProps = propMgr.getAreaDimensionProps(); ... } ... Length ipd = aProps.ipd; (maybe the latter can become more abstract PropertyValue ipd = aProps.ipd; ) Does this sound crazy? (FYI: the AreaDimensionProps class does not exist yet... LayoutProps, for example, is already present, but seems to be underused at the moment.) Cheers, Andreas
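To make the idea concrete, here is one way the grouping could look. `AreaDimensionProps` does not exist yet, as noted above, so everything in this sketch is hypothetical: the property ids are placeholder constants, `LengthRange` is a stand-in class, and the property list is reduced to a plain map. The fields are typed as `LengthRange` rather than `Length`, since inline-progression-dimension is a length-range property.

```java
import java.util.HashMap;
import java.util.Map;

public class AreaDimensionSketch {
    // Placeholder ids; the real constants live elsewhere in FOP.
    static final int PR_INLINE_PROGRESSION_DIMENSION = 1;
    static final int PR_BLOCK_PROGRESSION_DIMENSION = 2;

    /** Stand-in for the real length-range datatype. */
    static class LengthRange {
        final int minimum, optimum, maximum;
        LengthRange(int minimum, int optimum, int maximum) {
            this.minimum = minimum;
            this.optimum = optimum;
            this.maximum = maximum;
        }
    }

    /** Hypothetical bundle of the two area dimensions. */
    static class AreaDimensionProps {
        final LengthRange ipd;
        final LengthRange bpd;
        AreaDimensionProps(LengthRange ipd, LengthRange bpd) {
            this.ipd = ipd;
            this.bpd = bpd;
        }
    }

    /** The only place that knows how to dig the values out of the list. */
    static class PropertyManager {
        private final Map<Integer, LengthRange> propertyList;

        PropertyManager(Map<Integer, LengthRange> propertyList) {
            this.propertyList = propertyList;
        }

        AreaDimensionProps getAreaDimensionProps() {
            return new AreaDimensionProps(
                    propertyList.get(PR_INLINE_PROGRESSION_DIMENSION),
                    propertyList.get(PR_BLOCK_PROGRESSION_DIMENSION));
        }
    }

    public static void main(String[] args) {
        Map<Integer, LengthRange> list = new HashMap<>();
        list.put(PR_INLINE_PROGRESSION_DIMENSION, new LengthRange(100, 200, 300));
        list.put(PR_BLOCK_PROGRESSION_DIMENSION, new LengthRange(10, 20, 30));

        AreaDimensionProps props = new PropertyManager(list).getAreaDimensionProps();
        System.out.println("ipd optimum: " + props.ipd.optimum);
    }
}
```

With this shape, a change in how properties are stored touches only `PropertyManager`, not every layout manager that reads the dimensions.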
RE: Unnesting properties and makers.
-Original Message- From: Andreas L. Delmelle [mailto:[EMAIL PROTECTED] snip / LayoutProps, for example, is already present, but seems to be underused at the moment.) Speaking of which: what exactly is the purpose of having a spaceBefore/spaceAfter in fop.traits.LayoutProps and another in fop.fo.properties.CommonMarginBlock ? Got something to do with the prop-to-trait mapping? Or is this just an unfortunate clashing of names? Cheers, Andreas
Re: Unnesting properties and makers.
J.Pietschmann wrote:

> Peter B. West wrote:
>
> > With my naive understanding of parsing as a two-stage process (lexemes -> higher level constructs) I have been curious about earlier comments of yours about multi-stage parsing. Can ANTLR do this sort of thing?
>
> I'm not quite sure whether you mean by parsing as a two-stage process the same as I do. In language specs, the formal description is usually divided into a grammar level representing a Chomsky level 2 context free grammar and a lexical level, described by simple regular expressions (Chomsky level 3 IIRC). This is done both for keeping the spec readable and for efficient implementation ...

This is basically what I meant - I see (and have experienced in FOP) the difficulty of trying to parse multiple grammars out of a single stream of lexical objects.

> > Given the amount of hacking I had to do to parse everything that could legally be thrown at me, I am very surprised that these are the only issues in HEAD parsing.
>
> Well, one of the problems with the FO spec is that section 5.9 defines a grammar for property expressions, but this doesn't give the whole picture for all XML attribute values in FO files. There are also (mostly) whitespace separated lists for shorthands, and the comma separated font family name list, where a) whitespace is allowed around the commas and b) quotes around the names may be omitted basically as long as there are no commas or whitespace in the name. The latter means there may be unquoted sequences of characters which have to be interpreted as a single token but are not NCNames. It also means that in the font shorthand there may be whitespace which is not a list element delimiter. I think this is valid:
>
>     font="bold 12pt 'Times Roman' , serif"
>
> and it should be parsed as
>
>     font-weight="bold" font-size="12pt" font-family="'Times Roman' , serif"
>
> then the font family can be split. This is easy for humans but can be quite tricky to get right for computers, given that the shorthand list has a bunch of optional elements. Specifically
>
>     font="bold small-caps italic 12pt/14pt 'Times Roman' , A+B,serif"
>
> should be valid too. At least, the font family is the last entry. Note that suddenly a slash appears as delimiter between font size and line height...

This usage, AFAICT, is the reason that division is specified by the token 'div'. All a matter of CSS compatibility.

> Another set of problems is token typing, the implicit type conversion and the very implicit type specification for the properties. While often harmless, it shows itself for the format property: the spec says the expected type is a string, which means it should be written as format="'01'". Of course, people tend to write format="01". While the parsed number could be cast back into a string, unfortunately the leading zero is lost. The errata amended 5.9 specifically for this use case, so that in case of an error the original string representation of the property value expression should be used to recover. Which tempts me to use initial-page-number="auto+1".

This is one of the disgraces of the spec - this time for compatibility with XSLT usage. XSL-FO just cops it sweet whenever someone else's problem (SEP) extrudes into the XSL namespace.

> Another famous case is hyphenation-char="-", which is by no means a valid property expression. Additionally the restriction to a string of length 1 (a char) isn't spelled out explicitly anywhere.

Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
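The last step mentioned above, splitting the font family list once it has been separated from the rest of the shorthand, is the easy part and can be sketched as below. This is an illustration, not FOP code: `FontFamilySplit` and its methods are hypothetical, only commas outside quotes are treated as delimiters, and escapes inside quoted names are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class FontFamilySplit {
    /** Splits a font-family value on commas that are not inside quotes. */
    static List<String> split(String value) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 means "not inside a quoted name"

        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;          // closing quote, drop it
                } else {
                    current.append(c);  // commas and spaces are literal here
                }
            } else if (c == '\'' || c == '"') {
                quote = c;              // opening quote, drop it
            } else if (c == ',') {
                addName(names, current);
            } else {
                current.append(c);
            }
        }
        addName(names, current);
        return names;
    }

    /** Trims whitespace around a name and skips empty entries. */
    private static void addName(List<String> names, StringBuilder current) {
        String name = current.toString().trim();
        if (!name.isEmpty()) {
            names.add(name);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(split("'Times Roman' , A+B,serif"));
    }
}
```

Note that `A+B` comes through as a single name even though it is not an NCName, which is exactly the case a general expression tokenizer would stumble over.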
Re: Unnesting properties and makers.
J.Pietschmann wrote:

> ...
> Well, one of the problems with the FO spec is that section 5.9 defines a grammar for property expressions, but this doesn't give the whole picture for all XML attribute values in FO files. There are also (mostly) whitespace separated lists for shorthands, and the comma separated font family name list, where a) whitespace is allowed around the commas and b) quotes around the names may be omitted basically as long as there are no commas or whitespace in the name. The latter means there may be unquoted sequences of characters which have to be interpreted as a single token but are not NCNames. It also means that in the font shorthand there may be whitespace which is not a list element delimiter. I think this is valid:
>
>     font="bold 12pt 'Times Roman' , serif"
>
> and it should be parsed as
>
>     font-weight="bold" font-size="12pt" font-family="'Times Roman' , serif"
>
> then the font family can be split. This is easy for humans but can be quite tricky to get right for computers, given that the shorthand list has a bunch of optional elements. Specifically
>
>     font="bold small-caps italic 12pt/14pt 'Times Roman' , A+B,serif"
>
> should be valid too. At least, the font family is the last entry. Note that suddenly a slash appears as delimiter between font size and line height...
> ...

Alt-design takes a two-stage approach to parsing. In the first stage the basic datatypes are detected. Where there are nasty constructs hung over from CSS, as in 'font', the elements are collected into PropertyValueLists, in a manner dependent on whether the components were space or comma separated.

From the javadoc comment to the 'parse' method in ...fo.expr.PropertyParser

     * Parse the property expression described in the instance variables.
     *
     * <p>The <tt>PropertyValue</tt> returned by this function has the
     * following characteristics:
     * If the expression resolves to a single element that object is returned
     * directly in an object which implements <tt>PropertyValue</tt>.
     *
     * <p>If the expression cannot be resolved into a single object, the set
     * to which it resolves is returned in a <tt>PropertyValueList</tt> object
     * (which itself implements <tt>PropertyValue</tt>).
     *
     * <p>The <tt>PropertyValueList</tt> contains objects whose corresponding
     * elements in the original expression were separated by <em>commas</em>.
     *
     * <p>Objects whose corresponding elements in the original expression
     * were separated by spaces are composed into a sublist contained in
     * another <tt>PropertyValueList</tt>. If all of the elements in the
     * expression were separated by spaces, the returned
     * <tt>PropertyValueList</tt> will contain one element, a
     * <tt>PropertyValueList</tt> containing objects representing each of
     * the space-separated elements in the original expression.
     *
     * <p>E.g., if a <b>font-family</b> property is assigned the string
     * <em>Palatino, New Century Schoolbook, serif</em>, the returned value
     * will look like this:
     * <pre>
     * PropertyValueList(NCName('Palatino')
     *                   PropertyValueList(NCName('New')
     *                                     NCName('Century')
     *                                     NCName('Schoolbook') )
     *                   NCName('serif') )
     * </pre>
     * <p>If the property had been assigned the string
     * <em>Palatino, "New Century Schoolbook", serif</em>, the returned value
     * would look like this:
     * <pre>
     * PropertyValueList(NCName('Palatino')
     *                   NCName('New Century Schoolbook')
     *                   NCName('serif') )
     * </pre>
     * <p>If a <b>background-position</b> property is assigned the string
     * <em>top center</em>, the returned value will look like this:
     * <pre>
     * PropertyValueList(PropertyValueList(NCName('top')
     *                                     NCName('center') ) )
     * </pre>

In the second stage (refineParsing) the lists are analysed in their context (e.g. 'font') and the appropriate final values are developed.

> The maintenance branch tried to unify all cases into a single framework, which quite predictably resulted in complex and somewhat messy code. It's also less efficient than it could be: format="01" is (or would be) indeed parsed as expression, while an optimized parser can take advantage of the lack of any string operations and look for quoted strings and function calls only, returning the trimmed XML attribute value otherwise.

This sounds promising.

Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
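The optimized path described above for string-typed properties such as format might be sketched as follows. The class and method names are hypothetical, and the sketch deliberately stops short of a real expression parser: it returns the trimmed attribute value when there is nothing to evaluate, unwraps a single quoted literal, and refuses anything more complicated.

```java
public class FormatFastPath {
    /** True if the value contains a quote or an opening parenthesis. */
    static boolean needsExpressionParser(String value) {
        return value.indexOf('\'') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('(') >= 0;
    }

    /** Resolves a string-typed attribute without building an expression tree. */
    static String stringValue(String rawAttribute) {
        String trimmed = rawAttribute.trim();
        if (!needsExpressionParser(trimmed)) {
            // No string literal and no function call: keep the text as written,
            // so a value like 01 keeps its leading zero.
            return trimmed;
        }
        if (isSingleQuotedLiteral(trimmed)) {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        // Anything else would go to the full expression parser (not shown).
        throw new UnsupportedOperationException(
                "needs the full expression parser: " + trimmed);
    }

    /** True for exactly one literal: quote, text without that quote, quote. */
    private static boolean isSingleQuotedLiteral(String s) {
        if (s.length() < 2) {
            return false;
        }
        char q = s.charAt(0);
        if (q != '\'' && q != '"') {
            return false;
        }
        return s.indexOf(q, 1) == s.length() - 1;
    }

    public static void main(String[] args) {
        System.out.println(stringValue(" 01 "));
        System.out.println(stringValue("'01'"));
    }
}
```

Both spellings of the format value end up as the same two-character string, which is the behaviour the number-parsing route loses.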
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote:

> Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface:
>
>     public interface Property {
>         public Length getLength();
>         public Space getSpace();
>         ...
>     }

Finn, just so I understand more here--what is the set of methods that this interface would have? (You don't have to give me a full enumeration if it's huge--but let me know how you determine them.) How many of them are there--10 of them or 20 or 30 or ???

Thanks, Glen

>     public class AbstractProperty {
>         public Length getLength() {
>             return null;
>         }
>         public Space getSpace() {
>             return null;
>         }
>         ...
>     }
>
>     public class Length extends AbstractProperty {
>         // Rest of datatypes.Length class.
>         ...
>         public Length getLength() {
>             return this;
>         }
>     }
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote:

[Glen Mazza, earlier]
Could you explain why we have the datatypes instances to begin with--what they're for? I'm not sure what their precise purpose is.

[Finn Bock]
The datatypes are the slightly more complex property values. The property classes wrap the datatype in order to give the datatypes a common interface.

[Glen Mazza]
Thanks for taking the time to explain this. My comprehension has increased quite a bit.

[Finn Bock]
Some of the concrete property subclasses wrap standard Java types such as int, char, String, Number and Vector, and for these properties we still need a wrapper. But some of them, marked with (*), wrap a datatype which is under our own control, and for those properties the datatype class could also function as the property wrapper.

[Glen Mazza]
I now understand what you're saying, and like the simplification you're suggesting. The current naming, however, is probably preferable--the word Property figures quite highly in the spec! Do you have a problem remaining with it? For those (*)'ed datatypes, can't we get rid of the datatype instead by rolling that datatype into the equivalently named Property? In turn, have *those* Properties extend AbstractProperty as you suggest. Actually, I guess I'm just saying the same thing you're suggesting, except to use --Property instead of --Type for everything.

[Glen Mazza, earlier]
Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list.

[Finn Bock]
That would still be the case. Everything stored in the PropertyList implements the Property interface.

[Glen Mazza]
But only a few of them would extend AbstractProperty, correct--or would you plan on having all do so?

[Finn Bock]
I remember two cases, but I can only find one at the moment. In Title.setup():

    prop = this.propertyList.get(PR_BASELINE_SHIFT);
    if (prop instanceof LengthProperty) {
        Length bShift = prop.getLength();
    } else if (prop instanceof EnumProperty) {
        int bShift = prop.getEnum();
    }

This would stay the same, except LengthProperty would be called LengthType and EnumProperty would be called EnumType. Except that the code above should IMHO use if (prop.getLength() != null) to test for a length type instead of using instanceof.

[Glen Mazza]
Well, instanceof is slower I believe, but better self-commenting. If you switch to this type of conditional for speed, just add a short comment on its purpose--here, to determine if we are working with an EnumProperty or a LengthProperty.

(Another option, BTW, if you think it will cut down on buggy programming, is to have the classes implementing this Property interface reject unsupported interface methods a la Peter's Read-Only BitSet[1], i.e., throw exceptions. We can revisit this topic later if code errors are becoming a problem.)

Thanks,
Glen

[1] http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-fop/src/java/org/apache/fop/datastructs/Attic/ROBitSet.java?content-type=text%2Fplain&rev=1.1.2.2
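The three options discussed above (instanceof, a null test on getLength(), and throwing from unsupported getters) can be put side by side in a small sketch. LengthType and EnumType are the renamed classes proposed in the thread; the fields, the describe helpers and BaselineShiftSketch are hypothetical. Because getEnum() returns an int and cannot signal "not an enum" with null, the sketch uses the exception variant for that getter only; that split is an assumption, not something the thread decided.

```java
// Sketch only: helper names and fields are made up for illustration.
interface Property {
    LengthType getLength();   // null when the value is not a length
    int getEnum();            // throws when the value is not an enum
}

abstract class AbstractProperty implements Property {
    public LengthType getLength() { return null; }
    // Exception variant, in the spirit of the read-only BitSet mentioned above.
    public int getEnum() { throw new UnsupportedOperationException("not an enum"); }
}

class LengthType extends AbstractProperty {
    final int millipoints;
    LengthType(int millipoints) { this.millipoints = millipoints; }
    @Override public LengthType getLength() { return this; }
}

class EnumType extends AbstractProperty {
    final int value;
    EnumType(int value) { this.value = value; }
    @Override public int getEnum() { return value; }
}

class BaselineShiftSketch {
    // Style 1: instanceof, self-documenting.
    static String describeWithInstanceof(Property prop) {
        if (prop instanceof LengthType) return "length " + prop.getLength().millipoints;
        if (prop instanceof EnumType) return "enum " + prop.getEnum();
        return "unknown";
    }

    // Style 2: null test; the comment carries the intent.
    static String describeWithNullTest(Property prop) {
        LengthType len = prop.getLength();
        if (len != null) return "length " + len.millipoints; // value is a length
        return "enum " + prop.getEnum();                      // otherwise an enum
    }

    public static void main(String[] args) {
        System.out.println(describeWithInstanceof(new LengthType(3000))); // prints length 3000
        System.out.println(describeWithNullTest(new EnumType(7)));        // prints enum 7
    }
}
```

Which style is faster depends on the JVM and its settings, as the timing results posted earlier in this thread suggest, so the sketch makes no claim about speed.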
Re: Unnesting properties and makers.
Peter B. West wrote:
With my naive understanding of parsing as a two-stage process (lexemes -> higher level constructs) I have been curious about earlier comments of yours about multi-stage parsing. Can ANTLR do this sort of thing?

[J.Pietschmann]
I'm not quite sure whether you mean by parsing as a two-stage process the same as I do. In language specs, the formal description is usually divided into a grammar level representing a Chomsky level 2 context-free grammar and a lexical level, described by simple regular expressions (Chomsky level 3 IIRC). This is done both for keeping the spec readable and for efficient implementation: a context-free parser needs a stack, and while the common identifier can be described by a context-free grammar, it's more efficient to use a regular expression and implement the recognizer as a DFA, which doesn't shuffle characters to and from the stack top like mad.

ANTLR provides for defining both the grammar and the lexical level in one file, and it will generate appropriate Java classes for the grammar parser as well as the token recognizer. It's not as efficient as the famous lex+yacc utilities, but this is partly due to Java using Unicode, which would make the lookup tables much, much larger if generated the same way lex does. Oh well: while yacc is LALR(1), ANTLR generates LL(k) parsers. Not that this matters much in practice, except for the number of concepts one has to understand while writing a parser. And don't ask me right now what the acronyms mean in detail, it's been 15 years since I really had to know this.

Peter B. West wrote:
Given the amount of hacking I had to do to parse everything that could legally be thrown at me, I am very surprised that these are the only issues in HEAD parsing.

[J.Pietschmann]
Well, one of the problems with the FO spec is that section 5.9 defines a grammar for property expressions, but this doesn't give the whole picture for all XML attribute values in FO files.
There are also (mostly) whitespace separated lists for shorthands, and the comma separated font family name list, where a) whitespace is allowed around the commas and b) quotes around the names may be omitted basically as long as there are no commas or whitespace in the name. The latter means there may be unquoted sequences of characters which have to be interpreted as a single token but are not NCNames. It also means that in the font shorthand there may be whitespace which is not a list element delimiter. I think this is valid:

    font="bold 12pt 'Times Roman' , serif"

and it should be parsed as

    font-weight="bold"
    font-size="12pt"
    font-family="'Times Roman' , serif"

then the font family can be split. This is easy for humans but can be quite tricky to get right for computers, given that the shorthand list has a bunch of optional elements. Specifically

    font="bold small-caps italic 12pt/14pt 'Times Roman' , A+B,serif"

should be valid too. At least, the font family is the last entry. Note that suddenly a slash appears as delimiter between font size and line height...

Another set of problems is token typing, the implicit type conversion and the very implicit type specification for the properties. While often harmless, it shows itself for the format property: the spec says the expected type is a string, which means it should be written as format="'01'". Of course, people tend to write format="01". While the parsed number could be cast back into a string, unfortunately the leading zero is lost. The errata amended 5.9 specifically for this use case: in case of an error the original string representation of the property value expression should be used to recover. Which tempts me to use initial-page-number="auto+1".

Another famous case is hyphenation-char="-", which is by no means a valid property expression. Additionally the restriction to a string of length 1 (a char) isn't spelled out explicitly anywhere.
All in all I have the feeling the spec tried to provide a property specification system which would be powerful but still easy to manage by hand, and they ended up with a system containing as many or more unintended consequences as the C preprocessor. Which, as everybody knows, led to weirdness like macro argument prescanning and 0xE-0x1 being a syntax error. Well, the C preprocessor had at least a simple first implementation.

The maintenance branch tried to unify all cases into a single framework, which quite predictably resulted in complex and somewhat messy code. It's also less efficient than it could be: format="01" is (or would be) indeed parsed as an expression, while an optimized parser can take advantage of the lack of any string operations and look for quoted strings and function calls only, returning the trimmed XML attribute value otherwise.

Finally, bless the Mozilla and MySpell folks for the spell checker... :-)

J.Pietschmann
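As an illustration of the font family splitting step described above, here is a small sketch that splits on commas outside quotes, trims the whitespace around each name, and strips a matching pair of quotes. It assumes the font-family part has already been isolated from the shorthand (the harder problem, which is not attempted here); FontFamilySplitter, split and clean are hypothetical names, not FOP code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an already isolated font-family value.
class FontFamilySplitter {

    /** Splits on commas outside quotes; trims names and strips matching quotes. */
    static List<String> split(String value) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;                       // 0 means "not inside a quoted name"
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;    // closing quote ends the quoted run
                current.append(c);
            } else if (c == '\'' || c == '"') {
                quote = c;                    // opening quote
                current.append(c);
            } else if (c == ',') {
                names.add(clean(current.toString()));
                current.setLength(0);
            } else {
                current.append(c);            // includes '+', spaces, etc.
            }
        }
        names.add(clean(current.toString()));
        return names;
    }

    private static String clean(String raw) {
        String s = raw.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // prints [Times Roman, A+B, serif]
        System.out.println(split("'Times Roman' , A+B,serif"));
    }
}
```

Note that A+B survives as a single name because the splitter never tries to interpret it as an expression, which is the point made above about unquoted tokens that are not NCNames.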
Re: Unnesting properties and makers.
Finn Bock wrote:
...--I believe, we do (frequently?) have more than one datatype per property, correct?

I remember two cases, but I can only find one at the moment: In Title.setup():

[J.Pietschmann]
Formally, there are a few more, for example initial-page-number. The code treats them as Java String. This prevents, for example, writing initial-page-number="1+1".

Finn Bock wrote:
    prop = this.propertyList.get(PR_BASELINE_SHIFT);

[J.Pietschmann]
Some other properties which can have an enum or something numeric as value:

    alignment-adjust
    writing-mode (the auto enum)
    content-height and -width (auto and scale-to-fit)
    height, width and related stuff (auto, none)
    leader-pattern-width (use-font-metrics)
    page-height (auto, indefinite)
    table border precedences (force), 7.26.1
    glyph-orientation
    text-altitude
    z-index
    letter-spacing (normal)
    word-spacing (normal)
    line-height (normal)

Does anybody know what <space> means for line-height???

I'm also missing the formal definition of <name> for markers (7.23.1 ff). The text-align has a string as the second type beside enum tokens. The text-shadow may be an enum (none), or a list of color values with an optional triple of numerical values. I should have added the latter as well as the text-decoration list to the list of exceptions in the other post a few minutes ago.

Not to mention that nearly all properties may have the value inherit, which is both defined as a keyword in the grammar and quite often explicitly enumerated in the property description. And the clip property (7.20.1) is yet another challenge to parse.

J.Pietschmann
RE: Unnesting properties and makers.
-----Original Message-----
From: J.Pietschmann [mailto:[EMAIL PROTECTED]]

<snip />

Does anybody know what <space> means for line-height???

[Andreas]
Know? I guess not. But judging from the spec...

    XSL adds the following value with the following meaning:
    <space>
        Specifies the minimum, optimum, and maximum values, the conditionality and precedence of the 'line-height' that is used in determining the half-leading.

Perhaps this is just a way of saying that 'line-height' can be 'shorthanded' as line-height="min opt max cond prec"?? (and as such, can be a space-separated list of <percentage>, <length>, <number> --and enums for the latter two)

Cheers,
Andreas
Re: Unnesting properties and makers.
Finn Bock wrote:
I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore.

[J.Pietschmann]
I think you could leave the file there for now.

[Finn Bock]
Ok.

[J.Pietschmann]
It should be sufficient to inactivate the related task in the buildfile (for example putting it in an XML comment).

[Finn Bock]
Too late for that, but I'll reactivate the lines as comments tomorrow.

Does anyone know why we wrap the datatypes instances in a property instance?

[J.Pietschmann]
No. Actually we should strive to use a proper parse tree for property expressions:
1. Create a few classes for the symbols in the property expression grammar (section 5.9 of the spec). I think we need as terminals
   - AbsoluteNumeric
   - RelativeNumeric
   - Color (the #N thingy)
   - String (aka Literal)
   - NCName (everything else, basically, including enum tokens and inherit)
   and for the nonterminals
   - PropertyFunction
   - Some classes for the operators
2. Write a proper parser (maybe using ANTLR, at least for bootstrap) which produces a proper parse tree.

[Peter B. West]
With my naive understanding of parsing as a two-stage process (lexemes -> higher level constructs) I have been curious about earlier comments of yours about multi-stage parsing. Can ANTLR do this sort of thing?

[J.Pietschmann]
3. Add methods to the objects for resolving relative numeric values (percentages, em) and for evaluation.
4. Perhaps add constant folding to the parser.

[Finn Bock]
Interesting. What issues do we have in property parsing that are solved by this? I'm only aware of arithmetic on relative numerics which doesn't work.

[Peter B. West]
Given the amount of hacking I had to do to parse everything that could legally be thrown at me, I am very surprised that these are the only issues in HEAD parsing.

Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html
Re: Unnesting properties and makers.
[Finn Bock, earlier]
Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface:

[Glen Mazza]
Could you explain why we have the datatypes instances to begin with--what they're for? I'm not sure what their precise purpose is.

[Finn Bock]
The datatypes are the slightly more complex property values. The property classes wrap the datatype in order to give the datatypes a common interface. This list shows the concrete Property subclasses and the datatypes that each of them wraps.

    CharacterProperty        char
    ColorTypeProperty        ColorType (*)
    CondLengthProperty       CondLength (*)
    EnumProperty             int
    KeepProperty             Keep (*)
    LengthPairProperty       LengthPair (*)
    LengthProperty           Length, AutoLength, FixedLength, PercentLength (*)
    LengthRangeProperty      LengthRange (*)
    ListProperty             Vector
    NCNameProperty           String
    NumberProperty           Number
    NumericProperty          Numeric (*)
    SpaceProperty            Space
    StringProperty           String
    ToBeImplementedProperty

Some of the concrete property subclasses wrap standard Java types such as int, char, String, Number and Vector, and for these properties we still need a wrapper. But some of them, marked with (*), wrap a datatype which is under our own control, and for those properties the datatype class could also function as the property wrapper.

[Glen Mazza]
Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list.

[Finn Bock]
That would still be the case. Everything stored in the PropertyList implements the Property interface. In the list below of the new property classes, all the <type>Type classes implement Property and are stored in PropertyList.
    CharacterType            char
    ColorTypeType            itself
    CondLengthType           itself
    EnumType                 int
    KeepType                 itself
    LengthPairType           itself
    LengthType, AutoLengthType,
    FixedLengthType,
    PercentLengthType        itself
    LengthRangeType          itself
    ListType                 Vector
    NCNameType               String
    NumberType               Number
    NumericType              itself
    SpaceType                itself
    StringType               String
    ToBeImplementedType

Each of the <type>Type classes also implements the get<type> methods from Property, so the layout must do exactly the same as it does now to extract the right value:

    propertyList.get(PR_INLINE_PROGRESSION_DIMENSION).
        getLengthRange().getOptimum().getLength();

For the classes which are both property and datatype, the get<type> method becomes:

    public <type> get<type>() {
        return this;
    }

[Glen Mazza]
Furthermore, those are the objects requested by layout. What would be your alternative storage technique otherwise--I believe, we do (frequently?) have more than one datatype per property, correct?

[Finn Bock]
I remember two cases, but I can only find one at the moment. In Title.setup():

    prop = this.propertyList.get(PR_BASELINE_SHIFT);
    if (prop instanceof LengthProperty) {
        Length bShift = prop.getLength();
    } else if (prop instanceof EnumProperty) {
        int bShift = prop.getEnum();
    }

This would stay the same, except LengthProperty would be called LengthType and EnumProperty would be called EnumType. Except that the code above should IMHO use if (prop.getLength() != null) to test for a length type instead of using instanceof.

I'm not sure what I propose as the naming convention for the new combined property/value, but Alt-Design calls them <type>Type so I used that in the list above.

regards,
finn
Re: Unnesting properties and makers.
Finn Bock wrote:
I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore.

[J.Pietschmann]
I think you could leave the file there for now. It should be sufficient to inactivate the related task in the buildfile (for example putting it in an XML comment).

Finn Bock wrote:
Does anyone know why we wrap the datatypes instances in a property instance?

[J.Pietschmann]
No. Actually we should strive to use a proper parse tree for property expressions:
1. Create a few classes for the symbols in the property expression grammar (section 5.9 of the spec). I think we need as terminals
   - AbsoluteNumeric
   - RelativeNumeric
   - Color (the #N thingy)
   - String (aka Literal)
   - NCName (everything else, basically, including enum tokens and inherit)
   and for the nonterminals
   - PropertyFunction
   - Some classes for the operators
2. Write a proper parser (maybe using ANTLR, at least for bootstrap) which produces a proper parse tree.
3. Add methods to the objects for resolving relative numeric values (percentages, em) and for evaluation.
4. Perhaps add constant folding to the parser.

J.Pietschmann
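To give a feel for steps 1, 3 and 4 of the list above, here is a very small parse tree sketch with late resolution of em values and constant folding. It skips the parser itself (step 2) and builds the tree by hand. AbsoluteNumeric and RelativeNumeric are names from the list; Expr, Add, Folder, the millipoint unit and the method names are assumptions made for illustration only.

```java
// Sketch only: a tiny expression tree, not the parser proposed in the thread.
interface Expr {
    /** Evaluates to millipoints, given the current font size in millipoints. */
    double eval(double fontSizeMpt);
    /** True when the value does not depend on context (no em, no percentage). */
    boolean isConstant();
}

class AbsoluteNumeric implements Expr {          // e.g. 12pt
    final double mpt;
    AbsoluteNumeric(double mpt) { this.mpt = mpt; }
    public double eval(double fontSizeMpt) { return mpt; }
    public boolean isConstant() { return true; }
}

class RelativeNumeric implements Expr {          // e.g. 1.5em, resolved late
    final double ems;
    RelativeNumeric(double ems) { this.ems = ems; }
    public double eval(double fontSizeMpt) { return ems * fontSizeMpt; }
    public boolean isConstant() { return false; }
}

class Add implements Expr {                      // one operator class
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public double eval(double f) { return left.eval(f) + right.eval(f); }
    public boolean isConstant() { return left.isConstant() && right.isConstant(); }
}

class Folder {
    /** Replaces constant subtrees by a single AbsoluteNumeric node. */
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add a = (Add) e;
            Expr l = fold(a.left);
            Expr r = fold(a.right);
            if (l.isConstant() && r.isConstant()) {
                // The font size is irrelevant for constant operands.
                return new AbsoluteNumeric(l.eval(0) + r.eval(0));
            }
            return new Add(l, r);
        }
        return e;
    }

    public static void main(String[] args) {
        // (1pt + 2pt) + 1.5em, with lengths in millipoints
        Expr e = new Add(new Add(new AbsoluteNumeric(1000), new AbsoluteNumeric(2000)),
                         new RelativeNumeric(1.5));
        Expr folded = fold(e);
        System.out.println(folded.eval(12000)); // prints 21000.0
    }
}
```

Arithmetic that mixes absolute and relative values stays in the tree until the font size is known, which is the case that a string-based evaluation cannot handle.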
Re: Unnesting properties and makers.
[Finn Bock, earlier]
I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore.

[J.Pietschmann]
I think you could leave the file there for now.

[Finn Bock]
Ok.

[J.Pietschmann]
It should be sufficient to inactivate the related task in the buildfile (for example putting it in an XML comment).

[Finn Bock]
Too late for that, but I'll reactivate the lines as comments tomorrow.

[Finn Bock, earlier]
Does anyone know why we wrap the datatypes instances in a property instance?

[J.Pietschmann]
No. Actually we should strive to use a proper parse tree for property expressions:
1. Create a few classes for the symbols in the property expression grammar (section 5.9 of the spec). I think we need as terminals
   - AbsoluteNumeric
   - RelativeNumeric
   - Color (the #N thingy)
   - String (aka Literal)
   - NCName (everything else, basically, including enum tokens and inherit)
   and for the nonterminals
   - PropertyFunction
   - Some classes for the operators
2. Write a proper parser (maybe using ANTLR, at least for bootstrap) which produces a proper parse tree.
3. Add methods to the objects for resolving relative numeric values (percentages, em) and for evaluation.
4. Perhaps add constant folding to the parser.

[Finn Bock]
Interesting. What issues do we have in property parsing that are solved by this? I'm only aware of arithmetic on relative numerics which doesn't work.

regards,
finn
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote:

Hi,

After updating from CVS, it is most likely necessary to do an ant clean to get rid of the old generated maker classes, before building.

[Glen Mazza]
Great job--the build is now only 604 classes--1/3 removed! This simplification does make the properties easier to understand (although I'm still quite far from fully comprehending them.)

[Finn Bock]
I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore.

[Glen Mazza]
Good idea.

[Finn Bock]
I've found an argument for unnesting the maker classes from their property classes: it is needed if we want to put the makers in their own package, and I think it would be a little cleaner to do that. Using the fo.properties package seems natural.

[Glen Mazza]
Makes sense.

[Finn Bock]
Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface:

[Glen Mazza]
Could you explain why we have the datatypes instances to begin with--what they're for? I'm not sure what their precise purpose is.

Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list. Furthermore, those are the objects requested by layout. What would be your alternative storage technique otherwise--I believe, we do (frequently?) have more than one datatype per property, correct?

Thanks,
Glen