Re: Questionable whether font-shorthand grammar LL(1)
Thanks everyone for your parser suggestions. I believe we should be able to do without one for the font shorthand, but this is definitely something to keep in mind if we want to improve the parsing of other properties. I’m starting to realise that the most difficult part is probably not so much the grammar parsing as the lexical analysis. To be continued, I guess...

Vincent

Laurent Caillette wrote:
Hi all,

I've never used SableCC or JavaCC so I cannot compare, but I'm using ANTLR a lot. ANTLR is highly customizable and has a very strong community. Its integrated development environment offers a debugger and visualization of grammar ambiguities. It's not only simple to set up and use, it also offers all the comfort you can reasonably dream of when developing grammars.

Maybe a tool like JarJar could reduce the pain of depending on one more library (with all the possible conflicts that could cause for FOP users).

Because code generation has some drawbacks (at least in terms of build complexity), you may be interested in JParsec, which creates parsers dynamically from pure Java code. Disclaimer: I have never used it. http://jparsec.codehaus.org

Hope this will help you make a reasonable choice.

c.

-----Original Message-----
From: berger@gmail.com [mailto:berger@gmail.com] On Behalf Of Max Berger
Sent: Tuesday, 29 September 2009 13:00
To: fop-dev@xmlgraphics.apache.org
Subject: Re: Questionable whether font-shorthand grammar LL(1)

Hi Vincent,

2009/9/29 Vincent Hennebert vhenneb...@gmail.com:
How about specifying the grammar and using a tool such as JavaCC to generate the actual parser? This way you could focus more on the complete grammar and spend less time writing the parser.

That would be the same as using ANTLR. I feel that this is a bit overkill for just parsing the font shorthand property, although it may prove useful for other properties that can accept complex expressions. That said, JavaCC is an interesting suggestion; I didn’t think of it. If a choice had to be made between ANTLR and JavaCC, which one would win?

ANTLR:
- easy to use
- requires runtime linking of a jar [1] (a *huge* disadvantage imo)

JavaCC:
- very sparse documentation
- generates standalone Java classes

SableCC:
- better documentation
- LGPL (and therefore maybe not feasible, although it would only be used at compile time and not at runtime)

[1] http://beust.com/weblog/archives/000145.html

Max
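Since lexical analysis looks like the harder half of the problem, here is a minimal sketch of a hand-written tokenizer for a simplified font shorthand value. The class name `FontShorthandLexer` and its token kinds are hypothetical, not FOP's actual property parser, and the token set is deliberately reduced (no `inherit`, no system fonts, no escapes inside quoted family names). Because whitespace only separates tokens, `12pt/14pt` and `12pt / 14pt` produce the same token sequence.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lexer for a simplified font shorthand value (not FOP's code). */
public class FontShorthandLexer {

    /** Token kinds recognised by this sketch. */
    enum Kind { IDENT, STRING, NUMBER, SLASH, COMMA }

    /** A token: its kind plus the source text it was built from. */
    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    /** Splits the input into tokens; whitespace only separates tokens. */
    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '/') {
                tokens.add(new Token(Kind.SLASH, "/"));
                i++;
            } else if (c == ',') {
                tokens.add(new Token(Kind.COMMA, ","));
                i++;
            } else if (c == '"' || c == '\'') {
                // Quoted family name: read up to the matching quote character.
                int end = s.indexOf(c, i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated string at " + i);
                }
                tokens.add(new Token(Kind.STRING, s.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isDigit(c) || c == '.') {
                // Number, then an optional unit (pt, em, ...) or '%'.
                int start = i;
                while (i < s.length() && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.')) {
                    i++;
                }
                while (i < s.length() && (Character.isLetter(s.charAt(i)) || s.charAt(i) == '%')) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else if (Character.isLetter(c) || c == '-') {
                // Keyword (italic, bold, ...) or one word of an unquoted family name.
                int start = i;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '-')) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENT, s.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("italic bold 12pt / 14pt \"Times New Roman\", serif"));
    }
}
```

Keeping the slash as its own token means a later parsing step can tell font-size from line-height purely from the token sequence, without caring about spacing.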
Re: Questionable whether font-shorthand grammar LL(1)
Hi Jonathan,

Jonathan Levinson wrote:
Hi Vincent,

Excellent ideas! The diagram you drew is extremely useful!

If the font shorthand sub-language has a grammar that is regular, then it also has a grammar that is LL(1). So recursive descent parsing will work if there is a regular grammar.

I think the best way of getting font shorthand to work would proceed in stages:

1) First get the current code to properly parse and accept valid font shorthand expressions. This should be very easy. The one remaining problem (AFAIK) is the parsing of font-size/line-height, where /line-height is optional. Currently spaces are not allowed around the slash (/), and they should be. I'm going to try to get to this problem as soon as I have time, probably in a day or so.

The current code predates the switch to Java 1.4 as the minimum requirement, so it couldn’t use the java.util.regex package. Feel free to make use of regular expressions if you think that will make the job easier.

2) Evaluate which parser or automaton approach is the simplest and produces better error states than the current approach.

3) Implement the approach chosen in (2).

Good luck!

snip/

Vincent
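As a minimal sketch of the java.util.regex route for the remaining font-size/line-height problem, the pattern below accepts optional whitespace on both sides of the slash. `SizeLineHeight` and `split` are hypothetical names; the sketch assumes the size/line-height pair has already been isolated from the rest of the shorthand and does not validate the lengths themselves.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits "size[/line-height]" allowing whitespace around '/'. */
public class SizeLineHeight {

    // Group 1: font-size; group 2: optional line-height.
    // The \s* on both sides of the slash is what accepts "12pt / 14pt".
    private static final Pattern SIZE_SLASH_LH =
            Pattern.compile("^\\s*([^\\s/]+)\\s*(?:/\\s*([^\\s/]+))?\\s*$");

    /** Returns {size, lineHeight}; lineHeight is null when absent. */
    static String[] split(String value) {
        Matcher m = SIZE_SLASH_LH.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a font-size[/line-height] value: " + value);
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        for (String v : new String[] {"12pt/14pt", "12pt / 14pt", "12pt"}) {
            String[] parts = split(v);
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```

A second slash (for example `12pt / 14pt / 16pt`) makes the match fail, so malformed input is reported instead of being silently truncated.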
Re: Best Interface for reading OpenType Files
Hi Vincent,

I see. I had in mind to use OpenTypeDataInputStream as the common interface. It actually makes sense to use ImageInputStream instead. Simpler and just as flexible. That will add a direct dependency on a class in the javax.imageio package, but this is not a problem as it is part of the standard library. That ImageInputStream interface is unfortunately named really.

What did you mean with your last sentence? That ImageInputStream isn't named well?

So if I had to vote, I would probably vote for Spring.

Well, I’m not sure I like the abundance of XML in Spring actually. POJOs powaaa! Also, Spring may be overkill just to deploy FOP. Anyway, this is probably a bit early to discuss that. (What do you think of the following though: http://code.google.com/p/google-guice/ ?)

I had heard of it before, but hadn't looked into it. So I took your pointer as motivation to have a look at it. I watched the Google I/O "Big Modular Java with Guice" [1] talk on YouTube. It looks very promising. I'm not against this XML config stuff, but if I can get the same with annotations and standard Java code, why not? Of course I like this whole type safety stuff, but with IntelliJ I get this in Spring XML too.

[1]: http://www.youtube.com/watch?v=hBVJbzAagfs

- does the use of serializable objects make sense? What would be more efficient: re-parsing font data all the time or re-loading a serialized object representation of it?

You mean the font metrics XML files? I've always wondered what purpose they serve. No, I don't think we need this. I really don't want to serialize the Advanced OpenType Features! It already took me a good amount of code to parse just a bit of them.

What I meant was to use the java.io.Serializable interface. Indeed, I don’t think XML representations are of any use, apart maybe for debugging purposes or to have a more human-readable version of the font file. IIC there would be next to nothing to do to cache Serializable objects on the hard drive and retrieve them?

Hmmm. OK. But if we want to use Serializable for that, your classes have to be very stable. Versioning the Serializable stuff is a real burden in my opinion. So we will need a cache which detects version changes and invalidates the objects when they occur. Do you know such a lib?

I was thinking that just catching the InvalidClassException when reading the object would be enough to conclude that the cache is no longer valid and must be re-created. Maybe I’m wrong? I must confess that I have no experience with serialization.

Yes, this could work. But I always find it difficult and time-consuming to design classes for serialization. And reading the serialized version is most likely not much faster than reading the actual OpenType file. So I would really want to wait until we have a real performance problem.

Best Regards
Alex
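A minimal sketch of the cache idea from this exchange, assuming Java 8 or later (newer than what FOP required at the time) and a hypothetical `SerializedCache` class: when reading, an `InvalidClassException` (for example after a `serialVersionUID` change) or a `ClassNotFoundException` is treated as a stale cache, and the object is rebuilt and written back. Other I/O errors, including a corrupted file, are propagated rather than silently ignored.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

/** Hypothetical disk cache that rebuilds its content when the cached class has changed. */
public class SerializedCache<T extends Serializable> {

    private final File file;

    public SerializedCache(File file) {
        this.file = file;
    }

    /** Returns the cached object, or builds, stores and returns a fresh one. */
    @SuppressWarnings("unchecked")
    public T get(Supplier<T> builder) throws IOException {
        if (file.isFile()) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                return (T) in.readObject();
            } catch (InvalidClassException e) {
                // Class version or layout changed since the cache was written: stale.
            } catch (ClassNotFoundException e) {
                // Cached class no longer exists: also treated as stale.
            }
        }
        // No usable cache: build the object (e.g. by parsing the font) and store it.
        T value = builder.get();
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(value);
        }
        return value;
    }
}
```

Whether this is worth doing still depends on the performance question raised above: if deserializing is not noticeably faster than re-parsing the OpenType file, the cache adds complexity for little gain.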
Re: Confused about checkstyle use
Hi Max,

First, I will respect every code style of FOP. It's just a matter of discussion.

Really? That means commenting every public method, even simple getters and setters?

Yes. Simple getters and setters are the only place where you can publicly document private variables (in most cases, comment in the getter and link from the setter).

Yes, that's right. But is this Javadoc better than no Javadoc?

    public class Person {

        private String firstName;

        /**
         * Returns the first name of this person.
         *
         * @return the first name of this person.
         */
        public String getFirstName() {
            return firstName;
        }
    }

Commenting equals(), hashCode() and toString()? I think this would only be clutter.

/** {@inheritDoc} */

In my eyes this is enough clutter. I saw classes in FOP with maybe 10 methods using this /** {@inheritDoc} */. It just distracts the eye from reading the actual method name. And it adds absolutely no information for the source code reader.

would do the trick on those, UNLESS they implement something which is unexpected (such as the equals methods I recently renamed which did not implement equals) or special (a toString which creates a guaranteed parsable result for example)

Hmmm. An equals method shouldn't do anything unexpected. But your toString() example is a good one. If such standard methods do more than the comment in Object says, then a comment is useful. I think it's the same as for simple public methods like the getter above. If your comment doesn't say anything more than the method name already says, I don't want to read it.

Best Regards
Alex
RE: Support for Arabic in FOP
Hi,

I am not sure about the licensing part, as Sebastian made some changes to the FOP code and provided me with the jars. As far as I have checked, those jars print Arabic correctly. Possibly only he will be able to answer, and I am not sure whether the changes were made following FOP standards. He was planning to implement the bidi algorithm; I have no idea whether he worked on it later and whether he contributed the change below to FOP.

Below were his comments:

If I set the writing-mode to rl-tb, my text is flipped vertically. This happens because the CTM class rotates the transformation matrix for rendering according to the writing mode. If I want to write right-to-left, this has nothing to do with mirroring of course, and I disabled it, because I want to print Arabic text. So what is the purpose of mirroring in rl-tb writing-mode? What errors will appear if I disable the CTM.getWMctm() function that does the mirroring according to the writing-mode?

I achieved printing (PDF) Arabic text after some weeks of work, ignoring any XSL-FO recommendations. Most of the things I did were in the TextLayoutManager. Now I'm thinking about implementing it according to the recommendations and the BIDI algorithm.

Hi Prakash,

you can download the version of FOP that I use to print Arabic script from www.anneundsebp.de/fop/fop.html. I hope it works for you. Unfortunately I don't understand Arabic, but I know that there are still some problems with the typesetting. Maybe you can inform me about bugs you find. I'll add some explanations and the source code in a few days.

Regards
Sebastian

--
View this message in context: http://www.nabble.com/Volunteering-to-work-on-FOP-development-tp25442059p25680065.html
Sent from the FOP - Dev mailing list archive at Nabble.com.
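To illustrate why a mirrored transformation matrix makes Arabic text come out flipped, here is a generic sketch using `java.awt.geom.AffineTransform`. It is not FOP's `CTM` class and does not reproduce what `CTM.getWMctm()` actually computes; `MirrorDemo` and `mirror` are hypothetical names. A horizontal mirror (x' = width - x) does move the start position to the right-hand edge, as right-to-left text needs, but it also reverses every vector, so glyph outlines and advances are drawn back to front. Right-to-left scripts normally need only the inline progression reversed, not the glyph shapes.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Generic illustration of a horizontal mirror (not FOP's CTM class). */
public class MirrorDemo {

    /** Mirror about the vertical centre line of an area of the given width: x' = width - x. */
    static AffineTransform mirror(double width) {
        // Constructor order is (m00, m10, m01, m11, m02, m12).
        return new AffineTransform(-1, 0, 0, 1, width, 0);
    }

    public static void main(String[] args) {
        AffineTransform m = mirror(100);
        // A pen position near the start edge ends up near the opposite edge...
        Point2D pos = m.transform(new Point2D.Double(10, 0), null);
        // ...but vectors are reversed too, so each glyph is drawn mirrored.
        Point2D advance = m.deltaTransform(new Point2D.Double(5, 0), null);
        System.out.println(pos + " " + advance);
    }
}
```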
Re: Checkstyle RedundantThrowsCheck
Hi Vincent,

Speaking of that, there’s a rule that I would suggest disabling: the HiddenFieldCheck. I don’t really see its benefit. It forces us to find somewhat artificial names for variables, where the field name is exactly what I want. Sometimes a method doesn’t have a name following the setField pattern, yet still acts as a setter for Field. This rule would make sense if we were using a Hungarian-like notation for variables (mMember, pParam, etc.), but that’s not the case in FOP. WDYT?

Yes, I would vote for it. In modern IDEs one clearly sees the difference between an instance field and a local variable. This is also the reason why this Hungarian-like scope notation has largely disappeared from Java.

Best Regards
Alex
Re: Checkstyle RedundantThrowsCheck
Hi Max,

Speaking of that, there’s a rule that I would suggest disabling: the HiddenFieldCheck. I don’t really see its benefit. It forces us to find somewhat artificial names for variables, where the field name is exactly what I want. Sometimes a method doesn’t have a name following the setField pattern, yet still acts as a setter for Field. This rule would make sense if we were using a Hungarian-like notation for variables (mMember, pParam, etc.), but that’s not the case in FOP. WDYT?

I like the rule, BUT I am OK with an exception for setters and constructors (this is IMO a new option in Checkstyle 5): http://checkstyle.sourceforge.net/config_coding.html#HiddenField

The exclusion of constructors and setters is important. Otherwise we would be forced to use some Hungarian-like scope notation. But why do you think this rule is useful at all?

Best Regards
Alex
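The following hypothetical class sketches how Checkstyle's HiddenField check treats field-hiding parameters when its `ignoreConstructorParameter` and `ignoreSetter` properties are both set to `true`. The comments describe the expected reports under that assumed configuration; `FontSpec` and `substitute` are made-up names. The last setter-like method shows the case raised earlier in the thread: a method that acts as a setter without following the `setField` naming pattern is still reported.

```java
/** Hypothetical class: which parameters HiddenField reports with both exemptions enabled. */
public class FontSpec {

    private String family;

    // Constructor parameter hides the field: not reported with ignoreConstructorParameter=true.
    public FontSpec(String family) {
        this.family = family;
    }

    // Property setter (setFamily, one parameter, void return): not reported with ignoreSetter=true.
    public void setFamily(String family) {
        this.family = family;
    }

    // Acts as a setter but is not named setFamily: still reported even with both options.
    public void substitute(String family) {
        this.family = family;
    }

    public String getFamily() {
        return family;
    }
}
```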
Re: Best Interface for reading OpenType Files
Hi Alexander,

Alexander Kiel wrote:
Hi Vincent,

That ImageInputStream interface is unfortunately named really.

What did you mean with your last sentence? That ImageInputStream isn't named well?

Yes. AFAICT its methods have nothing to do with images. This interface should probably have been given a more neutral name.

snip/

Yes, this could work. But I always find it difficult and time-consuming to design classes for serialization. And reading the serialized version is most likely not much faster than reading the actual OpenType file. So I would really want to wait until we have a real performance problem.

Sure. Nothing wrong with that.

Thanks,
Vincent
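To show how ImageInputStream could serve as the common reading interface, here is a minimal sketch of reading the start of an OpenType offset table through it; note that nothing used here is image-specific. `OffsetTableReader` is a hypothetical name and the six-byte header in `main` is fake test data. The sketch relies on the stream's default big-endian byte order, which matches OpenType, and accepts only the two common `sfntVersion` values (0x00010000 for TrueType outlines, 'OTTO' for CFF outlines).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Sketch: reading the beginning of an OpenType offset table via ImageInputStream. */
public class OffsetTableReader {

    /** Returns the number of tables declared in the font's offset table. */
    static int readNumTables(ImageInputStream in) throws IOException {
        in.seek(0);
        // sfntVersion: 0x00010000 (TrueType outlines) or 'OTTO' (CFF outlines).
        long sfntVersion = in.readUnsignedInt();
        if (sfntVersion != 0x00010000L && sfntVersion != 0x4F54544FL) {
            throw new IOException("Not an OpenType font: 0x" + Long.toHexString(sfntVersion));
        }
        // numTables is the unsigned 16-bit value right after sfntVersion.
        return in.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        // Fake header: sfntVersion 0x00010000, numTables = 3.
        byte[] header = {0x00, 0x01, 0x00, 0x00, 0x00, 0x03};
        ImageInputStream in = new MemoryCacheImageInputStream(new ByteArrayInputStream(header));
        try {
            System.out.println(readNumTables(in));
        } finally {
            in.close();
        }
    }
}
```

In real use the stream would more likely be a `FileImageInputStream` over the font file, so that `seek` can jump straight to each table's offset.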
Re: Confused about checkstyle use
Hi Alexander,

Alexander Kiel wrote:
Hi Max,

First, I will respect every code style of FOP. It's just a matter of discussion.

Really? That means commenting every public method, even simple getters and setters?

Yes. Simple getters and setters are the only place where you can publicly document private variables (in most cases, comment in the getter and link from the setter).

Yes, that's right. But is this Javadoc better than no Javadoc?

    public class Person {

        private String firstName;

        /**
         * Returns the first name of this person.
         *
         * @return the first name of this person.
         */
        public String getFirstName() {
            return firstName;
        }
    }

Except in the simplest cases like that one, there is always a bit of additional information that can be added about the variable or its usage.

Commenting equals(), hashCode() and toString()? I think this would only be clutter.

/** {@inheritDoc} */

In my eyes this is enough clutter. I saw classes in FOP with maybe 10 methods using this /** {@inheritDoc} */. It just distracts the eye from reading the actual method name. And it adds absolutely no information for the source code reader.

That one is indeed there only to make Checkstyle happy. The Javadoc tool is able to retrieve the Javadoc from the overridden method by itself (Eclipse as well). I wish Checkstyle could do that too. We will be able to partially solve that when switching to Java 1.5, by using the @Override annotation.

Should the rule be disabled because of that? Having proper Javadoc on at least public methods is very important. OTOH, this is actually not something Checkstyle can verify. How many methods in the code base have totally useless comments that are there just to avoid a Checkstyle warning... I think I’d prefer to keep the rule, but wouldn’t veto its removal.

would do the trick on those, UNLESS they implement something which is unexpected (such as the equals methods I recently renamed which did not implement equals) or special (a toString which creates a guaranteed parsable result for example)

Hmmm. An equals method shouldn't do anything unexpected. But your toString() example is a good one. If such standard methods do more than the comment in Object says, then a comment is useful. I think it's the same as for simple public methods like the getter above. If your comment doesn't say anything more than the method name already says, I don't want to read it.

Best Regards
Alex

Vincent
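As a sketch of the distinction drawn in this exchange, the hypothetical value class below uses `@Override` without Javadoc on `equals` and `hashCode`, whose contracts come from `Object`, and keeps a Javadoc comment only on `toString`, because its output is guaranteed to be parsable, which is the kind of extra information that justifies a comment. `Margins` and `parse` are illustrative names, and the code assumes Java 7 or later for `java.util.Objects`.

```java
import java.util.Objects;

/** Hypothetical value class: Javadoc only where an overridden method adds information. */
public final class Margins {

    private final int top;
    private final int left;

    public Margins(int top, int left) {
        this.top = top;
        this.left = left;
    }

    // Plain Object contract: @Override states the intent, no Javadoc needed.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Margins)) {
            return false;
        }
        Margins other = (Margins) o;
        return top == other.top && left == other.left;
    }

    @Override
    public int hashCode() {
        return Objects.hash(top, left);
    }

    /**
     * Returns "top,left", e.g. "2,5". Unlike a plain debug string, this format
     * is guaranteed to be accepted by {@link #parse(String)}.
     */
    @Override
    public String toString() {
        return top + "," + left;
    }

    /** Parses the output of {@link #toString()}. */
    public static Margins parse(String s) {
        String[] parts = s.split(",");
        return new Margins(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
    }
}
```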
Re: Confused about checkstyle use
Hi Vincent,

Should the rule be disabled because of that? Having proper Javadoc on at least public methods is very important. OTOH, this is actually not something Checkstyle can verify. How many methods in the code base have totally useless comments that are there just to avoid a Checkstyle warning... I think I’d prefer to keep the rule, but wouldn’t veto its removal.

I don't vote for removal either; I only vote for the right to violate it in cases where one can't add any useful information in the comment.

Best Regards
Alex
RE: Questionable whether font-shorthand grammar LL(1)
I agree - in this case - tokenizing - lexical analysis - is more difficult than parsing.

Best Regards,
Jonathan

-----Original Message-----
From: Vincent Hennebert [mailto:vhenneb...@gmail.com]
Sent: Wednesday, September 30, 2009 6:25 AM
To: fop-dev@xmlgraphics.apache.org
Subject: Re: Questionable whether font-shorthand grammar LL(1)

I’m starting to realise that the most difficult part is probably not so much the grammar parsing as the lexical analysis.
Re: LZW embedding experiment
Hello,

Jeremias Maerki-2 wrote:
I've written some code that can embed a single-stripe CMYK TIFF in PDF as a proof of concept. I've done it for PDF because that was the easiest to implement. I don't want to commit that right now since it would need a lot of testing first. So in case I don't pursue this (due to other priorities) and someone else wants that code, it's available.

I'd be interested in testing this for PDF output. Could you please send me the patch?

Regards,
Matthias Reischenbacher

--
View this message in context: http://www.nabble.com/LZW-embedding-experiment-tp25635491p25685400.html
Sent from the FOP - Dev mailing list archive at Nabble.com.
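As a rough illustration of why a single-strip TIFF is the easy case, here is a hypothetical helper (not Jeremias's patch) that builds the PDF image dictionary for passing an LZW-compressed CMYK strip through unchanged. Each TIFF strip is compressed as an independent LZW stream, whereas a PDF image stream with `/LZWDecode` is decoded as a single stream, so only a single strip can be copied as-is. The sketch assumes 8 bits per component and no TIFF predictor; if the TIFF uses horizontal differencing (Predictor 2), a `/DecodeParms` dictionary (with `/Predictor 2`, `/Colors 4` and `/Columns`) would also be needed.

```java
/** Hypothetical helper: PDF image dictionary for a pass-through TIFF LZW strip. */
public class LzwImageDict {

    /**
     * Builds the image XObject dictionary for one LZW-compressed CMYK strip.
     * compressedLength is the byte count of the strip data copied into the stream.
     */
    static String imageDictionary(int width, int height, int compressedLength) {
        return "<< /Type /XObject /Subtype /Image"
                + " /Width " + width
                + " /Height " + height
                + " /ColorSpace /DeviceCMYK"
                + " /BitsPerComponent 8"
                + " /Filter /LZWDecode"
                + " /Length " + compressedLength
                + " >>";
    }

    public static void main(String[] args) {
        System.out.println(imageDictionary(100, 50, 1234));
    }
}
```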
RE: Support for Arabic in FOP
Quoting Prakash sen prakash@gmail.com:

Hi, I am not sure about the licensing part, as Sebastian made some changes to the FOP code and provided me with the jars. As far as I have checked, those jars print Arabic correctly. Possibly only he will be able to answer, and I am not sure whether the changes were made following FOP standards. He was planning to implement the bidi algorithm; I have no idea whether he worked on it later and whether he contributed the change below to FOP.

He did not commit any change to FOP.

Simon