RE: HTML content in XML
This message should not be on fop-dev but on fop-users. It is not a development issue but a question about how to use FOP. You need to write XSLT to transform the HTML tags so that they become XSL-FO tags. If the HTML is not XHTML, you need to scrub it so that it is valid XML. This is a preprocessing step. Writing scrubbers and transformers is a moderately complex task. Regards, Jonathan From: bhanu617 [mailto:bhanu...@gmail.com] Sent: Tuesday, December 10, 2013 11:30 AM To: fop-dev@xmlgraphics.apache.org Subject: HTML content in XML Hi, I am using FOP to generate PDF. In my application the user enters data in a rich text editor, hence the XML data will have HTML tags like ,, . But the HTML tags are not taking effect in the generated PDF. Please help me... Regards, Bhanu Chandar Rao A View this message in context: HTML content in XML http://apache-fop.1065347.n5.nabble.com/HTML-content-in-XML-tp39762.html Sent from the FOP - Dev mailing list archive http://apache-fop.1065347.n5.nabble.com/FOP-Dev-f18203.html at Nabble.com.
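As a rough illustration of the XSLT step described above, here is a minimal Java sketch that uses the JDK's built-in javax.xml.transform API. It assumes the rich-text markup has already been scrubbed into well-formed XML. The tag-to-FO mapping (p, b/strong, i/em, u) and the class name are illustrative assumptions, not a complete or official stylesheet; in practice these templates would be merged into the stylesheet that produces the rest of the FO document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: maps a few HTML-style tags to XSL-FO. Assumes well-formed input.
final class HtmlToFoSketch {
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // paragraphs become blocks
        + "<xsl:template match='p'><fo:block><xsl:apply-templates/></fo:block></xsl:template>"
        // inline formatting tags become fo:inline with the matching property
        + "<xsl:template match='b|strong'><fo:inline font-weight='bold'>"
        + "<xsl:apply-templates/></fo:inline></xsl:template>"
        + "<xsl:template match='i|em'><fo:inline font-style='italic'>"
        + "<xsl:apply-templates/></fo:inline></xsl:template>"
        + "<xsl:template match='u'><fo:inline text-decoration='underline'>"
        + "<xsl:apply-templates/></fo:inline></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xmlFragment) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xmlFragment)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<p>Hello <b>bold</b> and <i>italic</i></p>"));
    }
}
```

Text nodes are copied through by XSLT's built-in templates, so only the tags that need FO equivalents require explicit templates.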
RE: Adding a new layout manager
I found the following essay in Knuth’s Digital Typography informative: “Breaking Paragraphs Into Lines”. HTH, Jonathan From: Glenn Adams [mailto:gl...@skynav.com] Sent: Thursday, June 20, 2013 10:11 AM To: FOP Developers Subject: Re: Adding a new layout manager On Thu, Jun 20, 2013 at 9:56 PM, sdridi sdr...@iptech-group.com wrote: Glenn Adams-2 wrote I would suggest you not just read code but run it with Eclipse or NetBeans to trace the execution process. That is one of the best ways to learn actual code behavior. Yes of course, debugging is my only way to break FOP mystery Glenn Adams-2 wrote Maybe that somebody is you! :) That would be an honor, but not before I master how FOP works Back to my main topic of discussion, if anyone can shed some light on FOP layout engine, I'd be very grateful. You can start by reading [1]. Then, if you are really dedicated and want to delve further, read the relevant parts of TeX: The Program [2]. Or, if you prefer to read Lisp (Scheme), then you can find a faithful transcription of the TeX line breaker at [3], which I wrote in 1990 or so. Once you've internalized this information, you are ready to tackle the FOP line breaker. Good luck! Glenn [1] http://bowman.infotech.monash.edu.au/~pmoulder/line-breaking/knuth-plass-breaking.pdf [2] http://yaojingguo.blogspot.com/2009/02/produce-tex-program-from-texweb.html [3] http://people.apache.org/~gadams/random/tex.scm.txt -- View this message in context: http://apache-fop.1065347.n5.nabble.com/Adding-a-new-layout-manager-tp38757p38766.html Sent from the FOP - Dev mailing list archive at Nabble.com.
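The key idea in the Knuth-Plass paper cited as [1] is that the breaks of a whole paragraph are chosen together ("total fit") rather than greedily one line at a time. A much-simplified Java sketch of that idea follows. It assumes every character and space is one unit wide and scores each line by the square of its unused width, with the last line free. It has no boxes, glue, penalties, stretch/shrink or hyphenation, so it is neither the Knuth-Plass algorithm proper nor FOP's line breaker, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified total-fit breaker (not Knuth-Plass, not FOP).
final class TotalFitLineBreaker {
    /** Chooses breaks minimizing the sum of squared unused width; the last line costs nothing. */
    static List<String> breakLines(String[] words, int width) {
        int n = words.length;
        long[] best = new long[n + 1]; // best[i] = minimal cost of setting words[i..n-1]
        int[] next = new int[n + 1];   // next[i] = index of the first word on the following line
        best[n] = 0;
        for (int i = n - 1; i >= 0; i--) {
            best[i] = Long.MAX_VALUE;
            int len = -1;              // running length of words i..j joined by single spaces
            for (int j = i; j < n; j++) {
                len += words[j].length() + 1;
                if (len > width && j > i) {
                    break;             // overflow; a single over-long word is still allowed
                }
                long slack = Math.max(0, width - len);
                long cost = (j == n - 1) ? 0 : slack * slack;
                if (cost + best[j + 1] < best[i]) {
                    best[i] = cost + best[j + 1];
                    next[i] = j + 1;
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i = next[i]) {
            lines.add(String.join(" ", Arrays.copyOfRange(words, i, next[i])));
        }
        return lines;
    }
}
```

For the words `aaa bb cc ddddd` at width 6, a greedy first-fit breaker produces `aaa bb / cc / ddddd` (cost 0 + 16), while this total-fit version produces `aaa / bb cc / ddddd` (cost 9 + 1), which is the kind of paragraph-wide trade-off the paper formalizes.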
RE: Two emails for every Jira issue update
Yes. Same problem. -Original Message- From: Alexios Giotis [mailto:alex.gio...@gmail.com] Sent: Friday, December 14, 2012 9:37 AM To: fop-dev@xmlgraphics.apache.org Subject: Two emails for every Jira issue update Hi, I receive two identical emails for every Jira issue update. From the email headers, I can see that they are coming from the fop-dev mailing list. Anybody else having this problem ? Thanks, Alexis Giotis
RE: 1.1 Release (was Vacation)
I misunderstood the implications. Thanks for the clarification. Kind Regards, Jonathan -Original Message- From: Chris Bowditch [mailto:bowditch_ch...@hotmail.com] Sent: Wednesday, September 05, 2012 9:49 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: 1.1 Release (was Vacation) On 05/09/2012 14:29, Jonathan Levinson wrote: We have customers who make heavy use of FOP TIFF. There are situations where TIFF generation is a requirement. Sure, and I didn't suggest otherwise. TIFF generation works in most scenarios. Only JPEG compression is broken. If your clients are using that then surely you would have reported the bug before now. I stand by my opinion that this is not a blocker for the 1.1 release. Thanks, Chris Kind Regards, Jonathan -Original Message- From: Chris Bowditch [mailto:bowditch_ch...@hotmail.com] Sent: Wednesday, September 05, 2012 9:14 AM To: fop-dev@xmlgraphics.apache.org Cc: priv...@xmlgraphics.apache.org Subject: 1.1 Release (was Vacation) On 05/09/2012 13:55, mehdi houshmand wrote: Hi All, Apart from my initial e-mail there's nothing private in this e-mail thread, so moving the discussion to fop-dev. Bugzilla#53790 applies to FOP1.1. It's a blocking point if you're working with TIFF, do you want me to create an analogous commit for 1.1? I haven't had the time to apply it, now seems like a good opportunity to ask whether I should. I don't believe that is a blocker to release. There are plenty of other compression types that do work. Thanks, Chris
RE: DO NOT REPLY [Bug 52513] [PATCH] Moving FOUserAgent to the constructor of Renderers
Hi Mehdi, I'm trying to assess the impact on our code, if any. It seems we just use the FopFactory and don't directly construct a renderer, so this change will have *no* impact on our code. The factory method that creates a new Fop already takes foUserAgent as a parameter, so nothing has changed in this regard, for example: Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out); I've examined your diff, and unless I'm mistaken nothing in it impacts our client-server that accepts requests for FOP rendering via TCP/IP. You aren't changing how the FopFactory is instantiated or how the FopFactory is used to instantiate Fops. I can't assess the impact on other people's applications, but the impact on our application seems non-existent. Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com http://www.intersystems.com/summit2012/ -Original Message- From: bugzi...@apache.org [mailto:bugzi...@apache.org] Sent: Wednesday, January 25, 2012 6:19 AM To: fop-dev@xmlgraphics.apache.org Subject: DO NOT REPLY [Bug 52513] [PATCH] Moving FOUserAgent to the constructor of Renderers https://issues.apache.org/bugzilla/show_bug.cgi?id=52513 --- Comment #2 from Mehdi Houshmand med1...@gmail.com 2012-01-25 11:19:00 UTC --- Since there doesn't seem to be any strong opposition to this change, I'll start a lazy consensus (i.e. apply the patch in 72hrs henceforth) to allow people to discuss any objections they may have. -- Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
RE: svn commit: r1234877 - in /xmlgraphics/fop/trunk: examples/embedding/java/embedding/ examples/embedding/java/embedding/atxml/ src/java/org/apache/fop/cli/ src/java/org/apache/fop/render/ src/java/
I have no vote, but I’m not happy with the change, since it will break our RenderServer.jar, which is a client-server we use for giving rendering tasks to FOP. In the worst case, we will have to build two versions of the jar: one for fop 1.0 and the other for people who are using trunk. This is far from ideal. A software contract is a software contract. Once it has been set and is in the field, it should be adhered to unless there are powerful overriding reasons that compel a change. Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 From: Glenn Adams [mailto:gl...@skynav.com] Sent: Monday, January 23, 2012 1:22 PM To: fop-dev@xmlgraphics.apache.org Cc: mehdi Subject: Re: svn commit: r1234877 - in /xmlgraphics/fop/trunk: examples/embedding/java/embedding/ examples/embedding/java/embedding/atxml/ src/java/org/apache/fop/cli/ src/java/org/apache/fop/render/ src/java/org/apache/fop/render/awt/ src/java/org/apache/fop I'm disturbed that such a change has been committed without a public discussion of the merits, risks, etc., of making such a breaking change. Please revert this commit, conduct a public discussion, then, based on the results, implement the consensus. I have no idea if there is a consensus for this change without having a discussion. Regards, Glenn On Mon, Jan 23, 2012 at 9:15 AM, me...@apache.org wrote: Author: mehdi Date: Mon Jan 23 16:15:23 2012 New Revision: 1234877 URL: http://svn.apache.org/viewvc?rev=1234877&view=rev Log: Moved the FOUserAgent into the constructor of the Renderers This breaks the public API but for good reasons: 1) the user-agent is essential for configuring the renderers 2) instantiation of the constructor is always followed by call to setUserAgent() (in the examples) 3) simplifies the API and reduces mutability of the Renderers
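The trade-off in this thread can be seen in a small Java sketch. The class names below (UserAgent, SetterStyleRenderer, ConstructorStyleRenderer) are hypothetical stand-ins, not FOP's real FOUserAgent or renderer classes. The setter style lets a renderer exist without its user agent; the constructor style makes the dependency mandatory and the field final, which is the benefit the commit log claims. It also shows why the change breaks existing callers: code written as "no-argument constructor, then setUserAgent()" no longer compiles against the constructor style.

```java
import java.util.Objects;

// Hypothetical types for illustration only; they are not FOP's actual classes.
final class UserAgent {
    private final String producer;
    UserAgent(String producer) { this.producer = producer; }
    String getProducer() { return producer; }
}

/** Old style: mutable, and usable (incorrectly) before setUserAgent() is called. */
class SetterStyleRenderer {
    private UserAgent userAgent;
    void setUserAgent(UserAgent ua) { this.userAgent = ua; }
    String describe() {
        if (userAgent == null) {
            throw new IllegalStateException("user agent not set");
        }
        return "rendering for " + userAgent.getProducer();
    }
}

/** New style: the user agent is required at construction time and cannot change afterwards. */
class ConstructorStyleRenderer {
    private final UserAgent userAgent;
    ConstructorStyleRenderer(UserAgent ua) {
        this.userAgent = Objects.requireNonNull(ua, "userAgent");
    }
    String describe() { return "rendering for " + userAgent.getProducer(); }
}
```

The constructor style removes a whole class of "forgot to call the setter" errors, at the cost of changing every call site that constructs a renderer directly.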
RE: Why do you use .cmd rather than .bat?
Hi Simon, I tested the following fop.cmd file on Windows 7 and it works. It may lack support for some earlier Windows versions, such as Windows 98. I am copying fop-dev so that my little command file can be subjected to contributor review. Here are the contents of fop.cmd. The Apache script had tests for "%OS%"=="Windows_NT", which is true on Windows 7. I don't have a non-NT Windows system on which to test the script, so I removed those tests, such as if "%OS%"=="Windows_NT" set LOCAL_FOP_HOME=%~dp0. I won't put in code I can't test.
- beginning of script fop.cmd -
@ECHO OFF
set LOCAL_FOP_HOME=%~dp0
set FOP_CMD_LINE_ARGS=%1
if ""%1""=="""" goto doneStart
shift
:setupArgs
if ""%1""=="""" goto doneStart
set FOP_CMD_LINE_ARGS=%FOP_CMD_LINE_ARGS% %1
shift
goto setupArgs
rem This label provides a place for the argument list loop to break out
:doneStart
rem %~dp0 already ends with a backslash; the quotes protect install paths containing spaces
call "%LOCAL_FOP_HOME%fop.bat" %FOP_CMD_LINE_ARGS%
- end of script
Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com -Original Message- From: Simon Pepping [mailto:spepp...@leverkruid.eu] Sent: Monday, December 05, 2011 1:53 PM To: fop-us...@xmlgraphics.apache.org Subject: Re: Why do you use .cmd rather than .bat? To avoid code duplication, is it possible to have fop.cmd say something like 'call fop.bat'? Can you test that? I have no computer with Windows available. Or vice versa. Which is the canonical name, bat or cmd? Simon On Sun, Dec 04, 2011 at 04:17:37PM -0500, Jonathan Levinson wrote: A copy named fop.bat would be very useful to us. We currently have most sites deployed on fop 1.0, which names the command script on Windows as fop.bat. For Skynav fop, for our Middle Eastern sites, we are contemplating (after suitable QA) replacing fop 1.0 with Skynav fop. However, some of these sites will be on versions of our product which do assume the name of the script on Windows is fop.bat. 
While we can tell users to rename fop.cmd to fop.bat, it will simplify configuration if fop ships with a fop.bat.
RE: Merge Request - Temp_ComplexScripts into Trunk
Hi Simon, I've contacted my management and asked what our teams can do to help test. I report to development, not to our quality departments, and I can't speak for our quality departments. I've contacted our international teams about what they can do to help test. The bottom line of our testing is that when a new FOP is released, we test our software on the new FOP, and testing is done with every application that uses FOP. With an international FOP, international applications would be tested. That much is guaranteed by our normal testing process. It is a tribute to the quality of FOP that we have never had to report a FOP issue, even though our reports can be quite complicated. But I take it you would like us to get involved with the testing effort before a new FOP is released. When you are discussing our involvement, are you discussing our testing the FOP that results from the merge into trunk, once that is accomplished? I understand you are ironing out the details of what that merge would look like. You said bug reports should go to fop-users, but isn't it the case that .fo attachments won't be accepted by fop-users? Don't bug reports have to be created through Bugzilla? We can discuss what we see in terms of bugs on fop-users, but if we can't provide .fo files, won't our discussion be less helpful? Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com -Original Message- From: Simon Pepping [mailto:spepp...@leverkruid.eu] Sent: Thursday, October 20, 2011 3:19 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk Jonathan, Obviously, FOP's strongest supporters over the past years do not require this new functionality. FOP needs the additional support of new stakeholders of this new functionality. Could your teams test it on their documents and report their findings to the fop-users email list? 
Simon Pepping On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote: We -- at InterSystems -- deploy an application that runs in upwards of 40 countries, using many of the languages for which complex script support is required. We definitely need complex script support. It is a requirement for us.
RE: Merge Request - Temp_ComplexScripts into Trunk
We -- at InterSystems -- deploy an application that runs in upwards of 40 countries, using many of the languages for which complex script support is required. We definitely need complex script support. It is a requirement for us. Thanks, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com -Original Message- From: Simon Pepping [mailto:spepp...@leverkruid.eu] Sent: Wednesday, October 19, 2011 2:32 PM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk Over the past ten years computing has pervaded life in all its facets, and spread over the world. As a consequence, computing is now used in all languages and all scripts. When I open my Devanagari test file in Emacs, it just works. When I open it in Firefox, it just works. The same happens when I open it in LibreOffice Writer. I am sure that, if I opened it in *the* *Word* processor, it would just work. When I process the file with FOP, it does not work. With the complex scripts functionality, it works, depending on the use of supported or otherwise suitable fonts. (That is true for all the above applications, but apparently those come configured with my system.) So what does a person do who believes in the XML stack to maintain his documentation, and wants to send his documents in Hindi to his customers? See that XSL-FO with FOP is not a suitable solution for him because Hindi uses a complex script? FOP needs the complex scripts functionality to remain a player in the global playing field. This is for me the single overarching consideration to want this functionality in FOP's trunk code, and in, say, half a year in a release. All other considerations are minor, unless one wants to claim that this code will block FOP's further development and maintenance in the coming years. Of course, not everybody needs this functionality, and there is a fear of increased maintenance overhead. 
But the question is: For whom do we develop FOP? Also for the large part of the world that uses complex scripts? With the development of the complex scripts functionality, Glenn Adams and his sponsor Basis Technologies have created a new reality, which is not going to go away. If this functionality does not end up in FOP, it will probably live on elsewhere. If the FOP team is fine with that, say no to the merge request, and feel comfortable with a trusted niche application. Simon Pepping On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote: On 18/10/2011 19:55, Simon Pepping wrote: I merged the ComplexScripts branch into trunk. Result: Hi Simon, As well as the question of how to do the merge, there is also the question: should we do the merge? Of course this is a valuable feature to the community, and Glenn has invested a lot of time in its development, but is it truly production ready? I asked Vincent to take a look at the branch earlier in the year as it's a feature we also need, but he had several concerns that have not been adequately answered. Take a look at comment #30: https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30 I'm not sure why Vincent describes it as a brief look because he spent several days on it. I also asked Peter to take a look and he had similar concerns. Two- or three-letter variable names are a barrier for any committer wanting to maintain this code, and I don't think it is a sufficient argument to say that a prerequisite to maintaining this code is to be a domain expert. I would hope that any experienced committer with a debugger should be able to solve some bugs. Obviously certain problems will require domain expertise, but the variable names are a key barrier to being able to maintain this code. I realise my comments might be a little controversial and I don't mean any disrespect to Glenn or his work (which is largely excellent), but we should at least discuss these topics before the merge is completed.
RE: DO NOT REPLY [Bug 51984] Complex script version of FOP goes into infinite loop
Our team switched to the version at https://github.com/skynavga/fop and did not run into the issue. Our thanks to Glenn Adams for his extremely helpful critique of our .fo file. Specifying writing-mode solved an issue for us. Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems -Original Message- From: bugzi...@apache.org [mailto:bugzi...@apache.org] Sent: Monday, October 10, 2011 11:01 AM To: fop-dev@xmlgraphics.apache.org Subject: DO NOT REPLY [Bug 51984] Complex script version of FOP goes into infinite loop https://issues.apache.org/bugzilla/show_bug.cgi?id=51984 --- Comment #4 from Matthias Reischenbacher matthias8...@gmx.at 2011-10-10 15:01:09 UTC --- This could be related to: https://issues.apache.org/bugzilla/show_bug.cgi?id=51282 which was actually a bug introduced in trunk. Probably you are using an older version of the complex script branch, where the fix hasn't been merged yet. -- Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
Adding support for Arabic to FOP
Using PDF Vole to examine the PDF files and reading the PDF specification, I've come to the conclusion that PDF renderers such as Adobe Acrobat version 9.0 do not implement BIDI or glyph form shaping. Rather, it is the job of the software that produces the PDF to ensure that the glyph drawing commands, in left-to-right order, pick the right glyphs and implement BIDI and paragraph line-breaking appropriately. Glyph form shaping (choosing the right glyphs for initial, intermediate and final forms) also has to be done by the software that produces the PDF. This means that adding Arabic support to FOP will involve at least the following areas:
1) Change of the layout manager to support line-breaking on the left.
2) Integration of line-breaking in the layout manager with the BIDI algorithm to appropriately handle quoted strings.
3) Automatic recognition of right-to-left direction from the Unicode font, rather than specifying text direction through writing-mode.
4) Implementation of a form-shaping glyph chooser in the PDF rendering engine.
5) In the PDF rendering engine, rearranging the text in Tj commands so that it corresponds to the BIDI algorithm. Rearrangement has to be done on a character-by-character basis; for instance, Arabic text (rendered right to left) can contain numbers (rendered left to right) or quoted French (rendered left to right).
Best Regards, Jonathan Levinson
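Point 5 can be illustrated with the JDK's own java.text.Bidi class, which implements the Unicode Bidirectional Algorithm. The sketch below (class name hypothetical, not part of FOP) converts one line of logical-order text into left-to-right visual order, the order in which glyphs would be written into a Tj operator. It is deliberately simplified: it does not mirror characters such as parentheses, does no glyph shaping (point 4), and in a real renderer the reordering has to be applied per line, after line breaking.

```java
import java.text.Bidi;

// Hypothetical helper for illustration; not part of FOP.
final class BidiReorderSketch {
    /**
     * Reorders one line from logical (memory) order into left-to-right visual order.
     * Simplifications: no bracket mirroring, no glyph shaping, one line at a time.
     */
    static String toVisualOrder(String line) {
        // Paragraph direction comes from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            int level = bidi.getRunLevel(i);
            String run = line.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // An odd embedding level means a right-to-left run: draw its characters reversed.
            if ((level & 1) == 1) {
                run = new StringBuilder(run).reverse().toString();
            }
            levels[i] = (byte) level;
            runs[i] = run;
        }
        // Rule L2 of the Unicode Bidi algorithm: reorder the runs themselves by level.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }
}
```

For example, the logical string "abc \u05D0\u05D1\u05D2 def" comes back as "abc \u05D2\u05D1\u05D0 def": the Hebrew run is reversed in place while the surrounding Latin runs keep their order. With Arabic letters followed by digits, the digits stay left-to-right but the run containing them moves to the left of the Arabic word, which is the character-level rearrangement point 5 describes.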
RE: Regular expression use
From the following link, it looks like we can call the Lexer to get tokens independently of the parser. http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer Here is the example from the above which gives me such a hope:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class MainLexer {
    public static void main(String[] args) {
        try {
            CharStream input = new ANTLRFileStream(args[0]);
            XMLLexer lexer = new XMLLexer(input);
            Token token;
            // Compare token types: not every ANTLR 3 release returns the Token.EOF_TOKEN singleton at end of input.
            while ((token = lexer.nextToken()).getType() != Token.EOF) {
                System.out.println("Token: " + token.getText());
            }
        } catch (Throwable t) {
            System.out.println("Exception: " + t);
            t.printStackTrace();
        }
    }
}
I don't know if CharStream or XMLLexer can take a String constructor or has a String factory, which is what we'd probably use within FOP. Best Regards, Jonathan S. Levinson -Original Message- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Thursday, October 08, 2009 5:15 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Regular expression use Hi Jonathan, Jonathan Levinson wrote: I'm sure someone has mentioned it already but what about the lexer support in ANTLR? http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Lexical+analysis ANTLR is available under the BSD license, which seems to be one with no strings attached: http://www.antlr.org/license.html Basically we’re back to the same discussion as about the parser generator, this time at the lexer level. http://markmail.org/thread/64rmyl7x4nyoxhh3 Among the tools mentioned in the above thread, it would be good to know which ones allow the lexer to be used independently of the parser. Unless we decide to use both the lexer and parser anyway... Vincent Best Regards, Jonathan S.
Levinson -Original Message- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Wednesday, October 07, 2009 6:51 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Regular expression use Hi Jonathan, Jonathan Levinson wrote: I noticed that if one is not careful in one's regular expression use, the compilation for a regular expression can take minutes. I'm not talking about applying the pattern just compiling it! Should regular expressions be avoided altogether and should one use hand-crafted state machines for parsing, and tokenizing, or can regular expressions be used as long as one is careful? I’d say, use regular expressions as long as they are not too complex. But I guess you’re mentioning that in the context of property parsing, in which case I don’t think regular expressions are the ultimate answer. A proper lexer is likely to be needed, either generated or written by hand. As the latter solution quickly becomes a maintenance nightmare, some lexer generator will probably be needed. Question remains, which one, and I’m not even sure there’s one that exists whose license is ASLv2-compatible. Plus there are some issues specific to property parsing, like shorthands (which should ideally re-use the parsers of the individual properties), sub-properties, etc. Vincent
RE: Regular expression use
I'm sure someone has mentioned it already but what about the lexer support in ANTLR? http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Lexical+analysis ANTLR is available under the BSD license, which seems to be one with no strings attached: http://www.antlr.org/license.html Best Regards, Jonathan S. Levinson -Original Message- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Wednesday, October 07, 2009 6:51 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Regular expression use Hi Jonathan, Jonathan Levinson wrote: I noticed that if one is not careful in one's regular expression use, the compilation for a regular expression can take minutes. I'm not talking about applying the pattern just compiling it! Should regular expressions be avoided altogether and should one use hand-crafted state machines for parsing, and tokenizing, or can regular expressions be used as long as one is careful? I’d say, use regular expressions as long as they are not too complex. But I guess you’re mentioning that in the context of property parsing, in which case I don’t think regular expressions are the ultimate answer. A proper lexer is likely to be needed, either generated or written by hand. As the latter solution quickly becomes a maintenance nightmare, some lexer generator will probably be needed. Question remains, which one, and I’m not even sure there’s one that exists whose license is ASLv2-compatible. Plus there are some issues specific to property parsing, like shorthands (which should ideally re-use the parsers of the individual properties), sub-properties, etc. Vincent
Regular expression use
I noticed that if one is not careful in one's regular expression use, compiling a regular expression can take minutes. I'm not talking about applying the pattern, just compiling it! Should regular expressions be avoided altogether in favor of hand-crafted state machines for parsing and tokenizing, or can regular expressions be used as long as one is careful? Best Regards, Jonathan S. Levinson
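For comparison with the hand-crafted option, a tokenizer for property values can be a small character-driven state machine with no regular expressions at all. The Java sketch below is hypothetical and is not FOP's property parser. Its token types, the way it attaches units such as pt or % to numbers, and its handling of quoted strings are simplifying assumptions, and it does not validate number syntax (for example, "1.2.3" would be accepted as one NUMBER).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer for property values; not FOP code.
final class PropertyTokenizer {
    enum Type { IDENT, NUMBER, STRING, SLASH, COMMA }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }

    static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace only separates tokens
            } else if (c == '/') {
                out.add(new Token(Type.SLASH, "/"));
                i++;
            } else if (c == ',') {
                out.add(new Token(Type.COMMA, ","));
                i++;
            } else if (c == '"' || c == '\'') {
                int end = s.indexOf(c, i + 1);         // find the matching quote
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                out.add(new Token(Type.STRING, s.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isDigit(c) || c == '.') {
                int start = i;
                while (i < s.length() && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.')) {
                    i++;
                }
                // optional unit suffix such as pt, em or %
                while (i < s.length() && (Character.isLetter(s.charAt(i)) || s.charAt(i) == '%')) {
                    i++;
                }
                out.add(new Token(Type.NUMBER, s.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '-')) {
                    i++;
                }
                out.add(new Token(Type.IDENT, s.substring(start, i)));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }
}
```

Because each branch consumes at least one character, the loop always terminates in linear time, which sidesteps the compile-time and backtracking costs that complex regular expressions can incur.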
RE: Questionable whether font-shorthand grammar LL(1)
I agree: in this case, tokenizing (lexical analysis) is more difficult than parsing. Best Regards, Jonathan -Original Message- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Wednesday, September 30, 2009 6:25 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Questionable whether font-shorthand grammar LL(1) Thanks everyone for your parser suggestions. I believe we should be able to do without one for the font shorthand, but this is definitely something to keep in mind if we want to improve the parsing of other properties. I’m starting to realise that the most difficult part is probably not so much the grammar parsing as the lexical analysis. To be continued, I guess... Vincent Laurent Caillette wrote: Hi all, I've never used SableCC or JavaCC so I cannot compare, but I'm using ANTLR a lot. ANTLR is highly customizable and has a very strong community. Its integrated development environment offers a debugger and visualization of grammar ambiguities. It's not only simple to set up and use, it also offers all the comfort you can reasonably dream of when developing grammars. Perhaps a tool like JarJar could reduce the pain of depending on one more library (with all the possible conflicts that could happen to FOP users). Because code generation has some drawbacks (at least in terms of build complexity), you may be interested in JParsec, which creates parsers dynamically from pure Java code. Disclaimer: never used it. http://jparsec.codehaus.org Hope this will help you to make a reasonable choice. c. -Original Message- From: berger@gmail.com [mailto:berger@gmail.com] On Behalf Of Max Berger Sent: Tuesday, September 29, 2009 13:00 To: fop-dev@xmlgraphics.apache.org Subject: Re: Questionable whether font-shorthand grammar LL(1) Hi Vincent, 2009/9/29 Vincent Hennebert vhenneb...@gmail.com: How about specifying the grammar and using a tool such as JavaCC to generate the actual parser? 
This way you could focus more on a complete grammar and spend less time writing the parser. That would be the same as using ANTLR. I feel that this is a bit overkill for just parsing the font shorthand property, although that may prove to be useful for other properties that can accept complex expressions. That said, JavaCC is an interesting suggestion; I didn’t think of it. If a choice had to be made between ANTLR and JavaCC, which one would win?
ANTLR:
- easy to use
- requires runtime linking of jar [1] (a *huge* disadvantage imo)
JavaCC:
- very sparse documentation
- generates standalone java classes
SableCC:
- better documentation
- LGPL (and therefore maybe not feasible, although it would only be used at compile time and not at runtime)
[1] http://beust.com/weblog/archives/000145.html
Max
RE: Questionable whether font-shorthand grammar LL(1)
Hi Vincent, Excellent ideas! The diagram you drew is extremely useful! If the font shorthand sub-language has a grammar that is regular, then it also has a grammar that is LL(1). So recursive descent parsing will work if there is a regular grammar. I think the best way of getting font shorthand to work would proceed in stages:
1) First get the current code to properly parse and accept valid font shorthand expressions. This should be very easy. The one remaining problem (AFAIK) is the parsing of font-size/line-height, where /line-height is optional. Currently spaces are not allowed around the slash (/), and they should be. I'm going to try to get to this problem as soon as I have time, probably in a day or so.
2) Evaluate which parser or automaton approach is the simplest and produces better error states than the current approach.
3) Implement the approach chosen in (2).
Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600 -Original Message- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Monday, September 28, 2009 8:13 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Questionable whether font-shorthand grammar LL(1) Hi Jonathan, Interesting stuff! Jonathan Levinson wrote: Hi Vincent, snip/ Because font-variant font-style and font-weight can occur in any order, I could not (currently) come up with a grammar in which the directing sets were disjoint for each non-terminal. So I was unable to come up with an LL(1) grammar. For instance, here are two productions of my attempt at a grammar: style-variant-weight -> variant-weight style-variant-weight -> variant-style In each case, the first set of style-variant-weight shares a common element in two different productions, the literal values for variant. One needs to look ahead one more token to see if one has a variant-weight or a variant-style. (I’ll call “modifier” any of the three style, variant, weight properties.)
Taking the ‘normal’ case apart, and since ‘inherit’ is not allowed in the shorthand, I think the values for all modifiers are distinct: ‘italic’, ‘oblique’, ‘backslant’ for font-style, ‘small-caps’ for font-variant, and the various weight values for font-weight. Since all modifiers are set to their initial values prior to the shorthand parsing, which is ‘normal’ for all three of them, I think we can simply ignore any ‘normal’ value found in the string. That is, accept it as a legal terminal but not do anything. So I don’t think there is any ambiguity any more. What remains to be done is to check that the same modifier is not specified more than once (that includes checking that ‘normal’ is not specified more than 3 times). And it’s probably easier to check that at the semantic level instead of crafting special grammar rules. snip/ The books and web articles I read only discussed using recursive descent when the grammar is LL(1). I have the feeling that despite the ambiguities in the grammar it is almost LL(k) because font-variant and font-style and font-weight almost have disjoint values. It is at least LL(3) and I suspect it is LL(6). The font-size property has the good idea of not allowing ‘normal’ as a value. The ‘normal’ case for modifiers can be ignored as explained above. So I think the grammar still is LL(1) snip/ I'm not as convinced as you are that recursive descent parsing or a formal bottom-up-parser will make the code simpler rather than more complex because of the complexities of a formal grammar. Of course, however complex the grammar, a table-generating tool - like ANTLR - will generate code, however complex, which will faithfully reflect the inputted grammar. However, none of the other properties in FOP use a table-generating tool like ANTLR - and I'm not sure what the consequences would be to FOP of introducing such a tool. 
Given the complexities of the grammar, I'm sure that a recursive descent parser will be quite complex, and if we are going to use a grammar driven approach we would be better off with a tool that generates parsers from grammars rather than the recursive descent approach. Also an advantage of parser generators is that one doesn't have to rewrite so much code to correct a mistake in one's grammar, if one makes a mistake, or if the grammar changes. Recursive descent parsing can pose its own maintenance nightmares. Using a grammar tool like ANTLR is probably overkill to parse just a shorthand property. Moreover the grammar is not likely to change, so that reduces its usefulness even more. That said, most properties can accept expressions, where such a tool might actually be interesting. I don’t know how far FOP goes to supporting expressions in other properties. The current approach in FOP for font-shorthand is obscurely written but strikes me as basically sound. 1) One parses from right-to-left using
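Vincent's suggestion (ignore 'normal', treat the remaining style, variant and weight keywords as distinct values, and check for duplicates at the semantic level), combined with Jonathan's stage 1 (allow spaces around the slash), can be sketched as a small hand-written parser. Everything here is an assumption for illustration: the class is hypothetical, tokens are split on whitespace after padding the slash, and the font-size, line-height and font-family values are taken as-is without validation. It is not FOP's actual font shorthand implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not FOP's font shorthand parser. Values are not validated.
final class FontShorthandSketch {
    private static final Set<String> STYLES =
        new HashSet<>(Arrays.asList("italic", "oblique", "backslant"));
    private static final Set<String> VARIANTS =
        new HashSet<>(Arrays.asList("small-caps"));
    private static final Set<String> WEIGHTS = new HashSet<>(Arrays.asList(
        "bold", "bolder", "lighter", "100", "200", "300", "400", "500", "600", "700", "800", "900"));

    static Map<String, String> parse(String value) {
        // Assumption: pad "/" with spaces so "12pt/14pt" and "12pt / 14pt" tokenize alike.
        // A quoted family name containing "/" or repeated spaces is not preserved exactly.
        List<String> tokens = Arrays.asList(value.replace("/", " / ").trim().split("\\s+"));
        Map<String, String> result = new LinkedHashMap<>();
        result.put("font-style", "normal");    // initial values before any keyword is seen
        result.put("font-variant", "normal");
        result.put("font-weight", "normal");
        Set<String> seen = new HashSet<>();
        int normals = 0;
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            String property;
            if (t.equals("normal")) {
                normals++;                      // legal terminal, but nothing to set
                i++;
                continue;
            } else if (STYLES.contains(t)) {
                property = "font-style";
            } else if (VARIANTS.contains(t)) {
                property = "font-variant";
            } else if (WEIGHTS.contains(t)) {
                property = "font-weight";
            } else {
                break;                          // first non-modifier: should be font-size
            }
            if (!seen.add(property)) {
                throw new IllegalArgumentException(property + " specified more than once");
            }
            result.put(property, t);
            i++;
        }
        if (normals + seen.size() > 3) {
            throw new IllegalArgumentException("more than three style/variant/weight values");
        }
        if (i >= tokens.size()) {
            throw new IllegalArgumentException("missing font-size");
        }
        result.put("font-size", tokens.get(i++));
        if (i < tokens.size() && tokens.get(i).equals("/")) {
            i++;
            if (i >= tokens.size()) {
                throw new IllegalArgumentException("missing line-height after '/'");
            }
            result.put("line-height", tokens.get(i++));
        }
        if (i >= tokens.size()) {
            throw new IllegalArgumentException("missing font-family");
        }
        result.put("font-family", String.join(" ", tokens.subList(i, tokens.size())));
        return result;
    }
}
```

Since no modifier value is shared between style, variant and weight once 'normal' is set aside, a single token of lookahead decides every step, which matches Vincent's argument that the grammar is effectively LL(1).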
Confused about checkstyle use
I've installed the Checkstyle plugin for IDEA and the current code when scanned by the plugin shows lots of Checkstyle errors. Here are some errors scanning BlockStackingLayoutManager.java: Missing package-info.java file (0:0) Line is longer than 80 characters. (18:0) First sentence should end with a period (53:0) Variable 'bpUnit' must be private and have accessor methods. (61:19) What does it mean to have clean code according to Checkstyle? Is my plugin misconfigured? Is it by default at too strict a setting? Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems
Questionable whether font-shorthand grammar LL(1)
Hi Vincent, I dusted off my books on parsing and compiling (also using some websites for research) and looked at writing a formal grammar for font-shorthand. Because font-variant, font-style and font-weight can occur in any order, I could not (so far) come up with a grammar in which the director sets were disjoint for each non-terminal. So I was unable to come up with an LL(1) grammar. For instance, here are two productions from my attempt at a grammar:

    style-variant-weight -> variant-weight
    style-variant-weight -> variant-style

In each case, the first sets of the two style-variant-weight productions share common elements: the literal values for variant. One needs to look ahead one more token to see whether one has a variant-weight or a variant-style. According to Gough's Syntax Analysis and Software Tools (1988): "For every production of the augmented grammar we derive a set of possible 1-lookahead symbols, which we call the director set for that production. If and only if the director sets for different productions of the same non-terminal are disjoint, i.e. have no common elements, is the grammar LL(1)." Also, the grammar is ambiguous, as we've discussed:

    font -> style-variant-weight size [ / line-height ] family

If the string starts with 'normal' and then goes on to define size and family, then one isn't sure whether style, variant or weight is being specified. Somehow one needs to special-case 'normal' so that when the string begins with 'normal', one value (say, font-weight) is set and the other two are not set, which according to the spec means they are reset to normal as well. The books and web articles I read only discussed using recursive descent when the grammar is LL(1). I have the feeling that despite the ambiguities in the grammar it is almost LL(k), because font-variant, font-style and font-weight have almost disjoint values. It is at least LL(3) and I suspect it is LL(6).
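The LL(1) condition quoted from Gough can be checked mechanically. Below is a small Java sketch that reports the symbols shared between the director sets of one non-terminal's productions. For illustration it assumes the director set of both style-variant-weight productions is just FIRST(variant) = {normal, small-caps}, with ‘inherit’ left out. For a nullable right-hand side the director set would also include the FOLLOW set of the non-terminal, which this sketch does not compute; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Checks the LL(1) condition for one non-terminal, given the director set of each production. */
final class DirectorSets {

    /** Returns every symbol that appears in the director sets of two or more productions. */
    static Set<String> conflicts(Map<String, Set<String>> directorSetByProduction) {
        List<Set<String>> sets = new ArrayList<>(directorSetByProduction.values());
        Set<String> shared = new TreeSet<>(); // sorted, for readable output
        for (int i = 0; i < sets.size(); i++) {
            for (int j = i + 1; j < sets.size(); j++) {
                Set<String> common = new TreeSet<>(sets.get(i));
                common.retainAll(sets.get(j)); // intersection of the two director sets
                shared.addAll(common);
            }
        }
        return shared;
    }

    /** The non-terminal satisfies the LL(1) condition iff no symbol is shared. */
    static boolean isLL1(Map<String, Set<String>> directorSetByProduction) {
        return conflicts(directorSetByProduction).isEmpty();
    }
}
```

With both productions mapped to {normal, small-caps}, conflicts(...) returns [normal, small-caps] and isLL1(...) returns false, which is the clash described above; two productions starting with {italic} and {bold} respectively would pass.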
Given your greater knowledge of parsing, do you know if an LL(k) parser can always be implemented as recursive descent if one looks k tokens ahead in one's parsing routine? I also noticed that the fact that spaces separate the tokens must be an important part of any solution to the problem, and that the font-shorthand is more easily parsed (by any software) from right to left than from left to right. This is because font-family is not nullable and, in a right-to-left parse, is the first element encountered. (A non-terminal symbol is nullable if the empty string can be validly derived from it in terms of the grammar.) I'm not as convinced as you are that recursive descent parsing or a formal bottom-up parser will make the code simpler rather than more complex because of the complexities of a formal grammar. Of course, however complex the grammar, a table-generating tool like ANTLR will generate code, however complex, which will faithfully reflect the input grammar. However, none of the other properties in FOP use a table-generating tool like ANTLR, and I'm not sure what the consequences would be to FOP of introducing such a tool. Given the complexities of the grammar, I'm sure that a recursive descent parser will be quite complex, and if we are going to use a grammar-driven approach we would be better off with a tool that generates parsers from grammars rather than the recursive descent approach. Another advantage of parser generators is that one doesn't have to rewrite so much code to correct a mistake in one's grammar, or when the grammar changes. Recursive descent parsing can pose its own maintenance nightmares. The current approach in FOP for font-shorthand is obscurely written but strikes me as basically sound:
1) One parses from right to left, using the fact that spaces divide tokens.
2) One lets property makers determine whether they apply to a token. Each property maker is a little parser of the token one feeds it.
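Step 1 of the right-to-left approach can be sketched in Java as follows. This is a hypothetical helper, not FOP's actual font-shorthand code. It assumes tokens are separated by whitespace, that size and line-height are written without spaces around the '/', and that each family name is a single token, so a family list continues leftwards while the token to its left ends with a comma. Classifying the leftover modifier tokens is left to step 2.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical right-to-left splitter for the font shorthand; not FOP's actual code. */
final class FontShorthandSplitter {

    static final class Parts {
        final List<String> modifiers; // style/variant/weight tokens, still unclassified
        final String size;
        final String lineHeight;      // null when absent
        final String family;

        Parts(List<String> modifiers, String size, String lineHeight, String family) {
            this.modifiers = modifiers;
            this.size = size;
            this.lineHeight = lineHeight;
            this.family = family;
        }
    }

    static Parts split(String value) {
        String[] tokens = value.trim().split("\\s+"); // spaces divide tokens
        // 1) font-family is not nullable, so it is the first thing met from the right.
        //    A family list continues leftwards while the token to the left ends with ','.
        int familyStart = tokens.length - 1;
        while (familyStart > 0 && tokens[familyStart - 1].endsWith(",")) {
            familyStart--;
        }
        String family = String.join(" ", Arrays.copyOfRange(tokens, familyStart, tokens.length));
        // 2) The next token to the left is font-size, optionally written as size/line-height.
        int sizeIndex = familyStart - 1;
        if (sizeIndex < 0) {
            throw new IllegalArgumentException("font-size is missing in '" + value + "'");
        }
        String sizeToken = tokens[sizeIndex];
        int slash = sizeToken.indexOf('/');
        String size = slash < 0 ? sizeToken : sizeToken.substring(0, slash);
        String lineHeight = slash < 0 ? null : sizeToken.substring(slash + 1);
        // 3) Everything still to the left belongs to style, variant and weight.
        List<String> modifiers = Arrays.asList(Arrays.copyOfRange(tokens, 0, sizeIndex));
        return new Parts(modifiers, size, lineHeight, family);
    }
}
```

For "bold italic 12pt/14pt Arial, sans-serif" this gives family "Arial, sans-serif", size "12pt", line-height "14pt" and modifiers [bold, italic]; a value with no font-size, such as "Arial, sans-serif", is rejected.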
Because the property makers determine whether they apply to a token, one can handle the fact that variant, weight and style can occur in any order by feeding the current token to each of the property makers for font-variant, font-weight, and font-style in turn. Whatever they accept is ipso facto a font-variant, a font-weight or a font-style. I just want to let you know I take the problem seriously, and I'm not trying to duck the responsibility of coming up with an adequate solution. I'm not sure what I did fits into a job priority, which is why I spent many hours this weekend on this research. You are free to disagree with my observations, and I notice that on fop-dev you challenge us to make the code simpler, more reusable, and better structured. Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600
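The idea of offering each token to the property makers in turn can be sketched like this. The Maker class below is a hypothetical stand-in reduced to a name plus a predicate; FOP's real property makers have a different and much richer interface. The keyword lists are an illustrative subset, and ‘normal’ is simply skipped, on the assumption that all three properties already start out as normal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for property makers: each one only answers "is this token mine?". */
final class ModifierDispatch {

    static final class Maker {
        final String property;
        final Predicate<String> accepts;

        Maker(String property, Predicate<String> accepts) {
            this.property = property;
            this.accepts = accepts;
        }
    }

    static final List<Maker> MAKERS = List.of(
            new Maker("font-style", t -> t.matches("italic|oblique|backslant")),
            new Maker("font-variant", t -> t.equals("small-caps")),
            new Maker("font-weight", t -> t.matches("bold|bolder|lighter|[1-9]00")));

    /** Offers each token to every maker in turn; the maker that accepts it owns it. */
    static Map<String, String> assign(List<String> tokens) {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (String token : tokens) {
            if (token.equals("normal")) {
                continue; // all three properties already start out as 'normal'
            }
            String owner = null;
            for (Maker maker : MAKERS) {
                if (maker.accepts.test(token)) {
                    owner = maker.property;
                    break;
                }
            }
            if (owner == null) {
                throw new IllegalArgumentException("no property maker accepts '" + token + "'");
            }
            if (assigned.putIfAbsent(owner, token) != null) {
                throw new IllegalArgumentException(owner + " is given more than once");
            }
        }
        return assigned;
    }
}
```

Because ownership is decided by whichever maker accepts the token, ["small-caps", "bold", "italic"] and ["italic", "small-caps", "bold"] produce the same assignment, while ["bold", "700"] is rejected because font-weight would be set twice.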
RE: Confused about checkstyle use
Thanks to your advice (and my finding the Checkstyle configurator in IDEA) I'm now using checkstyle-5.0.xml from FOP. Thank you very much! However, I notice there are still warnings. BlockStackingLayoutManager.java: 16 items Missing a Javadoc comment. (58:5) 'parentArea' hides a field. (115:47) 'parentArea' hides a field. (145:50) Method length is 185 lines (max allowed is 150) (372:5) etc. I'm using JetBrains IDEA 8.1.3. Is the rule that we ignore warnings and only look for errors? BTW, I got Checkstyle to work in IDEA by changing checkstyle-5.0.xml in FOP in the following way:
<module name="RegexpHeader">
<property name="headerFile" value="c:/perforce/Users/levinson/fop-trunk/checkstyle.header"/>
Thanks again for your help! Jonathan S. Levinson Senior Software Developer Object Group InterSystems -----Original Message----- From: Alexander Kiel [mailto:alexanderk...@gmx.net] Sent: Sunday, September 27, 2009 4:55 PM To: fop-dev@xmlgraphics.apache.org Subject: Re: Confused about checkstyle use Hi Jonathan, did you use the checkstyle-5.0.xml from FOP or the default Sun profile? I'm currently not able to start IDEA, but two days ago when I downloaded the plugin, I noticed that the Sun profile was active and I had to define the FOP profile. And if you define the FOP profile, you will probably notice that the header thing does not work. It's a path inclusion problem with the header.* file. I did not have a solution for it; I just commented it out for now. Best Regards Alex Jonathan Levinson wrote: I've installed the Checkstyle plugin for IDEA and the current code when scanned by the plugin shows lots of Checkstyle errors. Here are some errors scanning BlockStackingLayoutManager.java: Missing package-info.java file (0:0) Line is longer than 80 characters. (18:0) First sentence should end with a period (53:0) Variable 'bpUnit' must be private and have accessor methods. (61:19) What does it mean to have clean code according to Checkstyle? Is my plugin misconfigured?
Is it by default at too strict a setting? Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems
RE: ambiguity of grammar for font shorthand?
Hi Vincent, You make excellent points; however, for font-style, font-variant and font-weight the initial value (the default value) is normal, not inherit. http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#font-style http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#font-variant http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#font-weight This is a minor detail, but important if our discussion is used as the basis for building a recursive descent parser. Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600 -----Original Message----- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Tuesday, September 22, 2009 7:20 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: ambiguity of grammar for font shorthand? Hi Jonathan, Jonathan Levinson wrote: Hi Vincent, As I read the grammar for the font shorthand it is ambiguous, though not fatally so as long as one excludes the value of inherit from individual properties in the font shorthand. For instance, the first optional argument is font-style, font-weight, and font-variant, each of which is optional and can occur in any order. All can have the value normal. So if the value for the font shorthand is ‘normal 10pt Arial’, we do not know which of these three is being set to normal, even though it is harmless and the omitted values will be set to normal since that is their initial value. Actually not: the default value is inherited. If somewhere up in the hierarchy the font-weight was set to bold, then we don’t know if that ‘normal’ in the font property means that font-weight must be reset to normal or if it applies to another property. This example you’re mentioning is truly ambiguous. If inherit is allowed to be a value, then the grammar truly becomes ambiguous, since each of these can have the value inherit and we don't know which ones are omitted and must take the value normal.
I think it is probably the case that in the context of the font shorthand the font properties cannot take the value of inherit, since this renders the grammar irreducibly ambiguous. While such an exclusion is not mentioned in the spec, it makes sense that inherit must be excluded for the reason I've just given. Excluding inherit for good is a bit too restrictive IMO. I think we should try to resolve all non-ambiguous cases, like: ‘normal normal bold’, ‘inherit bold italic’, ‘inherit inherit inherit’, ‘inherit’, etc. Some truly ambiguous values: ‘normal normal’ (which one is inherited?), ‘normal bold inherit’ (which one is normal, which one inherited?), ‘normal’ (which one is normal, which one inherited?), etc. A good “exercise” would be to identify all cases that are ambiguous. For those, an error would be thrown with a “the value is ambiguous”-like message. Prima facie, the grammar (eliminating inherit) looks LL(1), since parsing from left to right one can always tell what property one is parsing, except for the case when one of the first three is assigned normal and there are no further values unique to the properties of the first three. In this case, one has a special rule (outside the grammar) to arbitrarily pick one of the optional properties in the first optional argument as the bearer of normal, while the rest receive their initial values of normal. Actually, a “simple” regular expression might be enough. The java.util.regex package can do wonders. See attached Java file: there will always be 6 matching groups, some of them possibly being null. The first three are for style/variant/weight, then font-size, then line-height, then font families. Some magic would have to be implemented to identify the first 3 groups. Also, the regex for the individual properties would have to be refined: “\\w+” is actually wrong for font-weight. One could imagine reusing a regex defined for each sub-property. However, an LL parser would probably be superior in error handling.
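The attached Java file is not reproduced in this archive, so the following is only a reconstruction of the idea under stated assumptions: a Java regex with six capturing groups (three modifiers, font-size, line-height, font families). Instead of “\\w+” it assumes a keyword alternation for the modifier groups. In a variant of this pattern using \w+ for those groups, "bold 12pt Arial Black" would match with 12pt taken as a modifier and Arial as the font-size; the keyword list avoids that. The sketch still does not decide which of the first three groups is style, variant or weight, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based splitter for the font shorthand; not the attached file. */
final class FontShorthandRegex {

    // Modifier keywords instead of \w+, so sizes and family names cannot be taken as modifiers.
    private static final String MOD =
            "(normal|italic|oblique|backslant|small-caps|bolder|bold|lighter|[1-9]00)";

    // Groups 1-3: modifiers, 4: font-size, 5: line-height, 6: font families.
    static final Pattern FONT = Pattern.compile(
            "^\\s*(?:" + MOD + "\\s+)?(?:" + MOD + "\\s+)?(?:" + MOD + "\\s+)?"
            + "(\\S+?)(?:/(\\S+))?\\s+(.+?)\\s*$");

    /** Returns the six groups (unmatched ones are null), or null if the value does not match. */
    static String[] groups(String value) {
        Matcher m = FONT.matcher(value);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[6];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i + 1);
        }
        return result;
    }
}
```

groups("bold italic 12pt/14pt Arial, sans-serif") returns [bold, italic, null, 12pt, 14pt, "Arial, sans-serif"], whereas groups("Arial") simply returns null, with no hint about what went wrong.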
The regular expression would just fail to match, and there’s not much that can be said about why it fails. An LL parser would probably be able to tell, say, that the error lies in the declaration of the font-size property. I think good error handling is important, especially for beginners. I’ve found myself ranting against such meaningless error messages that don’t tell you at all what your error could be. There is a special case where the value of font is inherit, and that works fine. Since we are testing whether the single token is inherit, we can handle that special case in a recursive descent parser. We create a tokenizer which breaks on spaces and see if the one token returned is inherit. Also, in your message you said we could ignore a value for font of caption, icon, etc., as the standard tells us to do, but the standard discusses these values and their relation to system fonts. Was this an oversight on your part or am I misreading the spec? [1
ambiguity of grammar for font shorthand?
Hi Vincent, As I read the grammar for the font shorthand it is ambiguous, though not fatally so as long as one excludes the value of inherit from individual properties in the font shorthand. For instance, the first optional argument is font-style, font-weight, and font-variant, each of which is optional and can occur in any order. All can have the value normal. So if the value for the font shorthand is ‘normal 10pt Arial’, we do not know which of these three is being set to normal, even though it is harmless and the omitted values will be set to normal since that is their initial value. If inherit is allowed to be a value, then the grammar truly becomes ambiguous, since each of these can have the value inherit and we don't know which ones are omitted and must take the value normal. I think it is probably the case that in the context of the font shorthand the font properties cannot take the value of inherit, since this renders the grammar irreducibly ambiguous. While such an exclusion is not mentioned in the spec, it makes sense that inherit must be excluded for the reason I've just given. Prima facie, the grammar (eliminating inherit) looks LL(1), since parsing from left to right one can always tell what property one is parsing, except for the case when one of the first three is assigned normal and there are no further values unique to the properties of the first three. In this case, one has a special rule (outside the grammar) to arbitrarily pick one of the optional properties in the first optional argument as the bearer of normal, while the rest receive their initial values of normal. There is a special case where the value of font is inherit, and that works fine. Since we are testing whether the single token is inherit, we can handle that special case in a recursive descent parser. We create a tokenizer which breaks on spaces and see if the one token returned is inherit.
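A rough Java sketch of how the style/variant/weight part could be classified once ‘inherit’ is allowed per modifier. It assumes that omitted modifiers reset to their initial value ‘normal’ (the reading Jonathan gives from the spec) and that the whole-value ‘inherit’ case has already been tested separately. Under those assumptions a ‘normal’ never needs a definite owner, and ‘inherit’ is ambiguous exactly when it covers some but not all of the properties not already named by a unique keyword. If omitted values were inherited instead (Vincent's reading), the roles of ‘normal’ and ‘inherit’ would swap. Class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical classifier for the style/variant/weight part of a font shorthand value. */
final class ModifierAmbiguity {

    enum Verdict { UNAMBIGUOUS, AMBIGUOUS, INVALID }

    private static String propertyOf(String token) {
        if (token.matches("italic|oblique|backslant")) return "font-style";
        if (token.equals("small-caps")) return "font-variant";
        if (token.matches("bold|bolder|lighter|[1-9]00")) return "font-weight";
        return null;
    }

    static Verdict check(List<String> tokens) {
        Set<String> named = new HashSet<>(); // properties fixed by a keyword unique to them
        int normals = 0;
        int inherits = 0;
        for (String t : tokens) {
            if (t.equals("normal")) {
                normals++;
            } else if (t.equals("inherit")) {
                inherits++;
            } else {
                String property = propertyOf(t);
                if (property == null || !named.add(property)) {
                    return Verdict.INVALID; // unknown keyword, or the same property twice
                }
            }
        }
        int free = 3 - named.size(); // properties that 'normal'/'inherit' could belong to
        if (normals + inherits > free) {
            return Verdict.INVALID; // more shared keywords than properties left to take them
        }
        // Omitted properties reset to 'normal' (assumption), so only 'inherit' needs an owner;
        // it is pinned down only when it fills every free property.
        return inherits > 0 && inherits < free ? Verdict.AMBIGUOUS : Verdict.UNAMBIGUOUS;
    }
}
```

With this rule ‘normal normal bold’, ‘inherit bold italic’ and ‘inherit inherit inherit’ come out unambiguous, ‘normal bold inherit’ comes out ambiguous, and ‘bold bolder’ is invalid.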
Also, in your message you said we could ignore a value for font of caption, icon, etc., as the standard tells us to do, but the standard discusses these values and their relation to system fonts. Was this an oversight on your part or am I misreading the spec? [1] [1] http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#font I'm not sure we have to go to the complexity of parsing the font shorthand in a recursive descent manner. I've updated the open issue (47709) to give my reasons why and a solution to the problem of more than two fonts separated by commas. The overly complex code I analyzed looks to me like a tokenizer, not a parser, and while it could be better written (and more understandable) it seems to be doing an adequate job of tokenizing, unless I'm still missing something. Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600
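One ingredient of handling more than two comma-separated fonts is splitting the font-family list itself. Here is a minimal Java sketch, assuming family names may be wrapped in single or double quotes and that commas inside quotes belong to the name. It is not the solution described in issue 47709, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter for a font-family list; quotes protect embedded commas. */
final class FontFamilyList {

    static List<String> split(String value) {
        List<String> families = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // the open quote character, or 0 when outside quotes
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote; the quote itself is not part of the name
                } else {
                    current.append(c);
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
            } else if (c == ',') {
                families.add(current.toString().trim()); // a comma outside quotes ends a name
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("unterminated quote in font-family: " + value);
        }
        families.add(current.toString().trim());
        for (String family : families) {
            if (family.isEmpty()) {
                throw new IllegalArgumentException("empty entry in font-family: " + value);
            }
        }
        return families;
    }
}
```

For example, "\"Times New Roman\", Arial, sans-serif" splits into [Times New Roman, Arial, sans-serif], however many entries the list has, while "Arial,,serif" is rejected because of the empty entry.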
RE: Support for Arabic in FOP
Thank you for your kind offer. What license applies to the jars? Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600 -----Original Message----- From: Prakash sen [mailto:prakash@gmail.com] Sent: Friday, September 18, 2009 1:13 PM To: fop-dev@xmlgraphics.apache.org Subject: RE: Volunteering to work on FOP development Simon Pepping @ Home wrote: Arabic support is very important for us. I looked in Nabble for Sebastian's post that would allow FOP to work with Arabic, but was unable to find his post, though I found a reference to it with a no-longer-valid hyperlink. This is a link to his message: http://markmail.org/message/qu534sfte3xaaosb#query:sebastian%20weber%20arabic+page:1+mid:tbj7vyt56wim4bfj+state:results, but the link to his work is no longer valid. Kia Teymourian also worked on Arabic support in FOP, and his work is available at http://user.cs.tu-berlin.de/~kiat/fop/. Regards, Simon Hi, I am not sure what changes Sebastian made in the source code, but I do have the jar files. If needed, I can send them with some sample examples. Regards, Prakash Sen. -- View this message in context: http://www.nabble.com/Volunteering-to-work-on-FOP-development-tp25442059p25512296.html Sent from the FOP - Dev mailing list archive at Nabble.com.
RE: Volunteering to work on FOP development
Arabic support is very important for us. I looked in Nabble for Sebastian's post that would allow FOP to work with Arabic, but was unable to find his post, though I found a reference to it with a no-longer-valid hyperlink. Do you have a valid link to a patch and/or instructions that will enable FOP to work with Arabic? Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600 -----Original Message----- From: Prakash sen [mailto:prakash@gmail.com] Sent: Thursday, September 17, 2009 2:55 PM To: fop-dev@xmlgraphics.apache.org Subject: Re: Volunteering to work on FOP development Pascal Sancho wrote: Jonathan Levinson wrote: We are an international company, and need to support non-Western documents including Greek, Thai and Chinese, amongst many others. We have technical people at work in every area of the globe. Best Regards, Jonathan S. Levinson -----Original Message----- From: Simon Pepping [mailto:spepp...@leverkruid.eu] Another area where FOP needs more work is support for non-Western documents. I do not know where the problems are, but it probably does not work right now. Ideally, we would have contributors from regions with such problems. Regards, Simon Hi Jonathan, Regarding non-Western documents, in the current FOP version:
- Latin and Slavic alphabets give good results, since they only depend on the font's charset (my own experience).
- Japanese can give the expected result (my own experience).
- Ideographic languages (like Chinese) can give unexpected results regarding line-breaking (from what I read on the user list).
- Arabic (and probably all right-to-left scripts) cannot be handled correctly by the current FOP version:
  - right-to-left is not implemented (see [1])
  - Arabic inner ligatures are not handled at all
[1] http://xmlgraphics.apache.org/fop/compliance.html#fo-property-writingmode-section HTH, Pascal Hi, There was some change made by Sebastian in FOP for Arabic characters and it worked for us. I used it around 2006-07 and it is still working properly. I believe the post should be on Nabble. Regards, Prakash Sen. -- View this message in context: http://www.nabble.com/Volunteering-to-work-on-FOP-development-tp25442059p25492225.html Sent from the FOP - Dev mailing list archive at Nabble.com.
RE: Volunteering to work on FOP development
Yes, I've read the FOP Coding Guidelines [1]. I have an SVN client (TortoiseSVN) and an IDE (IntelliJ IDEA) set up. Thanks for the advice on where to get started! I'm looking at [3] http://issues.apache.org/bugzilla/show_bug.cgi?id=47709, which is the failure of the amendment to the font-shorthand-test, and which you say looks like a Properties bug. Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600 -----Original Message----- From: Chris Bowditch [mailto:bowditch_ch...@hotmail.com] Sent: Tuesday, September 15, 2009 4:06 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Volunteering to work on FOP development Jonathan Levinson wrote: Hi, Hi Jonathan, My management has asked me to volunteer to help fix FOP bugs and add FOP enhancements. I'm not yet familiar with FOP internals though I've read your design documents. Good news indeed. FOP is short on development resources. I take it you have read the FOP Coding Guidelines [1] and got an SVN client and Java IDE set up? I work for InterSystems: www.intersystems.com. I'm responsible for the InterSystems reporting engine: ZEN Reports. ZEN Reports generates XSLT to transform XML to XSL-FO and uses RenderX XEP, FOP, and Antenna House for rendering engines. I have to start somewhere and a question I have is this: what would be a good starter bug or enhancement for me to work on? Can anyone give me any pointers on how to get started? The trick for a newbie is to avoid the layout engine. Still, there are plenty of bugs in the Renderers/Painters, FO Tree or Properties components. I had a quick flick through Bugzilla and found [2], which may be an FO Tree related issue, and [3], which looks like a Properties bug. [1] http://xmlgraphics.apache.org/fop/dev/conventions.html [2] http://issues.apache.org/bugzilla/show_bug.cgi?id=47835 [3] http://issues.apache.org/bugzilla/show_bug.cgi?id=47709 Thanks, Chris Thanks! Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600
Volunteering to work on FOP development
Hi, My management has asked me to volunteer to help fix FOP bugs and add FOP enhancements. I'm not yet familiar with FOP internals though I've read your design documents. I work for InterSystems: www.intersystems.com. I'm responsible for the InterSystems reporting engine: ZEN Reports. ZEN Reports generates XSLT to transform XML to XSL-FO and uses RenderX XEP, FOP, and Antenna House for rendering engines. I have to start somewhere and a question I have is this: what would be a good starter bug or enhancement for me to work on? Can anyone give me any pointers on how to get started? Thanks! Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600