Re: [VOTE] Applying the Type 1 subset patch
On 17/03/14 01:19, Luis Bernardo wrote:

> I performed some further tests, still on Mac, but with a couple of ghostscript type1 fonts, which are probably the same as one finds in Linux. The test was successful in that the output looked good (for the record, I had some unpredictable output between different runs which I could not reliably reproduce, so I attribute that to an environment issue, maybe the .fop directory).
>
> My example included characters not present in the font. Instead of # for the missing glyph I got z (see example attached), which probably is not intended (i.e., looks like a bug). I was also expecting that Adobe would indicate that the fonts are subset, but it doesn't. This could be a wrong expectation (the subset file is nevertheless considerably smaller -- 64KB versus 219 KB).

This is probably because the font’s PostScript name doesn’t start with a subset tag (6 uppercase letters followed by a +) like it should. Also, it may be necessary to add a CharSet entry to the font descriptor.

> Finally I ran a simple performance test. With the patched code (that produces subset) the time was 175 msecs. With the current trunk 83 msecs. So I think the suggestion that Vincent put forward to not make subset the default for type1 makes sense for now. I think this requires a new vote with a new patch.

Vincent
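To illustrate the subset tag Vincent describes (six uppercase letters followed by a +, prefixed to the font's PostScript name), here is a minimal Java sketch. It is not code from the patch: the class name `SubsetTagSketch` and both methods are hypothetical, and how the letters are chosen (random here) is an assumption.

```java
import java.util.Random;

/** Hypothetical helper, not part of FOP: builds and checks subset-tagged font names. */
final class SubsetTagSketch {

    /** Returns "XXXXXX+" followed by baseName, where each X is a letter in A-Z. */
    static String withSubsetTag(String baseName, Random random) {
        StringBuilder sb = new StringBuilder(7 + baseName.length());
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + random.nextInt(26)));
        }
        return sb.append('+').append(baseName).toString();
    }

    /** True if the name starts with six uppercase ASCII letters followed by '+'. */
    static boolean hasSubsetTag(String fontName) {
        return fontName.matches("[A-Z]{6}\\+.+");
    }
}
```

A check like `hasSubsetTag` could be used in a test to confirm that the name written to the output carries the tag before opening the file in a viewer.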
RE: [VOTE] Applying the Type 1 subset patch
Hi All,

Thanks for your votes and for testing the code. From reading the feedback, I don't think it would be right to simply modify the patch and push it through as a disabled-by-default feature, so I will register Vincent's vote as a -1 and look to address his and Luis's concerns.

Regarding one of the points Vincent made about the PostScript parser, the matter is complicated by the nature of the code being parsed. A traditional method of parsing a file would be to scan for tokens (using, say, a string tokenizer) and then send those to the interpreter. Unfortunately, PostScript Type 1 fonts contain a mixture of regular code and binary data (subroutines / CharString data). If a traditional tokenizer were used, the binary data would inevitably become corrupted. The alternative I chose balances the need to keep these sections intact and accessible whilst providing the means to parse tokens and interpret them as part of an expandable solution. There may be other solutions, but any parser would need to work on a byte-by-byte basis, as opposed to being fed the file and expected to return a list of tokens.

I am going to leave the current implementation as it is, but will look to address the Bakoma font problem Luis found and perform more extensive testing with other Type 1 fonts to try and prevent any further issues. I will look to address the other issues you both raised in the coming weeks.

Thanks for your input.

Robert Meyer

Date: Mon, 17 Mar 2014 00:19:18 +
From: lmpmberna...@gmail.com
To: fop-dev@xmlgraphics.apache.org
Subject: Re: [VOTE] Applying the Type 1 subset patch

> I performed some further tests, still on Mac, but with a couple of ghostscript type1 fonts, which are probably the same as one finds in Linux. The test was successful in that the output looked good (for the record, I had some unpredictable output between different runs which I could not reliably reproduce, so I attribute that to an environment issue, maybe the .fop directory).
>
> My example included characters not present in the font. Instead of # for the missing glyph I got z (see example attached), which probably is not intended (i.e., looks like a bug). I was also expecting that Adobe would indicate that the fonts are subset, but it doesn't. This could be a wrong expectation (the subset file is nevertheless considerably smaller -- 64KB versus 219 KB).
>
> Finally I ran a simple performance test. With the patched code (that produces subset) the time was 175 msecs. With the current trunk 83 msecs. So I think the suggestion that Vincent put forward to not make subset the default for type1 makes sense for now. I think this requires a new vote with a new patch.
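The byte-by-byte approach Robert describes can be sketched roughly as follows. This is not the PostscriptParser from the patch. It assumes already-decrypted Type 1 data in which a binary run is introduced by `<length> RD` (or `-|`), followed by one separator byte and then exactly `<length>` raw bytes; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: tokenizes bytes while copying binary runs through untouched. */
final class ByteTokenizerSketch {

    /** Either a whitespace-delimited text token or an opaque binary run. */
    static final class Token {
        final boolean binary;
        final byte[] data;

        Token(boolean binary, byte[] data) {
            this.binary = binary;
            this.data = data;
        }

        String text() {
            return new String(data, StandardCharsets.ISO_8859_1);
        }
    }

    static List<Token> tokenize(byte[] in) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length) {
            while (pos < in.length && isWhitespace(in[pos])) {
                pos++;
            }
            if (pos == in.length) {
                break;
            }
            int start = pos;
            while (pos < in.length && !isWhitespace(in[pos])) {
                pos++;
            }
            Token token = new Token(false, Arrays.copyOfRange(in, start, pos));
            tokens.add(token);

            // Assumption: "<length> RD" (or "-|") announces <length> raw bytes
            // after a single separator byte; those bytes are never split on whitespace.
            String text = token.text();
            if ((text.equals("RD") || text.equals("-|")) && tokens.size() >= 2) {
                int length = declaredLength(tokens.get(tokens.size() - 2));
                if (length >= 0 && length <= in.length - pos - 1) {
                    int from = pos + 1;
                    tokens.add(new Token(true, Arrays.copyOfRange(in, from, from + length)));
                    pos = from + length;
                }
            }
        }
        return tokens;
    }

    /** Returns the integer value of a text token, or -1 if it is not a valid length. */
    private static int declaredLength(Token token) {
        if (token.binary) {
            return -1;
        }
        try {
            return Integer.parseInt(token.text());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '\f' || b == 0;
    }
}
```

A string-based tokenizer would split a binary run wherever it happens to contain a space or newline byte; keeping the run as a single opaque token avoids that.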
Re: [VOTE] Applying the Type 1 subset patch
Hi Rob,

+1 from me. Good work.

Thanks,

Chris
Re: [VOTE] Applying the Type 1 subset patch
On 07/03/14 12:23, Robert wrote:

> Hi All,
>
> About a week ago I posted a patch to add Type 1 subset support to FOP. All referenced Type 1 fonts (unless set to embedding-mode=full) will now be subset by default, much like the behaviour exhibited by TrueType and OpenType. As this is a big feature and quite involved, I think it is necessary to vote on whether to add this feature in its current state to FOP.
>
> I'm not sure if anyone has taken a look at what has gone into this or tried it out yet, but it might be worth doing so before making your decision. I am going to be away for the next week or so but will tally up the votes and post the result once I am back.
>
> Here is a link to the patch and issue: https://issues.apache.org/jira/browse/FOP-2354
>
> Regards,
>
> Robert Meyer

From the quick look I had at the patch, I must say that some things are sources of concern to me:

• The PostScript parser seems to be mixing lexical analysis, syntax analysis and interpretation. This makes it hard to follow and I could not figure out the meanings of the conditions in the various ‘if’ statements inside the ‘parse’ method. Also, part of the parsing seems to be leaking into Type1SubsetFile. I’m concerned about the robustness of the thing. For example, there are unguarded calls to Integer.parseInt. How tolerant will that be to malformed font files?

• It seems that Type1SubsetFile tries to infer the mapping of character codes to glyph names. That essentially re-does what the mapChar method has already done earlier, with a probable mismatch between the outputs of the two methods. In Type1SubsetFile.readEncoding I see references to the WinAnsi encoding, which may have nothing to do at all with the font’s own encoding. I suspect this is the source of the exception thrown when running the FO I attached to the issue.

• There is a lot of memory allocation. First, the font is entirely loaded in memory in Type1SubsetFile.createSubset, then again in PFBParser, plus data copied around when creating the subset. Surely some of this memory allocation can be avoided. Have you profiled the code? How much slower is it compared to fully embedding the font?

Due to the possible regressions and the potential impact on performance, I must vote -1 against enabling Type 1 subsetting by default. If Type 1 subsetting is left as an option that can be manually configured by the user, then I vote +0.

Vincent
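On the point about unguarded calls to Integer.parseInt, one common way to tolerate malformed input is to wrap the call and return an empty result instead of letting NumberFormatException propagate. The sketch below is only an illustration: `SafeIntSketch` and `tryParseInt` are hypothetical names, and what the caller does with an empty result (skip the entry, report the font as malformed) is left open.

```java
import java.util.OptionalInt;

/** Hypothetical helper, not part of the patch. */
final class SafeIntSketch {

    /** Parses a decimal integer, returning empty for null or malformed tokens. */
    static OptionalInt tryParseInt(String token) {
        if (token == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(token.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```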
Re: [VOTE] Applying the Type 1 subset patch
Since apparently Macs have no type1 fonts I had to look for some and I tried the first one from http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma (cmb10) which gave a problem:

java.util.NoSuchElementException
    at java.util.Scanner.throwFor(Scanner.java:907)
    at java.util.Scanner.next(Scanner.java:1530)
    at java.util.Scanner.nextInt(Scanner.java:2160)
    at java.util.Scanner.nextInt(Scanner.java:2119)
    at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
    at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)

So it seems this needs to be tested with more fonts. But I will test next with the default Linux type1 fonts.
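The trace above shows Scanner.nextInt throwing NoSuchElementException, which the Java documentation says happens when the input is exhausted. A minimal sketch of guarding such a read with hasNextInt follows. It is not the fix applied in FOP, the class and method names are hypothetical, and the sample input in the usage note is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical sketch: reads integers without assuming another token exists. */
final class ScannerGuardSketch {

    /** Collects leading integers, stopping at the first non-integer token or end of input. */
    static List<Integer> readLeadingInts(String text) {
        List<Integer> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // hasNextInt() returns false instead of throwing when no integer is available.
            while (scanner.hasNextInt()) {
                values.add(scanner.nextInt());
            }
        }
        return values;
    }
}
```

For example, `readLeadingInts("0 0 1000 1000 ] readonly def")` stops at the `]` token rather than failing, and an empty string yields an empty list.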
Re: [VOTE] Applying the Type 1 subset patch
+1
Re: [VOTE] Applying the Type 1 subset patch
+1 from me. Nice work, Robert!

Clay
Re: [VOTE] Applying the Type 1 subset patch
On Fri, Mar 7, 2014 at 4:23 AM, Robert rme...@hotmail.co.uk wrote:

> Hi All,
>
> About a week ago I posted a patch to add Type 1 subset support to FOP. All referenced Type 1 fonts (unless set to embedding-mode=full) will now be subset by default, much like the behaviour exhibited by TrueType and OpenType. As this is a big feature and quite involved, I think it is necessary to vote on whether to add this feature in its current state to FOP.
>
> I'm not sure if anyone has taken a look at what has gone into this or tried it out yet, but it might be worth doing so before making your decision. I am going to be away for the next week or so but will tally up the votes and post the result once I am back.
>
> Here is a link to the patch and issue: https://issues.apache.org/jira/browse/FOP-2354

Just to remind me, what new (external) library dependencies does this entail? FontBox?

> Regards,
>
> Robert Meyer
RE: [VOTE] Applying the Type 1 subset patch
The (optional) fontbox library dependency was added for the OpenType font / subset support which is already in trunk. This patch for subsetting Type 1 fonts adds no new dependencies and does not use fontbox.

From: gl...@skynav.com
Date: Fri, 7 Mar 2014 10:23:18 -0700
Subject: Re: [VOTE] Applying the Type 1 subset patch
To: fop-dev@xmlgraphics.apache.org

> Just to remind me, what new (external) library dependencies does this entail? FontBox?