Re: [VOTE] Applying the Type 1 subset patch

2014-03-17 Thread Vincent Hennebert

On 17/03/14 01:19, Luis Bernardo wrote:


I performed some further tests, still on Mac, but with a couple of Ghostscript
Type 1 fonts, which are probably the same ones found on Linux.

The test was successful in that the output looked good (for the record, I had
some unpredictable output between different runs which I could not reliably
reproduce, so I attribute that to an environment issue, maybe the .fop
directory).

My example included characters not present in the font. Instead of # for the
missing glyph I got z (see example attached), which is probably not intended
(i.e., it looks like a bug). I was also expecting that Adobe would indicate that
the fonts are subset, but it doesn't; this could be a wrong expectation (the


This is probably because the font’s PostScript name doesn’t start with
a subset tag (6 uppercase letters followed by a +) like it should. Also,
it may be necessary to add a CharSet entry to the font descriptor.
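
To make the subset-tag convention concrete, here is a minimal Java sketch. It is
not code from the patch: the class and method names are hypothetical, the font
name CMB10 is only an example, and how FOP would actually choose the six letters
is not shown. The CharSet helper assumes the usual PDF convention of listing
glyph names, each preceded by a slash.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of FOP.
final class SubsetTagSketch {

    // A subset tag is six uppercase letters followed by '+', then the font name.
    private static final Pattern TAGGED_NAME = Pattern.compile("[A-Z]{6}\\+.+");

    /** Returns true if the PostScript name already carries a subset tag. */
    static boolean hasSubsetTag(String postScriptName) {
        return TAGGED_NAME.matcher(postScriptName).matches();
    }

    /** Prefixes the name with the given tag; the caller chooses the tag. */
    static String withSubsetTag(String tag, String postScriptName) {
        if (!tag.matches("[A-Z]{6}")) {
            throw new IllegalArgumentException("tag must be six uppercase letters: " + tag);
        }
        return tag + "+" + postScriptName;
    }

    /** Builds the content of a CharSet string: each glyph name preceded by '/'. */
    static String charSet(List<String> glyphNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : glyphNames) {
            sb.append('/').append(name);
        }
        return sb.toString();
    }
}
```

A viewer that sees a name of the form produced by withSubsetTag can report the
font as an embedded subset; whether that alone fixes the display here is
untested.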



subset file is nevertheless considerably smaller -- 64KB versus 219 KB).

Finally I ran a simple performance test. With the patched code (which produces
a subset) the time was 175 ms; with the current trunk it was 83 ms.

So I think Vincent's suggestion not to make subsetting the default for Type 1
makes sense for now. I think this requires a new vote with a new patch.


Vincent



On 3/12/14, 12:06 AM, Luis Bernardo wrote:


Since apparently Macs have no type1 fonts I had to look for some and I tried
the first one from http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma
(cmb10) which gave a problem:

java.util.NoSuchElementException
    at java.util.Scanner.throwFor(Scanner.java:907)
    at java.util.Scanner.next(Scanner.java:1530)
    at java.util.Scanner.nextInt(Scanner.java:2160)
    at java.util.Scanner.nextInt(Scanner.java:2119)
    at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
    at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)
    .

So it seems this needs to be tested with more fonts. I will test next
with the default Linux Type 1 fonts.

On 3/7/14, 11:23 AM, Robert wrote:

Hi All,

About a week ago I posted a patch to add Type 1 subset support to FOP. All
referenced Type 1 fonts (unless set to embedding-mode=full) will now be
subset by default much like the behaviour exhibited by TrueType and
OpenType. As this is a big feature and quite involved I think it is
necessary to vote on whether to add this feature in its current state to
FOP. I'm not sure if anyone has taken a look at what has gone into this or
tried it out yet, but it might be worth doing so before making your decision.

I am going to be away for the next week or so but will tally up the votes
and post the result once I am back.

Here is a link to the patch and issue:
https://issues.apache.org/jira/browse/FOP-2354

Regards,

Robert Meyer







RE: [VOTE] Applying the Type 1 subset patch

2014-03-17 Thread Robert
Hi All,

Thanks for your votes and for testing the code. From reading the feedback, I
don't think it would be right to simply modify the patch and push it through as
a disabled-by-default feature, so I will register Vincent's vote as a -1 and
look to address his and Luis's concerns.

Regarding one of the points Vincent made about the PostScript parser, the
matter is complicated by the nature of the code being parsed. A traditional
way to parse a file would be to scan for tokens (perhaps using a string
tokenizer) and then send those to the interpreter. Unfortunately, PostScript
Type 1 fonts contain a mixture of regular code and binary data (subroutines /
CharString data). If a traditional tokenizer were used, the binary data would
inevitably become corrupted. The alternative I chose balances the need to keep
these sections intact and accessible with the means to parse tokens and
interpret them as part of an expandable solution. There may be other
solutions, but any parser would need to work on a byte-by-byte basis rather
than being fed the input and returning a list of tokens. I am going to leave
the current implementation as it is, but I will look to address the Bakoma
font problem Luis found and perform more extensive testing with other Type 1
fonts to try to prevent any further issues.
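
To illustrate the byte-by-byte point, here is a minimal Java sketch of a scanner
that returns ordinary tokens but copies binary sections untouched. It is not the
patch's PostscriptParser. It assumes the input is already decrypted and that a
binary section is introduced by an integer length followed by an operator named
RD or -| and exactly one separator byte; the class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, for illustration only; not the parser from the patch.
final class ByteWiseScannerSketch {

    /** One scanned item: either a text token or an untouched binary section. */
    static final class Item {
        final String text;   // non-null for ordinary tokens
        final byte[] binary; // non-null for binary sections

        Item(String text, byte[] binary) {
            this.text = text;
            this.binary = binary;
        }
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n' || b == '\f' || b == 0;
    }

    static List<Item> scan(byte[] data) {
        List<Item> items = new ArrayList<>();
        String previous = null;
        int pos = 0;
        while (pos < data.length) {
            // Skip whitespace between ordinary tokens.
            while (pos < data.length && isWhitespace(data[pos])) {
                pos++;
            }
            if (pos >= data.length) {
                break;
            }
            int start = pos;
            while (pos < data.length && !isWhitespace(data[pos])) {
                pos++;
            }
            String token = new String(data, start, pos - start, StandardCharsets.ISO_8859_1);
            items.add(new Item(token, null));

            // Assumed convention: "<length> RD <bytes>" or "<length> -| <bytes>".
            boolean startsBinary = token.equals("RD") || token.equals("-|");
            if (startsBinary && previous != null && previous.matches("\\d{1,9}")) {
                int length = Integer.parseInt(previous);
                int from = pos + 1; // one separator byte follows the operator
                if (length > data.length - from) {
                    throw new IllegalArgumentException("truncated binary section");
                }
                // Copy the raw bytes as-is, even if they look like whitespace.
                items.add(new Item(null, Arrays.copyOfRange(data, from, from + length)));
                pos = from + length;
            }
            previous = token;
        }
        return items;
    }
}
```

A tokenizer that splits purely on whitespace would drop or split any binary
bytes that happen to equal a space or a newline, which is the corruption
described above.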

I will look to address the other issues you both raised in the coming weeks.

Thanks for your input.

Robert Meyer

Date: Mon, 17 Mar 2014 00:19:18 +
From: lmpmberna...@gmail.com
To: fop-dev@xmlgraphics.apache.org
Subject: Re: [VOTE] Applying the Type 1 subset patch


  

  
  


I performed some further tests, still on Mac, but with a couple of
Ghostscript Type 1 fonts, which are probably the same ones found on
Linux.

The test was successful in that the output looked good (for the
record, I had some unpredictable output between different runs
which I could not reliably reproduce, so I attribute that to an
environment issue, maybe the .fop directory).

My example included characters not present in the font. Instead of
# for the missing glyph I got z (see example attached), which is
probably not intended (i.e., it looks like a bug). I was also
expecting that Adobe would indicate that the fonts are subset, but
it doesn't; this could be a wrong expectation (the subset file
is nevertheless considerably smaller -- 64KB versus 219 KB).

Finally I ran a simple performance test. With the patched code
(which produces a subset) the time was 175 ms; with the current
trunk it was 83 ms.

So I think Vincent's suggestion not to make subsetting the default
for Type 1 makes sense for now. I think this requires a new vote
with a new patch.

  


Re: [VOTE] Applying the Type 1 subset patch

2014-03-11 Thread Chris Bowditch

Hi Rob,

+1 from me. Good work.

Thanks,

Chris

On 07/03/2014 11:23, Robert wrote:

Hi All,

About a week ago I posted a patch to add Type 1 subset support to FOP. 
All referenced Type 1 fonts (unless set to embedding-mode=full) will 
now be subset by default much like the behaviour exhibited by TrueType 
and OpenType. As this is a big feature and quite involved I think it 
is necessary to vote on whether to add this feature in its current 
state to FOP. I'm not sure if anyone has taken a look at what has gone 
into this or tried it out yet, but it might be worth doing so before 
making your decision.


I am going to be away for the next week or so but will tally up the 
votes and post the result once I am back.


Here is a link to the patch and issue:
https://issues.apache.org/jira/browse/FOP-2354

Regards,

Robert Meyer




Re: [VOTE] Applying the Type 1 subset patch

2014-03-11 Thread Vincent Hennebert

On 07/03/14 12:23, Robert wrote:

Hi All,

About a week ago I posted a patch to add Type 1 subset support to FOP. All referenced 
Type 1 fonts (unless set to embedding-mode=full) will now be subset by 
default much like the behaviour exhibited by TrueType and OpenType. As this is a big 
feature and quite involved I think it is necessary to vote on whether to add this feature 
in its current state to FOP. I'm not sure if anyone has taken a look at what has gone 
into this or tried it out yet, but it might be worth doing so before making your decision.

I am going to be away for the next week or so but will tally up the votes and 
post the result once I am back.

Here is a link to the patch and issue:
https://issues.apache.org/jira/browse/FOP-2354

Regards,

Robert Meyer


From the quick look I had at the patch, I must say that some things are
sources of concern to me:
• The PostScript parser seems to be mixing lexical analysis, syntax
  analysis and interpretation. This makes it hard to follow and I could
  not figure out the meanings of the conditions in the various ‘if’
  statements inside the ‘parse’ method. Also, part of the parsing seems
  to be leaking into Type1SubsetFile. I’m concerned about the robustness
  of the thing. For example, there are unguarded calls to
  Integer.parseInt. How tolerant will that be to malformed font files?
• It seems that Type1SubsetFile tries to infer the mapping of character
  codes to glyph names. That essentially re-does what the mapChar method
  has already done earlier, with probable mismatch between the outputs
  of the two methods. In Type1SubsetFile.readEncoding I see references
  to the WinAnsi encoding, which may have nothing to do at all with the
  font’s own encoding. I suspect this is the source of the exception
  thrown when running the FO I attached to the issue.
• There is a lot of memory allocation. First, the font is entirely
  loaded in memory in Type1SubsetFile.createSubset, then again in
  PFBParser, plus data copied around when creating the subset. Surely
  some of this memory allocation can be avoided. Have you profiled the
  code? How much slower is it compared to fully embedding the font?
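
As an illustration of the first point, a guarded alternative to a bare
Integer.parseInt call could look like the minimal Java sketch below. The class
and method names are hypothetical and this is not code from the patch; it only
shows one way to let the caller decide what to do with a malformed number
instead of failing with an uncaught NumberFormatException.

```java
import java.util.OptionalInt;

// Hypothetical helper, for illustration only; not part of FOP.
final class GuardedParseSketch {

    /** Returns the parsed value, or an empty result if the token is not an int. */
    static OptionalInt tryParseInt(String token) {
        if (token == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(token.trim()));
        } catch (NumberFormatException e) {
            // Malformed font data: report "no value" rather than propagate.
            return OptionalInt.empty();
        }
    }
}
```

The caller can then skip the entry, fall back to full embedding, or raise an
error that names the font file.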

Due to the possible regressions and the potential impact on performance,
I must vote -1 against enabling Type 1 subsetting by default. If Type 1
subsetting is left as an option that can be manually configured by the
user, then I vote +0.


Vincent


Re: [VOTE] Applying the Type 1 subset patch

2014-03-11 Thread Luis Bernardo


Since apparently Macs have no type1 fonts I had to look for some and I 
tried the first one from 
http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma (cmb10) which 
gave a problem:


java.util.NoSuchElementException
    at java.util.Scanner.throwFor(Scanner.java:907)
    at java.util.Scanner.next(Scanner.java:1530)
    at java.util.Scanner.nextInt(Scanner.java:2160)
    at java.util.Scanner.nextInt(Scanner.java:2119)
    at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
    at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)

    .

So it seems this needs to be tested with more fonts. I will test 
next with the default Linux Type 1 fonts.
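
For reference, Scanner.nextInt throws NoSuchElementException when no further
token is available, which matches the trace above. A minimal Java sketch of a
guarded read follows; the class and method names are hypothetical, and I have
not checked what input PSFixedArray.addEntry actually receives for cmb10.

```java
import java.util.OptionalInt;
import java.util.Scanner;

// Hypothetical helper, for illustration only; not part of FOP.
final class GuardedScannerSketch {

    /** Reads the next int if one is available, otherwise returns empty. */
    static OptionalInt nextIntIfPresent(Scanner scanner) {
        // hasNextInt() is false both when the input is exhausted and when
        // the next token is not a valid int, so nextInt() cannot throw here.
        return scanner.hasNextInt()
                ? OptionalInt.of(scanner.nextInt())
                : OptionalInt.empty();
    }
}
```

With a check like this, an array entry that is missing its index or value can
be reported with the font name instead of aborting with a bare exception.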


On 3/7/14, 11:23 AM, Robert wrote:

Hi All,

About a week ago I posted a patch to add Type 1 subset support to FOP. 
All referenced Type 1 fonts (unless set to embedding-mode=full) will 
now be subset by default much like the behaviour exhibited by TrueType 
and OpenType. As this is a big feature and quite involved I think it 
is necessary to vote on whether to add this feature in its current 
state to FOP. I'm not sure if anyone has taken a look at what has gone 
into this or tried it out yet, but it might be worth doing so before 
making your decision.


I am going to be away for the next week or so but will tally up the 
votes and post the result once I am back.


Here is a link to the patch and issue:
https://issues.apache.org/jira/browse/FOP-2354

Regards,

Robert Meyer




Re: [VOTE] Applying the Type 1 subset patch

2014-03-10 Thread Glenn Adams
+1


On Fri, Mar 7, 2014 at 4:23 AM, Robert rme...@hotmail.co.uk wrote:

 Hi All,

 About a week ago I posted a patch to add Type 1 subset support to FOP. All
 referenced Type 1 fonts (unless set to embedding-mode=full) will now be
 subset by default much like the behaviour exhibited by TrueType and
 OpenType. As this is a big feature and quite involved I think it is
 necessary to vote on whether to add this feature in its current state to
 FOP. I'm not sure if anyone has taken a look at what has gone into this or
 tried it out yet, but it might be worth doing so before making your
 decision.

 I am going to be away for the next week or so but will tally up the votes
 and post the result once I am back.

 Here is a link to the patch and issue:
 https://issues.apache.org/jira/browse/FOP-2354

 Regards,

 Robert Meyer



Re: [VOTE] Applying the Type 1 subset patch

2014-03-10 Thread Clay Leeds
+1 from me. Nice work, Robert!

Clay

On Mar 7, 2014, at 3:23 AM, Robert rme...@hotmail.co.uk wrote:

 Hi All,
 
 About a week ago I posted a patch to add Type 1 subset support to FOP. All 
 referenced Type 1 fonts (unless set to embedding-mode=full) will now be 
 subset by default much like the behaviour exhibited by TrueType and OpenType. 
 As this is a big feature and quite involved I think it is necessary to vote 
 on whether to add this feature in its current state to FOP. I'm not sure if 
 anyone has taken a look at what has gone into this or tried it out yet, but 
 it might be worth doing so before making your decision.
 
 I am going to be away for the next week or so but will tally up the votes and 
 post the result once I am back.
 
 Here is a link to the patch and issue:
 https://issues.apache.org/jira/browse/FOP-2354
 
 Regards,
 
 Robert Meyer



Re: [VOTE] Applying the Type 1 subset patch

2014-03-07 Thread Glenn Adams
On Fri, Mar 7, 2014 at 4:23 AM, Robert rme...@hotmail.co.uk wrote:

 Hi All,

 About a week ago I posted a patch to add Type 1 subset support to FOP. All
 referenced Type 1 fonts (unless set to embedding-mode=full) will now be
 subset by default much like the behaviour exhibited by TrueType and
 OpenType. As this is a big feature and quite involved I think it is
 necessary to vote on whether to add this feature in its current state to
 FOP. I'm not sure if anyone has taken a look at what has gone into this or
 tried it out yet, but it might be worth doing so before making your
 decision.

 I am going to be away for the next week or so but will tally up the votes
 and post the result once I am back.

 Here is a link to the patch and issue:
 https://issues.apache.org/jira/browse/FOP-2354


Just to remind me, what new (external) library dependencies does this
entail? FontBox?




 Regards,

 Robert Meyer



RE: [VOTE] Applying the Type 1 subset patch

2014-03-07 Thread Robert
The (optional) fontbox library dependency was added for the OpenType font / 
subset support which is already in trunk. This patch for subsetting Type 1 
fonts adds no new dependencies and does not use fontbox.
 
From: gl...@skynav.com
Date: Fri, 7 Mar 2014 10:23:18 -0700
Subject: Re: [VOTE] Applying the Type 1 subset patch
To: fop-dev@xmlgraphics.apache.org


On Fri, Mar 7, 2014 at 4:23 AM, Robert rme...@hotmail.co.uk wrote:





Hi All,

About a week ago I posted a patch to add Type 1 subset support to FOP. All 
referenced Type 1 fonts (unless set to embedding-mode=full) will now be 
subset by default much like the behaviour exhibited by TrueType and OpenType. 
As this is a big feature and quite involved I think it is necessary to vote on 
whether to add this feature in its current state to FOP. I'm not sure if 
anyone has taken a look at what has gone into this or tried it out yet, but it 
might be worth doing so before making your decision.



I am going to be away for the next week or so but will tally up the votes and 
post the result once I am back.

Here is a link to the patch and issue:
https://issues.apache.org/jira/browse/FOP-2354


Just to remind me, what new (external) library dependencies does this entail? 
FontBox? 



Regards,

Robert Meyer