Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread mehdi houshmand
Hi Craig,

Excellent!!! I think we're making some progress here!

<snip>
> Ugh. A well-designed RIP should be able to load XObject forms on demand
> and free them under memory pressure. After all, an image is also a
> global resource that can be referenced multiple times across different
> pages (an indirect object with a stream), but PDFs with large numbers of
> images don't typically crash RIPs. There's no excuse for lots of small
> indirect objects crashing a RIP, be they images or form XObjects.

The operative word there is well-designed, but I also think you're
making a lot of assumptions about how the RIP handles these objects. I
don't disagree with your assumptions; I'm just saying that you don't
know how the RIP handles these objects, so you have to be careful.

<snip/>
> The same is technically true of rendering a form XObject. Once you've
> drawn it, you can discard its content stream from memory and discard any
> resources you loaded from its resources dictionary. The trouble is that
> you don't know if you'll just be loading it again for the next page.
> It'd be fairly simple to keep a LRU list of form XObjects so they get
> unloaded if they're not referenced after a few pages are processed and
> there's memory pressure. I won't be too surprised if most RIPs don't do
> this, though.

Yeah, again, assuming that the people who wrote the code designed it to
be robust and flexible is dangerous, I think.
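
For illustration, the LRU list suggested in the quoted paragraph could be
sketched roughly as follows. This is only a minimal sketch, not how any
particular RIP works: the class name FormXObjectCache is hypothetical,
forms are keyed by PDF object number, and a fixed entry cap stands in for
real memory-pressure detection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of loaded form XObjects, keyed by object number. */
final class FormXObjectCache extends LinkedHashMap<Integer, byte[]> {
    private final int maxEntries;

    FormXObjectCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
        // Assumption: a simple entry cap approximates "memory pressure";
        // the least recently used form is dropped once the cap is exceeded.
        return size() > maxEntries;
    }
}
```

A form that is drawn again on the next page is looked up with get(), which
moves it to the most-recently-used end, so only forms that have gone
unreferenced for a while get unloaded.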

> <snip>
> If you want to use PDFs as image-like resources within a page (as I do)
> then you can't just append the /Page object from the source PDF. As I
> understand it (I haven't implemented this) it's necessary to:
>
> * Extract the /Page's content stream(s) plus all resources referenced
> * Append the referenced resource(s) to the target page's resource
>   dictionary, allocating new object numbers as you copy a resource and
>   changing the target of any indirect references to match the new object
>   number
> * Insert the concatenated content streams from the source PDF into the
>   output content stream. They must be surrounded by appropriate graphics
>   state save and restore operators and any necessary scale/position
>   operations to place the content where you want it.

HA HA!! Incorrect! If you look into the nooks and crannies of the PDF
spec, you'll see that it's possible to use content stream arrays for
the /Page content stream. I'll leave exploring that to you, but
basically it makes overlaying pages much much simpler. In related
news, PDFBox does just that!! What we did (and it's a super hack, but it
worked) is if there were pages with both PDF-image content and FOP
generated content, we'd get FOP to generate the content without the
PDF-image and just overlay the pages. Best of both worlds!! (Though
the purist in me is very much aggrieved)

OK, so maybe I'll add some transparency as to how we came to some of
these decisions. The client told us that PDFs of ~16k pages with
6-8k XObjects (I *heart* grep) were disproportionately slow and that
fonts were to blame, so obviously that's where we started. I managed
to do some font de-duping of Type1 fonts (seeing as FOP doesn't subset
these). It was horrendous and the fidelity was terrible, but I was just
experimenting. This made some impact, but not enough. So after some
more experimentation proved fonts weren't to blame, we had to step
back and look at the problem again. We also found out that the RIP
times didn't scale linearly with the size of the document, i.e. if x
pages took y time, 2x pages were taking 10y time (if that makes
sense). This made us think it was a memory issue: somehow the RIP's
memory was filling up. A lot of faffing about later, we got to the
conclusions I've described.

The more you describe your problem, the more it sounds like you need
to do exactly what we did, but just to be sure, I thought I'd explain
how we got there. Assumptions are a dangerous thing and I've probably
made some about your issue too.

Hopefully we can get to some resolution about this soon,

Mehdi


Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread Craig Ringer
On 07/03/12 16:35, mehdi houshmand wrote:
>> * Insert the concatenated content streams from the source PDF into the
>>   output content stream. They must be surrounded by appropriate graphics
>>   state save and restore operators and any necessary scale/position
>>   operations to place the content where you want it.
>
> HA HA!! Incorrect! If you look into the nooks and crannies of the PDF
> spec, you'll see that it's possible to use content stream arrays for
> the /Page content stream.

Sure - that's why I said the content stream(s) had to be concatenated
before insertion, because the input might be an array of content streams.

I was thinking that to get reliable results when overlaying you'd have
to wrap the whole series of drawing operations from the input in state
saving/restoring operations, etc, thus having to concatenate the streams
before wrapping. In retrospect, that's not true; one can just as well
wrap each copied content stream in state save/restore and scale/position
operations.

It might even be possible to get away without a graphics state
save/restore, but I don't think so. IIRC multiple content streams are
treated by the reader as if they were one concatenated stream, so you
still have to save/restore gstate to ensure the inserted stream doesn't
mess up anything after it. I'll have to check this in the PDF ref, though.
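
A rough sketch of the save/restore idea, for illustration only. Because
the streams in a /Contents array are processed as if they were one
concatenated stream, the q and Q operators can live in their own small
streams placed before and after the copied ones, so nothing has to be
concatenated by hand. The class and method names (OverlayContents,
surround) are hypothetical, and the sketch assumes the copied streams are
already decoded and have balanced q/Q pairs of their own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that brackets copied content streams with q ... Q. */
final class OverlayContents {
    /**
     * Returns the streams for a /Contents array: a prefix stream that saves
     * the graphics state and applies the placement operators, the copied
     * streams untouched, and a suffix stream that restores the state.
     */
    static List<byte[]> surround(List<byte[]> sourceStreams, String placementOps) {
        List<byte[]> out = new ArrayList<>();
        out.add(("q\n" + placementOps + "\n").getBytes(StandardCharsets.US_ASCII));
        out.addAll(sourceStreams);
        // Leading newline keeps Q a separate token from whatever ends the last stream
        out.add("\nQ\n".getBytes(StandardCharsets.US_ASCII));
        return out;
    }
}
```

Bracketing the whole series rather than each stream separately also
preserves any state that one source stream sets up for the next. Resource
names used by the copied streams would still have to be mapped into the
target page's resource dictionary; that part is not shown.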

> I'll leave exploring that to you, but
> basically it makes overlaying pages much much simpler. In related
> news, PDFBox does just that!! What we did (and it's a super hack, but it
> worked) is if there were pages with both PDF-image content and FOP
> generated content, we'd get FOP to generate the content without the
> PDF-image and just overlay the pages. Best of both worlds!! (Though
> the purist in me is very much aggrieved)

Urk, that's horrible! Effective, though, I expect. Presumably you still
have to translate, scale and rotate, then clip the content stream you're
overlaying, though.
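
As a sketch of what that placement might involve (under stated
assumptions, not anyone's actual code): the helper below fits a source
page box into a target rectangle, preserving aspect ratio and centring
it, and returns a clip to the target rectangle followed by a cm matrix.
The names Placement, num and fit are hypothetical, and rotation is left
out to keep the example short.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper that emits clip and transform operators for an overlay. */
final class Placement {
    /** Formats a number as a plain decimal, since PDF does not allow exponents. */
    static String num(double v) {
        BigDecimal d = BigDecimal.valueOf(v).setScale(4, RoundingMode.HALF_UP).stripTrailingZeros();
        return d.signum() == 0 ? "0" : d.toPlainString();
    }

    /** Fits the source box (x, y, width, height) inside the target rectangle. */
    static String fit(double srcX, double srcY, double srcW, double srcH,
                      double dstX, double dstY, double dstW, double dstH) {
        // Uniform scale so the whole source box fits in the target
        double s = Math.min(dstW / srcW, dstH / srcH);
        // Translation that centres the scaled box and cancels the source origin
        double tx = dstX + (dstW - srcW * s) / 2 - srcX * s;
        double ty = dstY + (dstH - srcH * s) / 2 - srcY * s;
        // Clip in target coordinates first, then change the coordinate system
        String clip = num(dstX) + " " + num(dstY) + " " + num(dstW) + " " + num(dstH) + " re W n";
        String cm = num(s) + " 0 0 " + num(s) + " " + num(tx) + " " + num(ty) + " cm";
        return clip + "\n" + cm;
    }
}
```

Both operators need to sit inside a q ... Q pair so that the clip and the
matrix are undone before the rest of the page is drawn.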

> [snip]
>
> The more you describe your problem, the more it sounds like you need
> to do exactly what we did, but just to be sure, I thought I'd explain
> how we got there. Assumptions are a dangerous thing and I've probably
> made some about your issue too.

Given what you've described I'm inclined to agree that the cause of the
issues is the same. I suspect we're facing the same problem or very
similar problems, in which case my RIP crash issues may not be font
related after all.

I still want to fix the font issues because, RIP-crash-causing or not,
the font subset duplication produces massively bloated PDFs that are
totally unsuitable for online distribution. It's kind of disheartening
to learn that the RIP crash issues are probably something else entirely,
since I thought I at least had to solve only one problem.

As for doing exactly what you did: I'd certainly be very interested in
seeing your PDFBox code for loading the fop-generated PDF, finding the
placeholders, and overlaying the PDF graphics over them. In particular
I'd like to see how you handled scaling/translation/rotation/clipping
when drawing the copied streams, and how you handled state saving and
restoration.

I can see overlaying over placeholders in post-processing as a really
useful interim solution, though eventually I'd like to enhance
fop-pdf-image to do that overlaying directly.

The really frustrating thing is that sometimes using an XObject will be
exactly the right thing to do, because the PDF being embedded actually
appears multiple times in the document. The solution to this links
neatly into the font de-duplication issue: fop image plugins need a way
to store per-render-run information, in this case so they can determine
how often an image occurs in a document during the preload run and make
an appropriate decision about how to embed it. I'm not sure it's even
necessary to have an image plugin API change for this; plugins should be
able to store enough information in a WeakHashMap<FOUserAgent,...> to
figure it out, so I should be able to make fop-pdf-image use form
XObjects only for PDF images referenced multiple times.
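
A minimal sketch of that per-render-run bookkeeping, assuming the plugin
sees every image once during the preload run. ImageUsageTracker and its
methods are hypothetical names, and a plain Object stands in for the
per-run key (the FOUserAgent in FOP) so the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical per-run usage counter for embedded images. */
final class ImageUsageTracker {
    // Weak keys: a run's counts become collectable once its key object is gone
    private static final Map<Object, Map<String, Integer>> RUNS = new WeakHashMap<>();

    /** Records one occurrence of the image in this run; returns the new count. */
    static synchronized int recordUse(Object runKey, String imageUri) {
        return RUNS.computeIfAbsent(runKey, k -> new HashMap<>())
                   .merge(imageUri, 1, Integer::sum);
    }

    /** True if the image was seen more than once in this run. */
    static synchronized boolean usedMoreThanOnce(Object runKey, String imageUri) {
        Map<String, Integer> counts = RUNS.get(runKey);
        return counts != null && counts.getOrDefault(imageUri, 0) > 1;
    }
}
```

With counts like these, the embedding step could choose a form XObject
when usedMoreThanOnce() is true and inline content otherwise.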

--
Craig Ringer



DO NOT REPLY [Bug 52849] [PATCH] SVG font being painted as shapes when font present in the system

2012-03-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=52849

Luis Bernardo lmpmberna...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SVG font being painted as   |[PATCH] SVG font being
                   |shapes when font present in |painted as shapes when font
                   |the system                  |present in the system

--- Comment #2 from Luis Bernardo <lmpmberna...@gmail.com> 2012-03-07 14:56:11 UTC ---
title updated.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 52849] [PATCH] SVG font being painted as shapes when font present in the system

2012-03-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=52849

--- Comment #3 from Mehdi Houshmand <med1...@gmail.com> 2012-03-07 15:07:51 UTC ---
Hi Luis,

Just thinking about this, I think maybe the FontInfo class shouldn't really be
responsible for remembering which notifications it has sent. It should just
notify the FontEventListener, then the FontEventListener should decide whether
to notify the user or not.

Just a thought, maybe I'm missing something... I see that there's a
loggedFontKeys that does something similar, maybe there's a reason for that? If
anyone else has thoughts, please do chime in.

Mehdi



DO NOT REPLY [Bug 51977] a created long pdf document is not readable

2012-03-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51977

--- Comment #3 from Luis Bernardo <lmpmberna...@gmail.com> 2012-03-07 16:07:49 UTC ---
What viewer are you using? I can open the file with Evince and see the contents
without problem (it does take a bit to load though).



Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread Vincent Hennebert
Just my 2 cents on a particular detail...

On 07/03/12 07:51, Craig Ringer wrote:
> On 06/03/12 18:49, mehdi houshmand wrote:
<snip/>
>> So with that in mind, what exactly are you trying to do? Why are you
>> using FOP to merge PDFs?
> I'm using FOP to produce documents containing a mixture of automatically
> typeset formatted text and graphics. Many of the graphics are PDF
> documents, and need to be PDF documents because they contain vector
> artwork and text that would lose quality and grow massively in size if
> embedded in rasterised form.

Is SVG an option for you? That might save you a lot of trouble. Or if
not readily available, that might still be less work.

<snip/>

Vincent


DO NOT REPLY [Bug 52849] [PATCH] SVG font being painted as shapes when font present in the system

2012-03-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=52849

--- Comment #4 from Luis Bernardo <lmpmberna...@gmail.com> 2012-03-07 23:14:22 UTC ---
I confess I did not put a lot of thought into that and instead used the same
pattern that was already present in that class in the
notifyFontReplacement() method. But the remark is a good one. In a more general
event framework the listener could even indicate, at the time it registers
interest in a particular class of events, whether to receive repeated events or
just unique events. In the current event framework, yes, I think that letting
the listener decide whether or not to notify the user makes sense. However, the
problem I see is with the logging. If there is no listener the code falls back
to log.warn() calls and I think we do not want repeated messages to clutter the
logs. Maybe that is the reason it was done like this originally?
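
To make the idea concrete, here is a minimal sketch of moving the
"notify only once" decision out of the sender and into a wrapper around
the receiver. It does not use FOP's actual FontEventListener interface:
OncePerKey is a hypothetical name, and a plain Consumer<String> stands in
for whatever receives the event, be it a listener or the log.warn()
fallback.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical wrapper that forwards each distinct key only once. */
final class OncePerKey implements Consumer<String> {
    private final Set<String> seen = new HashSet<>();
    private final Consumer<String> delegate;

    OncePerKey(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void accept(String fontKey) {
        // Set.add returns false when the key was already recorded
        if (seen.add(fontKey)) {
            delegate.accept(fontKey);
        }
    }
}
```

Wrapping the logging fallback in the same way would keep repeated
warnings out of the logs while leaving the sending class free of
bookkeeping.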



DO NOT REPLY [Bug 51977] a created long pdf document is not readable

2012-03-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51977

--- Comment #4 from Luis Bernardo <lmpmberna...@gmail.com> 2012-03-07 23:24:59 UTC ---
This is odd... the PDF appears as a white sheet in Adobe Reader (Mac 10.1.2),
but the content is visible in Preview (Mac OS X viewer) and in Evince (Linux).



Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread Craig Ringer
On 08/03/12 04:12, Vincent Hennebert wrote:
> Just my 2 cents on a particular detail...
>
> On 07/03/12 07:51, Craig Ringer wrote:
>> On 06/03/12 18:49, mehdi houshmand wrote:
> <snip/>
>>> So with that in mind, what exactly are you trying to do? Why are you
>>> using FOP to merge PDFs?
>> I'm using FOP to produce documents containing a mixture of automatically
>> typeset formatted text and graphics. Many of the graphics are PDF
>> documents, and need to be PDF documents because they contain vector
>> artwork and text that would lose quality and grow massively in size if
>> embedded in rasterised form.
>
> Is SVG an option for you? That might save you a lot of trouble. Or if
> not readily available, that might still be less work.

Alas, SVG isn't an option. We have a large body of work already in PDF
(and EPS) format that we can't easily convert to SVG.

Until I checked just now I didn't know that SVG even supported embedded
fonts. Does fop actually support that and include embedded SVG fonts in
output PDF?

--
Craig Ringer



Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread mehdi houshmand
Haha, well the shortest answer I can give is kinda.

SVG uses Batik, which in turn uses the AWT font classes. Long story
short, you have to install the font on the system as well as having it
in the fop.xconf. There are plenty of discussions on this on the
mailing lists for you to peruse at your leisure.

Mehdi

