Re: Fwd: fop-pdf-image and fonts; as requested
Hi Craig,

Excellent!!! I think we're making some progress here!

<snip>

> Ugh. A well-designed RIP should be able to load XObject forms on demand and free them under memory pressure. After all, an image is also a global resource that can be referenced multiple times across different pages (an indirect object with a stream), but PDFs with large numbers of images don't typically crash RIPs. There's no excuse for lots of small indirect objects crashing a RIP, be they images or form XObjects.

The operative word there is "well-designed", but also, I think you're making a lot of assumptions about how the RIP handles these objects. I don't disagree with your assumptions; I'm just saying that you don't know how the RIP handles these objects, so you have to be careful.

<snip/>

> The same is technically true of rendering a form XObject. Once you've drawn it, you can discard its content stream from memory and discard any resources you loaded from its resources dictionary. The trouble is that you don't know if you'll just be loading it again for the next page. It'd be fairly simple to keep an LRU list of form XObjects so they get unloaded if they're not referenced after a few pages are processed and there's memory pressure. I won't be too surprised if most RIPs don't do this, though.

Yeah, again, assuming that the people who wrote the code designed it to be robust and flexible is a dangerous assumption, I think.

<snip>

> If you want to use PDFs as image-like resources within a page (as I do) then you can't just append the /Page object from the source PDF. As I understand it (I haven't implemented this) it's necessary to:
>
> * Extract the /Page's content stream(s) plus all resources referenced
> * Append the referenced resource(s) to the target page's resource dictionary, allocating new object numbers as you copy a resource and changing the target of any indirect references to match the new object number
> * Insert the concatenated content streams from the source PDF into the output content stream. They must be surrounded by appropriate graphics state save and restore operators and any necessary scale/position operations to place the content where you want it.

HA HA!! Incorrect! If you look into the nooks and crannies of the PDF spec, you'll see that it's possible to use content stream arrays for the /Page content stream. I'll leave exploring that to you, but basically it makes overlaying pages much, much simpler. In related news, PDFBox does just that!! What we did (and it's a super hack, but it worked) is this: if there were pages with both PDF-image content and FOP-generated content, we'd get FOP to generate the content without the PDF-image and just overlay the pages. Best of both worlds!! (Though the purist in me is very much aggrieved.)

OK, so maybe I'll add some transparency as to how we came to some of these decisions. The client told us that PDFs of ~16k pages with 6-8k XObjects (I *heart* grep) were disproportionately slow and that fonts were to blame, so obviously that's where we started. I managed to do some font de-duping of Type1 fonts (seeing as FOP doesn't subset these); it was horrendous and the fidelity was terrible, but I was just experimenting. This made some impact, but not enough. So after some more experimentation, proving fonts weren't to blame, we had to step back and look at the problem again.

We also found out that the RIP times didn't scale linearly with the size of the document, i.e. x pages takes y time, but 2x pages was taking 10y time (if that makes sense). This made us think it was a memory issue: somehow the RIP's memory was filling up. A lot of faffing about later, we got to the conclusions I've described.

The more you describe your problem, the more it sounds like you need to do exactly what we did, but just to be sure, I thought I'd explain how we got there. Assumptions are a dangerous thing and I've probably made some about your issue too.

Hopefully we can get to some resolution about this soon,

Mehdi
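
For what it's worth, the "LRU list of form XObjects" idea quoted above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name FormXObjectCache is made up, it evicts on a fixed entry count rather than on real memory pressure, and nobody in this thread knows whether any actual RIP works this way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache for loaded form XObjects, keyed by resource name
 * or object number. Assumption: a fixed entry limit stands in for
 * "memory pressure"; a real RIP would more likely track bytes.
 */
final class FormXObjectCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    FormXObjectCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed entry, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put(); returning true drops the entry that
        // has gone longest without being read or written.
        return size() > maxEntries;
    }
}
```

A form that keeps being referenced page after page stays near the "recent" end and survives, while one that has not been touched for a while is the first to go once the limit is reached.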
Re: Fwd: fop-pdf-image and fonts; as requested
On 07/03/12 16:35, mehdi houshmand wrote:

>> * Insert the concatenated content streams from the source PDF into the output content stream. They must be surrounded by appropriate graphics state save and restore operators and any necessary scale/position operations to place the content where you want it.
>
> HA HA!! Incorrect! If you look into the nooks and crannies of the PDF spec, you'll see that it's possible to use content stream arrays for the /Page content stream.

Sure - that's why I said the content stream(s) had to be concatenated before insertion, because the input might be an array of content streams. I was thinking that to get reliable results when overlaying you'd have to wrap the whole series of drawing operations from the input in state saving/restoring operations, etc, thus having to concatenate the streams before wrapping.

In retrospect, that's not true; one can just as well wrap each copied content stream in state save/restore and scale/position operations. It might even be possible to get away without a graphics state save/restore, but I don't think so. IIRC multiple content streams are treated by the reader as if they were one concatenated stream, so you still have to save/restore gstate to ensure the inserted stream doesn't mess up anything after it. I'll have to check this in the PDF ref, though.

> I'll leave exploring that to you, but basically it makes overlaying pages much much simpler. In related news, PDFBox does just that!! What we did (and it's super hack, but it worked) is if there were pages with both PDF-image content and FOP generated content, we'd get FOP to generate the content without the PDF-image and just overlay the pages. Best of both worlds!! (Though the purist in me is very much aggrieved)

Urk, that's horrible! Effective, though, I expect. Presumably you still have to translate, scale and rotate, then clip the content stream you're overlaying, though.
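
A rough sketch of the wrapping discussed above, using plain strings rather than any PDF library. It relies on the standard content stream operators q (save graphics state), Q (restore), cm (concatenate a matrix) and re W n (rectangular clip), and on the rule that a /Contents array behaves as one concatenated stream. Because of that rule, a separate prefix stream and suffix stream placed around the untouched source streams is enough, and it avoids unbalancing a source whose own q/Q pairs span several streams. The class and method names are hypothetical, rotation is left out, and this is not how PDFBox or fop-pdf-image actually does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: surrounds copied content streams with state save/restore. */
final class OverlayWrapper {
    /** Restores the graphics state; leading newline guards against a
     *  source stream that does not end in whitespace. */
    static final String SUFFIX = "\nQ\n";

    /**
     * Builds the prefix stream: save state, clip to a box given in target
     * page coordinates, then scale and translate the copied content.
     */
    static String prefix(double sx, double sy, double tx, double ty,
                         double clipX, double clipY, double clipW, double clipH) {
        // Locale.ROOT keeps the decimal separator a '.', as PDF requires.
        return String.format(Locale.ROOT,
                "q %.4f %.4f %.4f %.4f re W n %.4f 0 0 %.4f %.4f %.4f cm\n",
                clipX, clipY, clipW, clipH, sx, sy, tx, ty);
    }

    /** Returns the stream list for a /Contents array: prefix, sources, suffix. */
    static List<String> wrap(List<String> sourceStreams, String prefix) {
        List<String> out = new ArrayList<>();
        out.add(prefix);
        out.addAll(sourceStreams); // copied verbatim, never edited
        out.add(SUFFIX);
        return out;
    }
}
```

The copied resources still have to be merged into the target page's resource dictionary with renumbered references; the sketch covers only the drawing-operator side.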
[snip]

> The more you describe your problem, the more it sounds like you need to do exactly what we did, but just to be sure, I thought I'd explain how we got there. Assumptions are a dangerous thing and I've probably made some about your issue too.

Given what you've described I'm inclined to agree that the cause of the issues is the same. I suspect we're facing the same problem or very similar problems, in which case my RIP crash issues may not be font related after all.

I still want to fix the font issues because, RIP-crash-causing or not, the font subset duplication produces massively bloated PDFs that are totally unsuitable for online distribution. It's kind of disheartening to learn that the RIP crash issues are probably something else entirely, since I thought I at least had only one problem to solve.

As for doing exactly what you did: I'd certainly be very interested in seeing your PDFBox code for loading the FOP-generated PDF, finding the placeholders, and overlaying the PDF graphics over them. In particular I'd like to see how you handled scaling/translation/rotation/clipping when drawing the copied streams, and how you handled state saving and restoration.

I can see overlaying over placeholders in post-processing as a really useful interim solution, though eventually I'd like to enhance fop-pdf-image to do that overlaying directly.

The really frustrating thing is that sometimes using an XObject will be exactly the right thing to do, because the PDF being embedded actually appears multiple times in the document. The solution to this links neatly into the font de-duplication issue: FOP image plugins need a way to store per-render-run information, in this case so they can determine how often an image occurs in a document during the preload run and make an appropriate decision about how to embed it. I'm not sure it's even necessary to have an image plugin API change for this; plugins should be able to store enough information in a WeakHashMap<FOUserAgent, ...> to figure it out, so I should be able to make fop-image-plugin use form XObjects only for PDF images referenced multiple times.

--
Craig Ringer
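
A minimal sketch of that per-render-run bookkeeping, assuming the run can be identified by some key object (FOUserAgent in the message above; a generic type parameter here so the example stands alone). The class and method names are invented for illustration, and whether the preload run really sees every occurrence of an image before rendering starts is an open assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical tracker: counts how often each image URI is seen during one
 * render run. R is whatever object identifies the run.
 */
final class ImageUsageTracker<R> {
    // Weak keys: once the run's key object is garbage collected, its
    // counts can be collected too, so finished runs do not pile up.
    private final Map<R, Map<String, Integer>> countsByRun = new WeakHashMap<>();

    /** Records one occurrence and returns the running total for this run. */
    synchronized int recordUse(R run, String imageUri) {
        return countsByRun
                .computeIfAbsent(run, r -> new HashMap<>())
                .merge(imageUri, 1, Integer::sum);
    }

    /** True only if the image was seen more than once in this run. */
    synchronized boolean shouldUseFormXObject(R run, String imageUri) {
        Map<String, Integer> counts = countsByRun.get(run);
        return counts != null && counts.getOrDefault(imageUri, 0) > 1;
    }
}
```

With something like this, an image seen once would be drawn inline and one seen repeatedly would become a shared form XObject, without touching the plugin API.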
DO NOT REPLY [Bug 52849] [PATCH] SVG font being painted as shapes when font present in the system
https://issues.apache.org/bugzilla/show_bug.cgi?id=52849

Luis Bernardo lmpmberna...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SVG font being painted as   |[PATCH] SVG font being
                   |shapes when font present in |painted as shapes when font
                   |the system                  |present in the system

--- Comment #2 from Luis Bernardo lmpmberna...@gmail.com 2012-03-07 14:56:11 UTC ---
title updated.

--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
DO NOT REPLY [Bug 52849] [PATCH] SVG font being painted as shapes when font present in the system
https://issues.apache.org/bugzilla/show_bug.cgi?id=52849

--- Comment #3 from Mehdi Houshmand med1...@gmail.com 2012-03-07 15:07:51 UTC ---
Hi Luis,

Just thinking about this, I think maybe the FontInfo class shouldn't really be responsible for remembering which notifications it has sent. It should just notify the FontEventListener, and then the FontEventListener should decide whether to notify the user or not. Just a thought, maybe I'm missing something... I see that there's a loggedFontKeys that does something similar, maybe there's a reason for that?

If anyone else has thoughts, please do chime in.

Mehdi
DO NOT REPLY [Bug 51977] a created long pdf document is not readable
https://issues.apache.org/bugzilla/show_bug.cgi?id=51977

--- Comment #3 from Luis Bernardo lmpmberna...@gmail.com 2012-03-07 16:07:49 UTC ---
What viewer are you using? I can open the file with Evince and see the contents without problem (it does take a bit to load, though).
Re: Fwd: fop-pdf-image and fonts; as requested
Just my 2 cents on a particular detail...

On 07/03/12 07:51, Craig Ringer wrote:
> On 06/03/12 18:49, mehdi houshmand wrote:
> <snip/>
>> So with that in mind, what exactly are you trying to do? Why are you using FOP to merge PDFs?
>
> I'm using FOP to produce documents containing a mixture of automatically typeset formatted text and graphics. Many of the graphics are PDF documents, and need to be PDF documents because they contain vector artwork and text that would lose quality and grow massively in size if embedded in rasterised form.

Is SVG an option for you? That might save you a lot of trouble. Or if not readily available, that might still be less work.

<snip/>

Vincent
DO NOT REPLY [Bug 52849] [PATCH] SVG font being painted as shapes when font present in the system
https://issues.apache.org/bugzilla/show_bug.cgi?id=52849

--- Comment #4 from Luis Bernardo lmpmberna...@gmail.com 2012-03-07 23:14:22 UTC ---
I confess I did not put a lot of thought into that and instead used the same pattern that was already present in that class, in the notifyFontReplacement() method. But the remark is a good one. In a more general event framework the listener could even indicate, when registering interest in a particular class of events, whether it wants to receive repeated events or just unique ones. In the current event framework, yes, I think that letting the listener decide whether or not to notify the user makes sense.

However, the problem I see is with the logging. If there is no listener, the code falls back to log.warn() calls, and I think we do not want repeated messages to clutter the logs. Maybe that is the reason it was done like this originally?
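
One way to reconcile the two concerns in this bug, sketched under loud assumptions: the de-duplication lives in a small wrapper instead of in FontInfo, and the same wrapper can sit in front of either a listener or the logging fallback. OncePerKeyNotifier and its Consumer-based callback are invented for this example and are not part of FOP's event API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper that forwards each distinct key to a delegate only
 * once. The delegate could be a user-facing listener or a log.warn() call.
 */
final class OncePerKeyNotifier {
    private final Set<String> seen = new HashSet<>();
    private final Consumer<String> delegate;

    OncePerKeyNotifier(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    /** Forwards the key if it is new; returns whether it was forwarded. */
    boolean notifyOnce(String key) {
        // Set.add() returns false when the key was already present, so
        // repeats are dropped without a separate contains() check.
        if (seen.add(key)) {
            delegate.accept(key);
            return true;
        }
        return false;
    }
}
```

The event source would then fire every time, and suppression of repeats becomes a property of whoever is receiving the events, including the logger.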
DO NOT REPLY [Bug 51977] a created long pdf document is not readable
https://issues.apache.org/bugzilla/show_bug.cgi?id=51977

--- Comment #4 from Luis Bernardo lmpmberna...@gmail.com 2012-03-07 23:24:59 UTC ---
This is odd... the PDF appears as a white sheet in Adobe Reader (Mac 10.1.2), but the content is visible in Preview (Mac OS X viewer) and in Evince (Linux).
Re: Fwd: fop-pdf-image and fonts; as requested
On 08/03/12 04:12, Vincent Hennebert wrote:
> Just my 2 cents on a particular detail...
>
> On 07/03/12 07:51, Craig Ringer wrote:
>> On 06/03/12 18:49, mehdi houshmand wrote:
>> <snip/>
>>> So with that in mind, what exactly are you trying to do? Why are you using FOP to merge PDFs?
>>
>> I'm using FOP to produce documents containing a mixture of automatically typeset formatted text and graphics. Many of the graphics are PDF documents, and need to be PDF documents because they contain vector artwork and text that would lose quality and grow massively in size if embedded in rasterised form.
>
> Is SVG an option for you? That might save you a lot of trouble. Or if not readily available, that might still be less work.

Alas, SVG isn't an option. We have a large body of work already in PDF (and EPS) format that we can't easily convert to SVG.

Until I checked just now I didn't know that SVG even supported embedded fonts. Does FOP actually support that and include embedded SVG fonts in output PDF?

--
Craig Ringer
Re: Fwd: fop-pdf-image and fonts; as requested
Haha, well the shortest answer I can give is "kinda". FOP's SVG support uses Batik, which in turn uses the AWT font classes. Long story short, you have to install the font on the system as well as having it in the fop.xconf. There are plenty of discussions on this on the mailing lists for you to peruse at your leisure.

Mehdi

On 8 March 2012 02:17, Craig Ringer ring...@ringerc.id.au wrote:
> On 08/03/12 04:12, Vincent Hennebert wrote:
>> Just my 2 cents on a particular detail...
>>
>> On 07/03/12 07:51, Craig Ringer wrote:
>>> On 06/03/12 18:49, mehdi houshmand wrote:
>>> <snip/>
>>>> So with that in mind, what exactly are you trying to do? Why are you using FOP to merge PDFs?
>>>
>>> I'm using FOP to produce documents containing a mixture of automatically typeset formatted text and graphics. Many of the graphics are PDF documents, and need to be PDF documents because they contain vector artwork and text that would lose quality and grow massively in size if embedded in rasterised form.
>>
>> Is SVG an option for you? That might save you a lot of trouble. Or if not readily available, that might still be less work.
>
> Alas, SVG isn't an option. We have a large body of work already in PDF (and EPS) format that we can't easily convert to SVG.
>
> Until I checked just now I didn't know that SVG even supported embedded fonts. Does fop actually support that and include embedded SVG fonts in output PDF?
>
> --
> Craig Ringer