Re: PDFName.getName() returns escaped name?!
On 04/04/12 18:02, Vincent Hennebert wrote: Hi Craig, Thanks for your extensive study! On 03/04/12 10:31, Craig Ringer wrote: On 03/04/12 01:16, Vincent Hennebert wrote: From a quick look that sounds about right. Are you developing with a 1.7 JDK? You would have to make your code 1.5-compatible. Also, it would be good if you could back your optimizations with profiling data. If code safety/readability have to be compromised there has to be a good reason for it. Yep, I'm on 1.7, but was building for a 1.5 target and source level. Careful, because that doesn’t ensure you that your code will run on a 1.5 JVM. You may still be using elements of the standard library that appeared only in 1.6+. Thanks - I forgot about that. Sigh. The glacial pace at which people update JREs and JDKs drives me quite nuts sometimes - but at least fop has moved to 1.5! Time to do some JDK archaeology... -- Craig Ringer
Re: PDFName.getName() returns escaped name?!
On 03/04/12 01:16, Vincent Hennebert wrote: From a quick look that sounds about right. Are you developing with a 1.7 JDK? You would have to make your code 1.5-compatible. Also, it would be good if you could back your optimizations with profiling data. If code safety/readability have to be compromised there has to be a good reason for it. Yep, I'm on 1.7, but was building for a 1.5 target and source level. The posted PDFName is really an example of what I'm thinking of; it's not something that can be seriously merged without a *lot* of changes across the rest of the tree to convert String name handling to PDFName, to check for leading /s being added, to check for use of toString() in name output to PDF data, etc. It's more a starting point for discussion and to see what kind of interest there is in working on the pdf lib. It looks like folks here are pretty receptive to significantly reworking the PDF lib if there's someone willing to stump up the time. That might well be me if it turns out porting the whole renderer to PDFBox instead turns out to be lots harder than reworking the fop pdf lib. I suspect not. That said, please do feel free to give it a try if that route appeals to you. We would certainly consider a switch if it looks more promising in the long term. I'm wondering how practical it'd be to progressively adopt PDFBox, actually, rather than doing an abrupt and total switch. The PDF primitives in PDFBox (COSName, COSNumber, COSDictionary, etc) are modeled along extremely similar lines to those in FOP's PDF library, and while they won't be drop-in replacements they behave very similarly, just with different method names and in some cases different datatype assumptions (PDFName instead of String dictionary keys for example). The only truly big difference looks to be in the handling of indirect objects, where FOP uses one class that may be direct or indirect, and PDFBox uses a dedicated `COSObject' class that wraps other objects for indirect objects. 
If it proves possible to do a largely 1:1 substitution of PDFBox primitive PDF classes for FOP ones that'd *greatly* simplify the job of moving to using the PDFBox `PD' model for document output, and would let the job be done in a couple of distinct and separate chunks. PDFFont, PDFXObject, etc would build on top of the COS types like COSDictionary, COSBase and COSObject, rather than on top of PDFObject, PDFDictionary, etc.

It's not a perfect 1:1 mapping; notable imperfect matches where one or more classes map to one or more different classes include:

FOP                            PDFBOX
---
java.lang.String,              COSName (for PDF names)
FOP PDFText                    java.lang.String (for Unicode text)
                               byte[] or a binbuf/binstr class (for binary data)
---
PDFNumber                      COSNumber
                               COSInteger
                               COSFloat
---
??? (j.l.Boolean?)             COSBoolean
---
PDFObject                      COSBase (direct)
                               COSObject (wraps a COSBase, indirect)
PDFDocument                    COSDocument
[+a new class with fop-only    PDDocument
 functionality]

Of those, the first already needs a cleanup as noted previously; fop uses String for at least two distinct and incompatible things (Unicode text and raw binary data) that must be separated anyway. The second doesn't look to be a big deal as COSNumber has factory methods to take care of it, it's pretty transparent. I'm not too worried about boolean handling either. The bigger ones are the different model for indirect objects, and the non-trivial work to move PDFDocument to extend/wrap COSDocument. I haven't looked into this in enough depth to have a meaningful assessment of how hard either of those would be yet.

OTOH, some of the important ones map pretty directly, allowing for the differences in handling of indirect and direct objects higher up the inheritance tree:

FOP                            PDFBOX
--
PDFDictionary                  COSDictionary
PDFArray                       COSArray
PDFStream                      COSStream
PDFNull                        COSNull

If I get time (big if, at the moment) I'll see if I can have a play and determine what kind of work is involved in doing this.

-- Craig Ringer
Re: Assigning unique resource names
It's all good. I haven't finished the important bit, integrating it into fop PDF images, yet. On Apr 3, 2012 2:25 AM, mehdi houshmand med1...@gmail.com wrote: Hi Craig, My sincerest apologies for not getting round to looking at what you've done here. I'll try and take a look in the next few days and give it a think, see if there's anything we can do to help. Apologies once again, Mehdi On 28 March 2012 07:39, Craig Ringer ring...@ringerc.id.au wrote: Hi I've nearly finished work on getting fop-pdf-image to overlay PDFs by appending their content streams and merging their resource dictionaries, rather than by creating XObject Forms. The problem I have left will be more intrusive into the fop codebase than what I've had to do so far, so I thought I'd check in before I start working on it. The reason I'm adapting fop-pdf-images to support merging PDF images into the main PDF content instead of using XObject Forms is that the use of lots of PDF XObject Forms seems to cause RIPs and clients to perform poorly or run out of memory. The way I propose to do it, fop-pdf-images will use an XObject form if the preloader sees a pdf image re-used more than a configurable number of times (one by default), and otherwise merge it into the main pdf. Most of that is done, but there's a problem with ensuring unique resource names. XObject Form resource dictionaries are their own namespace, so no resource name (font, ExtGState, etc) in an XObject Form may conflict with a name in the parent page's resource dictionary. If XObject Forms are no longer used by fop-pdf-image, that namespace separation goes away. I have to merge the image page(s)'s resource dictionaries into the resource dictionary of the page they're being overlaid over. In the case of fop, that's the global resource dictionary because fop doesn't currently write per-page resource dictionaries. 
There's nothing wrong with this beyond potentially making the resource dictionary a bit fat, but it means I need a way to guarantee that a name will not conflict with any other name assigned by fop. For GState dictionary objects that's easy; fop just uses GS+object number as the name, so if I follow the same scheme when copying resources I'm guaranteed to get a unique name since object numbers are unique. Unfortunately, fop doesn't do anything so consistent for fonts or most other resources, and that's made it nearly impossible for me to guarantee that I can use a name without a later part of the XSL-FO causing fop to create an object that tries to use the same name. Solving this will require some changes to the way fop writes the PDF resources dictionary. I propose that the PDFResources class should take responsibility for allocating resource names and ensuring they're consistent. Instead of asking each resource what its name is, the PDFResources class should *assign* it a name. Those names can be minimal and compact - eg Fnn for fonts, GSnn for graphics states, etc. nn would be a counter maintained by PDFResources. That's the convention followed by most other PDF producing software and would make it simple and reliable to inject objects not created by fop into the resources dictionary without risk of conflicts. That'll be important if people want to be able to write extensions that add new, custom PDF content; it's not just useful for fop-pdf-images. This API change would only affect extensions, services and clients that work directly with org.apache.fop.pdf.* and org.apache.fop.render.pdf.* classes, and only some of those. Clients that use the main fop APIs would be completely unaffected, as would clients that use the area tree / IR code, image loader code, or pretty much anything except the guts of pdf handling. I'll post a proposed patch soon, along with patches for some other changes that enable what I'm doing but may be useful for others. 
A patch with the fop-pdf-images merge feature support will follow once I've finished it enough that I can do test-runs. -- Craig Ringer
Re: PDFName.getName() returns escaped name?!
On 03/30/2012 05:09 AM, J.Pietschmann wrote: Am 29.03.2012 01:24, schrieb Craig Ringer: I'd also like to have getEncodedName() return a byte[] not a String, since an encoded PDF name isn't actually text data. Sounds like a reasonable idea. BTW, is there any reason Fop's PDF library uses java.lang.String when working with sequences of PDF data bytes? I'd chalk this up to historical reasons, as usual. Feel free to provide a patch which cleans this up. J.Pietschmann Here's how I'd like to rewrite PDFName; untested code as an example of what I'm getting at. This is just a standalone file; a patch that incorporates it into the main sources will be a lot more work that I'm holding off on until I know folks here agree with the approach. In any case, after reading more of the PDF library I'm rethinking the wisdom of trying to make this change. The change itself is correct, but it'll be really hard to safely integrate into the rest of the PDF library because of the difficulty of auditing every site to ensure nothing breaks. Java likes to call `toString' automatically in places, meaning that anywhere that doesn't use the proper PDFWritable output methods PDFName inherits will break by producing bad PDF data that might be quite hard to spot. I'd start by making PDFName.toString() throw (for testing), but that'd only catch issues in code that test paths actually hit. Given the number of these kinds of issues in fop's pdf library I'm more and more inclined to wonder if it should just be replaced with PDFBox. It's *full* of text encoding issues, it crams 8-bit binary data into the lower 8 bits of Unicode strings, etc. Most of the classes that extend basics like PDFDictionary act like the base class isn't public API and break if anyone else changes the dictionary in ways they don't expect, too; they should have-a PDFDictionary not be-a PDFDictionary really. 
PDFBox is far from perfect, but it has a clean separation between the model classes (PD) and the basic PDF data types (COSxxx); it has a clean PDFName, PDFString, etc; it has a good PDF parser already, etc. Maybe it'd be easier for me to whip up a port of FOP's PDF output code to PDFBox? I suspect I'm insane to mention the possibility of doing that without evaluating the amount of work involved first, so I'm not promising anything, but by the looks it might be easier than doing the cleanups I'd like to do in fop. Thoughts?

-- Craig Ringer

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* $Id$ */

package org.apache.fop.pdf;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.*;

import org.apache.commons.io.output.CountingOutputStream;

/**
 * Class representing a PDF name object.
 *
 */
public class PDFName extends PDFObject {

    private static final Map<ByteString, PDFName> commonNames;

    private final ByteString unescapedName;
    private ByteString escapedName;

    /**
     * Creates a new PDF name object from a Unicode java string,
     * encoding the name as UTF-8.
     *
     * @param name the name value
     */
    public PDFName(String name) {
        super();
        this.unescapedName = new ByteString(name.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    /**
     * Creates a new PDF name object from a sequence of bytes
     * in no particular encoding.
     *
     * By PDF convention you should use utf-8 when encoding names
     * (as is done by the String-based PDFName constructor), but this
     * is NOT required by the spec.
     */
    private PDFName(ByteString name) {
        super();
        this.unescapedName = name;
    }

    /**
     * Create a PDFName with a pre-escaped name supplied. This is mostly useful
     * when defining names from data parsed from PDF data, or when allocating
     * pre-cached names.
     *
     * @param unescapedName Name with PDF name escapes decoded
     * @param escapedName Name encoded with PDF escapes
     */
    private PDFName(ByteString unescapedName, ByteString escapedName) {
        this.unescapedName
Assigning unique resource names
Hi I've nearly finished work on getting fop-pdf-image to overlay PDFs by appending their content streams and merging their resource dictionaries, rather than by creating XObject Forms. The problem I have left will be more intrusive into the fop codebase than what I've had to do so far, so I thought I'd check in before I start working on it. The reason I'm adapting fop-pdf-images to support merging PDF images into the main PDF content instead of using XObject Forms is that the use of lots of PDF XObject Forms seems to cause RIPs and clients to perform poorly or run out of memory. The way I propose to do it, fop-pdf-images will use an XObject form if the preloader sees a pdf image re-used more than a configurable number of times (one by default), and otherwise merge it into the main pdf. Most of that is done, but there's a problem with ensuring unique resource names. XObject Form resource dictionaries are their own namespace, so no resource name (font, ExtGState, etc) in an XObject Form may conflict with a name in the parent page's resource dictionary. If XObject Forms are no longer used by fop-pdf-image, that namespace separation goes away. I have to merge the image page(s)'s resource dictionaries into the resource dictionary of the page they're being overlaid over. In the case of fop, that's the global resource dictionary because fop doesn't currently write per-page resource dictionaries. There's nothing wrong with this beyond potentially making the resource dictionary a bit fat, but it means I need a way to guarantee that a name will not conflict with any other name assigned by fop. For GState dictionary objects that's easy; fop just uses GS+object number as the name, so if I follow the same scheme when copying resources I'm guaranteed to get a unique name since object numbers are unique. 
Unfortunately, fop doesn't do anything so consistent for fonts or most other resources, and that's made it nearly impossible for me to guarantee that I can use a name without a later part of the XSL-FO causing fop to create an object that tries to use the same name. Solving this will require some changes to the way fop writes the PDF resources dictionary. I propose that the PDFResources class should take responsibility for allocating resource names and ensuring they're consistent. Instead of asking each resource what its name is, the PDFResources class should *assign* it a name. Those names can be minimal and compact - eg Fnn for fonts, GSnn for graphics states, etc. nn would be a counter maintained by PDFResources. That's the convention followed by most other PDF producing software and would make it simple and reliable to inject objects not created by fop into the resources dictionary without risk of conflicts. That'll be important if people want to be able to write extensions that add new, custom PDF content; it's not just useful for fop-pdf-images. This API change would only affect extensions, services and clients that work directly with org.apache.fop.pdf.* and org.apache.fop.render.pdf.* classes, and only some of those. Clients that use the main fop APIs would be completely unaffected, as would clients that use the area tree / IR code, image loader code, or pretty much anything except the guts of pdf handling. I'll post a proposed patch soon, along with patches for some other changes that enable what I'm doing but may be useful for others. A patch with the fop-pdf-images merge feature support will follow once I've finished it enough that I can do test-runs. -- Craig Ringer
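
A rough sketch of the name-assignment idea proposed above, for discussion only. ResourceNameAllocator is a hypothetical standalone class, not an existing fop class and not the patch mentioned in the message; it only shows a per-prefix counter (F, GS, ...) combined with a set of names already in use, so that names copied in from an imported PDF can be reserved and never handed out again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of org.apache.fop.pdf. */
class ResourceNameAllocator {
    // One counter per prefix, e.g. "F" for fonts, "GS" for graphics states.
    private final Map<String, Integer> counters = new HashMap<String, Integer>();
    // Every name handed out or reserved so far.
    private final Set<String> used = new HashSet<String>();

    /** Reserve a name that already exists, e.g. one copied verbatim from an imported page. */
    boolean reserve(String name) {
        return used.add(name);
    }

    /** Allocate the next unused name for the given prefix, skipping reserved names. */
    String allocate(String prefix) {
        Integer last = counters.get(prefix);
        int next = (last == null) ? 1 : last + 1;
        String name = prefix + next;
        while (!used.add(name)) {
            next++;
            name = prefix + next;
        }
        counters.put(prefix, next);
        return name;
    }

    public static void main(String[] args) {
        ResourceNameAllocator names = new ResourceNameAllocator();
        names.reserve("F2");                        // name already taken by an imported resource
        System.out.println(names.allocate("F"));    // F1
        System.out.println(names.allocate("F"));    // F3, because F2 is reserved
        System.out.println(names.allocate("GS"));   // GS1
    }
}
```

The set of used names is checked on every allocation, so the scheme stays safe even if two prefixes could produce the same string (for example prefix F with counter 11 and a prefix F1 with counter 1).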
Re: Merging Temp_PDF_ObjectStreams branch to trunk
On 03/22/2012 03:54 PM, Pascal Sancho wrote: Hi, +0 I see no objection if PDF-image plugin is updated in a short delay (I mean: before next release). This might be a good opportunity to merge fop-pdf-image . -- Craig Ringer
Re: Fwd: fop-pdf-image and fonts; as requested
On 03/08/2012 01:25 PM, mehdi houshmand wrote: Haha, well the shortest answer I can give is kinda. SVG uses Batik, which in turn uses the AWT font classes. Long story short, you have to install the font on the system as well as having it in the fop.xconf. There are plenty of discussions on this on the mailing lists for you to peruse at your leisure. Thanks. For my purposes, that's a no, since I need to support any embedded font whether or not I have access to a complete copy. BTW, I've modified PDFBox's Overlay.java application to support translation, scaling and rotation of the overlay. Once I've cleaned up the resource renaming code I'll be able to plug the approach into fop-pdf-image - hopefully with few hassles. fop-pdf-image already contains most of the pdfbox-to-fop adapter code required. This won't help with the duplicate fonts (and thus won't help with file size) but it might help with the RIP crashes. Here's hoping. See the PDFBox-dev mailing list for the Overlay.java patch. -- Craig Ringer
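
For readers following the overlay discussion, here is a minimal sketch of what placing a copied content stream with translation, scaling and rotation amounts to at the operator level. It is not the Overlay.java patch and does not use PDFBox; OverlayWrapper and its method names are made up for illustration. It wraps the copied bytes in q ... Q (graphics state save/restore) and emits a single cm operator whose matrix applies scale, then rotation, then translation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Locale;

/** Hypothetical helper for illustration; not PDFBox or fop API. */
class OverlayWrapper {
    private static final Charset ASCII = Charset.forName("US-ASCII");

    /** Format a number with four decimals, locale-independent, avoiding "-0.0000". */
    static String num(double v) {
        double rounded = Math.round(v * 10000.0) / 10000.0;
        return String.format(Locale.ROOT, "%.4f", rounded);
    }

    /**
     * Wrap a content stream in q / cm / Q so it is scaled, rotated
     * (counter-clockwise, in degrees) and then translated, without
     * leaking graphics state changes into whatever follows it.
     */
    static byte[] wrap(byte[] content, double tx, double ty,
                       double sx, double sy, double degrees) {
        double rad = Math.toRadians(degrees);
        double cos = Math.cos(rad);
        double sin = Math.sin(rad);
        // PDF matrix operands are a b c d e f; this is scale * rotate * translate.
        String prefix = "q\n"
                + num(sx * cos) + " " + num(sx * sin) + " "
                + num(-sy * sin) + " " + num(sy * cos) + " "
                + num(tx) + " " + num(ty) + " cm\n";
        byte[] head = prefix.getBytes(ASCII);
        byte[] tail = "\nQ\n".getBytes(ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(content, 0, content.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wrapped = wrap("BT ET".getBytes(ASCII), 10, 20, 2, 2, 0);
        System.out.print(new String(wrapped, ASCII));
    }
}
```

Clipping to the target rectangle and renaming the resources the stream refers to are left out of this sketch; both would still be needed for a real overlay.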
Re: Fwd: fop-pdf-image and fonts; as requested
On 07/03/12 16:35, mehdi houshmand wrote: * Insert the concatenated content streams from the source PDF into the output content stream. They must be surrounded by appropriate graphics state save and restore operators and any necessary scale/position operations to place the content where you want it. HA HA!! Incorrect! If you look into the nooks and crannies of the PDF spec, you'll see that it's possible to use content stream arrays for the /Page content stream. Sure - that's why I said the content stream(s) had to be concatenated before insertion, because the input might be an array of content streams. I was thinking that to get reliable results when overlaying you'd have to wrap the whole series of drawing operations from the input in state saving/restoring operations, etc, thus having to concatenate the streams before wrapping. In retrospect, that's not true; one can just as well wrap each copied content stream in state save/restore and scale/position operations. It might even be possible to get away without a graphics state save/restore, but I don't think so. IIRC multiple content streams are treated by the reader as if they were one concatenated stream, so you still have to save/restore gstate to ensure the inserted stream doesn't mess up anything after it. I'll have to check this in the PDF ref, though. I'll leave exploring that to you, but basically it makes overlaying pages much much simpler. In related news, PDFBox does just that!! What we did (and it's super hack, but it worked) is if there were pages with both PDF-image content and FOP generated content, we'd get FOP to generate the content without the PDF-image and just overlay the pages. Best of both worlds!! (Though the purist in me is very much aggrieved) Urk, that's horrible! Effective, though, I expect. Presumably you still have to translate, scale and rotate, then clip the content stream you're overlaying, though. 
[snip] The more you describe your problem, the more it sounds like you need to do exactly what we did, but just to be sure, I thought I'd explain how we got there. Assumptions are a dangerous thing and I've probably made some about your issue too. Given what you've described I'm inclined to agree that the cause of the issues is the same. I suspect we're facing the same problem or very similar problems, in which case my RIP crash issues may not be font related after all. I still want to fix the font issues because, rip crash causing or not, the font subset duplication produces massively bloated PDFs that are totally unsuitable for online distribution. It's kind of disheartening to learn that the RIP crash issues are probably something else entirely, since I thought I at least had to solve only one problem. As for doing exactly what you did: I'd certainly be very interested in seeing your PDFBox code for loading the fop-generated PDF, finding the placeholders, and overlaying the PDF graphics over them. In particular I'd like to see how you handled scaling/translation/rotation/clipping when drawing the copied streams, and how you handled state saving and restoration. I can see overlaying over placeholders in post-processing as a really useful interim solution, though eventually I'd like to enhance fop-pdf-image to do that overlaying directly. The really frustrating thing is that sometimes using an XObject will be exactly the right thing to do, because the PDF being embedded actually appears multiple times in the document. The solution to this links neatly into the font de-duplication issue: fop image plugins need a way to store per-render-run information, in this case so they can determine how often an image occurs in a document during the preload run and make an appropriate decision about how to embed it. I'm not sure it's even necessary to have an image plugin api change for this; plugins should be able to store enough information in a WeakHashMap<FOUserAgent,...> to figure it out, so I should be able to make fop-image-plugin use form XObjects only for pdf images referenced multiple times. -- Craig Ringer
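
To make the WeakHashMap idea concrete, a small sketch under stated assumptions: ImageUseTracker is a hypothetical class, the run key is typed as Object as a stand-in for FOUserAgent so the example compiles without fop on the classpath, and the threshold parameter mirrors the "re-used more than a configurable number of times" rule described elsewhere in this thread. The per-run counts disappear once the key object for that rendering run is garbage collected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical per-render-run usage counter; the key stands in for FOUserAgent. */
class ImageUseTracker {
    // Weak keys: an entry goes away once the run's key object is unreachable.
    // The values must not hold a strong reference back to the key.
    private final Map<Object, Map<String, Integer>> perRun =
            new WeakHashMap<Object, Map<String, Integer>>();

    /** Record one occurrence of an image in a run and return the new count. */
    synchronized int recordUse(Object runKey, String imageUri) {
        Map<String, Integer> counts = perRun.get(runKey);
        if (counts == null) {
            counts = new HashMap<String, Integer>();
            perRun.put(runKey, counts);
        }
        Integer old = counts.get(imageUri);
        int now = (old == null) ? 1 : old + 1;
        counts.put(imageUri, now);
        return now;
    }

    /** True if the image was seen more than threshold times in this run. */
    synchronized boolean shouldUseFormXObject(Object runKey, String imageUri, int threshold) {
        Map<String, Integer> counts = perRun.get(runKey);
        Integer n = (counts == null) ? null : counts.get(imageUri);
        return n != null && n > threshold;
    }

    public static void main(String[] args) {
        ImageUseTracker tracker = new ImageUseTracker();
        Object run = new Object();                 // would be the FOUserAgent
        tracker.recordUse(run, "logo.pdf");
        tracker.recordUse(run, "logo.pdf");
        tracker.recordUse(run, "chart.pdf");
        System.out.println(tracker.shouldUseFormXObject(run, "logo.pdf", 1));   // true
        System.out.println(tracker.shouldUseFormXObject(run, "chart.pdf", 1));  // false
    }
}
```

Note that WeakHashMap compares keys with equals/hashCode rather than identity, so this only behaves as intended if the key class does not override them in a way that makes separate runs compare equal.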
Re: Fwd: fop-pdf-image and fonts; as requested
On 08/03/12 04:12, Vincent Hennebert wrote: Just my 2 cents on a particular detail... On 07/03/12 07:51, Craig Ringer wrote: On 06/03/12 18:49, mehdi houshmand wrote: <snip/> So with that in mind, what exactly are you trying to do? Why are you using FOP to merge PDFs? I'm using FOP to produce documents containing a mixture of automatically typeset formatted text and graphics. Many of the graphics are PDF documents, and need to be PDF documents because they contain vector artwork and text that would lose quality and grow massively in size if embedded in rasterised form. Is SVG an option for you? That might save you a lot of trouble. Or if not readily available, that might still be less work. Alas, SVG isn't an option. We have a large body of work already in PDF (and EPS) format that we can't easily convert to SVG. Until I checked just now I didn't know that SVG even supported embedded fonts. Does fop actually support that and include embedded SVG fonts in output PDF? -- Craig Ringer
Re: Fwd: Google Summer of Code
of raster, text and bitmap data that should be included in the output document as efficiently as possible and without loss of fidelity. IOW, exactly what fop-pdf-image is for. -- Craig Ringer
Re: Fwd: Google Summer of Code
On 03/06/2012 07:29 PM, Chris Bowditch wrote: On 06/03/2012 11:08, mehdi houshmand wrote: Hi Mehdi, Font de-duping is intrinsically a post-process action, you need the full document, with all fonts, before you can do any font de-duping. PostScript does this very thing (to a much lesser extent) with the optimize-resources tag, as a post-process action. At least that is transparent to the user, but re-parsing the input is a sub-optimal solution as it incurs a performance penalty so we should investigate if there are alternatives first. I can't recall why the PostScript Painter/Renderer was architected in that way but that's a separate topic. At a guess, because PostScript is much less capable of non-linear references and access than PDF is. It's more expensive and slower to forward-reference resources because PostScript has to parse and execute all the rest of the document to find the resource it wants, while PDF just seeks to the object at the byte offset referenced in the xref table and reads only the object it requires. The requirements are perfectly clear: Given a set of input PDFs, XSL-FO, create a single merged PDF with a consistent and unduplicated set of fonts. Why would there be slight kerning differences if the assumption that the font name is unique holds true? Assuming the font name is unique is dangerous, since it's provably true that in the wild there are numerous subtly (and sometimes grossly) different fonts with the same name. The font dictionary contains glyph metrics information that along with the font name, slant, weight etc can be used to match the font rather more closely. For extra caution, checksums of subset glyphs can be done to make sure they're *identical*, but honestly that's unnecessary if the metrics match. If that assumption is wrong then I agree with what you say. Ultimately that should be down to the user though, they know their fonts, so they can decide whether to merge them or not via a setting in the fop.xconf. 
Your argument is not sufficient to say this approach should never be used. It brings a lot of benefit to users who know their font names are unique. It should be safe to do automatically and transparently by default, because only partially overlapping subsets of identical fonts should ever be merged. Anything else is a substitution not merging duplicate subsets, and has entirely different considerations because of the possibility of visible changes caused by non-matching metrics etc. -- Craig Ringer
Re: Fwd: fop-pdf-image and fonts; as requested
documents, and need to be PDF documents because they contain vector artwork and text that would lose quality and grow massively in size if embedded in rasterised form. I'm *NOT* trying to use fop to concatenate PDF pages, to impose PDFs, or any of that. It'd make very little sense to do that. Do you need FOP to do this work? I either need fop, TeX, or need to write my own document layout system. The latter would be insane - why implement text justification and flow algorithms, etc, when it's already well established in fop? Have you tried merging PDFs with PDFBox and seeing how that affects the RIP? I haven't, and it's worth a try. It'd produce a document containing many hundreds of small irregular shaped pages, as each input PDF is quite small. It'd certainly help confirm whether the issue was XObject form use, or whether it was font duplication. -- Craig Ringer
Re: Google Summer of Code
On 03/05/2012 09:35 PM, mehdi houshmand wrote: Because of the overwhelming popularity of this idea, I've created a link on the Wiki (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for the GSoC proposals.

Things that come to mind for me:

- PDFBox backend (probably ideal for GSoC, nice and self contained, great for someone who knows PDFBox and wants to learn fop's codebase);

- CID fonts in PostScript (good for someone who knows PS and fonts, not necessarily XSL-FO so much);

- Using automatic +- kerning, +- tracking *and* +- horizontal type scaling adjustment to better auto-fit text, involving support for font-stretch property. This touches on layout so it may not be practical for a 1st fop project, but may not be too bad since fop already adjusts tracking when justifying text. The key interest points would be *negative* tracking, kerning and (if nothing else works) glyph-scaling for tighter type-fitting where it's not desirable to break to a new line due to widow/orphan policy or because it'd create large holes. This is particularly important when long unbreakable words must fit a fixed width space.

- PDF/X-1a with CMYK;

- Anything in the proposed XSL-FO 2.0 feature list (though most of it won't be realistic for GSoC projects);

- Merge fop-pdf-image and implement smart merging of font, profile, and image resources. I'm working on this one at the moment, but slowly and only amid other projects.

-- Craig Ringer
Re: Google Summer of Code
On 03/05/2012 09:35 PM, mehdi houshmand wrote: Because of the overwhelming popularity of this idea, I've created a link on the Wiki (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for the GSoC proposals. You note font library extraction as a possibility there. I'd like to note another possible motivation for extracting the font library: to then potentially permit it to be merged with or replaced by pdfbox's fontbox, reducing duplicate work. -- Craig Ringer
Re: Implementing PDF Object Streams
On 27/02/2012 8:08 PM, Vincent Hennebert wrote: We would like to implement PDF Object Streams as defined in the PDF 1.5 Reference. In short, the structure tree would be stored inside a stream to allow for compression in the same way as the page content. What's the status of object stream support in PDFBox? Is it possible the feature is better implemented by adopting a PDFBox based backend? There's been long term planning talk of moving over to PDFBox as the underlying PDF support library. It'd massively simplify work with PDF-in-PDF embedding, reduce maintenance work, etc. Is it worth doing major enhancement work on fop's pdf library if it may go away in future? I'm struggling with getting fop and pdfbox to play well together at the moment as I work on enhancing fop-pdf-image to merge duplicate font subsets. The use of two different pdf libraries makes fop-pdf-image much more complex and makes working with fonts a lot harder. I'm sure it's not the only area where a pdfbox-based backend might be good. -- Craig Ringer
Re: Document and page callbacks for image handlers
On 21/12/2011 5:07 PM, Chris Bowditch wrote: FOP can't currently fully embed a font in PDF, so even if you had the source font available the code changes required could be extensive. For us, this approach isn't an option because we don't have the source font to register in fop.xconf and embed. Therefore I am interested in knowing what you've come up with in terms of merging subsets together to create 1 super subset. That in my view is the most difficult challenge in this problem. Resolving the problems with the cross references and the point at which IDs are assigned should be solvable with a little code refactoring. I'm sure one of the guys will speak up if that's not the case. As yet I haven't begun to tackle the actual merging of Type 1 or TrueType subsets into a single font. I've done the accumulation and merging of the widths arrays, but not the fonts themselves. I plan to make new minimum subsets from local fonts if they're available, and will try merging of actual embedded font files only if I can't get that to work or if I have time. I don't know font data structures well enough to want to try merging subset embedded font files if I can possibly avoid it. I've just finished writing and testing the code to accumulate information on each font as it's encountered in a source PDF and merge it into a collection of font information keyed by (FontName,SubType,Encoding). I compare the metrics to ensure that the fonts are really compatible and if they are I merge the widths arrays and startchar/endchar to produce information. At the end of the run I can now produce a font dictionary and font descriptor for the minimum subset required to satisfy the requirements of each of the embedded documents using that font. I can report on font usage, glyph usage within each font, and potential size savings, but I don't yet have it actually replacing the fonts. That's what I'll be working on today. 
First I'll be trying to use fop's font embedding mechanism to do it, which will require adding some callbacks to fop's pdf output to run code just before the resource dictionary is written out so I can inform fop of the required glyphs. I'll be delaying the writing of all the xobject resource dictionaries until after the fop resource dictionary is written so I know the fop font oids and can embed them in the xobject resource dictionaries. With luck I'm hoping I'll be able to write the minimum subset but I haven't looked into fop's font embedding code in enough detail to be sure exactly what I can do or how, so I'll be going delving shortly. If this approach works the next step will be to allocate font object IDs early so I don't need to waste memory on delaying xobject resource dictionary writes and so I can avoid writing keys for fonts fop itself never uses to fop's resource dictionary. Yesterday I attempted to unembed base-14 fonts during import of PDF content, so I'd recognise fonts like Helvetica in Type 1 and replace them with a font dictionary for a base-14 font reference rather than the embedded font's dictionary. Acrobat choked on the result for reasons I'm not entirely sure of, as it looked OK structurally, but I hope to have more luck with re-embedding rather than replacement with a base-14 font. On a side note, I also need to enhance the font info collection code so it keys on more of the font metrics. Currently the first font with a given (FontName,SubType,Encoding) tuple is registered for that key, and if subsequent fonts with the same key but incompatible metrics are encountered they're copied over verbatim exactly as is currently the case. Expanding the key to cover the font bbox, ascent and descent etc will help solve that and won't be hard, I'm just leaving it until I have a proof of concept font re-embed working. -- Craig Ringer
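The bookkeeping described in the message above (font information keyed by (FontName,SubType,Encoding), with widths arrays and first/last char merged when the metrics agree) can be sketched roughly as follows. This is not the code from fop-pdf-image: the class and method names (FontUsageSketch, FontKey, FontUsage, record) are hypothetical, and it assumes that a width of 0 marks a glyph that a given subset does not contain.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FontUsageSketch {

    /** Hypothetical key: fonts are grouped by (FontName, SubType, Encoding). */
    static final class FontKey {
        final String fontName;
        final String subType;
        final String encoding;

        FontKey(String fontName, String subType, String encoding) {
            this.fontName = fontName;
            this.subType = subType;
            this.encoding = encoding;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FontKey)) {
                return false;
            }
            FontKey k = (FontKey) o;
            return fontName.equals(k.fontName) && subType.equals(k.subType)
                    && encoding.equals(k.encoding);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(new Object[] {fontName, subType, encoding});
        }
    }

    /** Accumulated usage; widths[i] belongs to character code firstChar + i. */
    static final class FontUsage {
        int firstChar;
        int lastChar;
        int[] widths;

        FontUsage(int firstChar, int[] widths) {
            this.firstChar = firstChar;
            this.lastChar = firstChar + widths.length - 1;
            this.widths = widths.clone();
        }

        /** Merges another subset's range; returns false (and changes nothing) on a width clash. */
        boolean merge(int otherFirst, int[] otherWidths) {
            int otherLast = otherFirst + otherWidths.length - 1;
            int newFirst = Math.min(firstChar, otherFirst);
            int newLast = Math.max(lastChar, otherLast);
            int[] merged = new int[newLast - newFirst + 1];
            System.arraycopy(widths, 0, merged, firstChar - newFirst, widths.length);
            for (int i = 0; i < otherWidths.length; i++) {
                int slot = otherFirst - newFirst + i;
                int w = otherWidths[i];
                if (w == 0) {
                    continue; // assumption: 0 means "glyph not in this subset"
                }
                if (merged[slot] != 0 && merged[slot] != w) {
                    return false; // same character, different width: incompatible fonts
                }
                merged[slot] = w;
            }
            firstChar = newFirst;
            lastChar = newLast;
            widths = merged;
            return true;
        }
    }

    private final Map<FontKey, FontUsage> fonts = new HashMap<FontKey, FontUsage>();

    /** Records one font occurrence; returns false if it clashes with the registered entry. */
    boolean record(FontKey key, int firstChar, int[] widths) {
        FontUsage existing = fonts.get(key);
        if (existing == null) {
            fonts.put(key, new FontUsage(firstChar, widths));
            return true;
        }
        return existing.merge(firstChar, widths);
    }

    FontUsage usage(FontKey key) {
        return fonts.get(key);
    }

    public static void main(String[] args) {
        FontUsageSketch registry = new FontUsageSketch();
        FontKey key = new FontKey("Helvetica", "Type1", "WinAnsiEncoding");
        registry.record(key, 65, new int[] {667, 0, 722}); // subset with A and C
        registry.record(key, 67, new int[] {722, 722});    // subset with C and D
        FontUsage u = registry.usage(key);
        System.out.println(u.firstChar + ".." + u.lastChar + " " + Arrays.toString(u.widths));
    }
}
```

A font whose merge is rejected would, as the message says, have to be copied over verbatim; widening the key to include bbox, ascent and descent would only need extra fields in the hypothetical FontKey.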
Re: Difference between RenderingContext and RendererContext
Thanks Vincent, much appreciated. It turns out I can't use the render context for what I'd hoped to anyway. I'm having to use a WeakHashMap keyed on the FOUserAgent to associate image-handler private data with a rendering run. I'm putting together a proposal and patch to add interceptors around IFDocumentHandler calls, with a service loader based mechanism for fop image handlers, document extensions etc to register handlers. This will - if accepted - let plugins and exts do extra work at various document production phases and provide a cleaner way for plugins and exts to keep per-render data. It will also provide an easy way to support checking for cancellation of rendering runs fairly frequently. The main problem is that I don't see how I can do it without repeating a line of interceptor call code at the start and end of every IFDocumentHandler method implementation, breaking source compatibility for external subclasses of AbstractIFDocumentHandler, or using a tool like Javassist. I'm inclined to break source compatibility for AbstractIFDocumentHandler subclasses and just not propose inclusion in 1.0.x stable but I'd like your view on this. Note that I can't use the user-oriented event system - it doesn't give any access to renderer/handler state and probably shouldn't, plus it doesn't offer low level enough events and again probably shouldn't. If it were targeted at fop 1.1.x only would I possibly be able to use Java 5 in a patch? It'd make this one much cleaner, particularly being able to use enums. On Dec 20, 2011 11:34 PM, Vincent Hennebert vhenneb...@gmail.com wrote: Hi Craig, On 16/12/11 13:29, Craig Ringer wrote: Hi all While reading over the pdf-image extension and fop code, I'm having a bit of an interesting time figuring out the difference between a few things and was hoping for a very brief pointer. I'm not quite sure what differentiates org.apache.fop.render.RendererContext from org.apache.fop.render.RenderingContext. 
The JavaDoc comments don't really differentiate them and they look quite similar. This is hard to tell. I could trace RendererContext as far back as 2003, while RenderingContext was added in 2009 with the new XML Intermediate Format. I don’t know why RendererContext was not deleted or retrofitted back then. Given that RenderingContext is more recent, is an interface and has many more implementations than RendererContext has sub-classes (actually only one in the AFP code), I’d say it’s a safe bet to go with RenderingContext. This isn't helped by the fact that the pdf-image extension provides two image handlers - one implements PDFImageHandler and takes a RendererContext, while the other implements ImageHandler and takes a RenderingContext. They seem to do much the same thing. Is this a case of old-backwards-compat-code meets new-code? If so, which should I target for future work? *headscratch* -- Craig Ringer HTH, Vincent
Re: Document and page callbacks for image handlers
- A clean way to associate data that's private to the image processing plugin with a particular rendering run so I can access it across multiple invocations of the plugin; and For anyone else who needs this later: There doesn't appear to be any especially nice way to do this with FOP's current image handler API, as there's no general-purpose map on the user agent for image handlers to stash their data in and nothing like that is passed as a param to the image handler calls. The hints mechanism can pass data from a preloader to a loader for the same image, but it can't be used to pass data between image loaders. What I've landed up doing is keying a WeakHashMap off the FOUserAgent for the rendering run, as obtained via the RenderingContext passed to ImageHandler.handleImage(...). So long as lookups and insertions on the WeakHashMap are synchronized this is safe and will release the image handler's per-render information when the FOUserAgent is discarded at the end of the rendering run. I'm now able to accumulate font usage information from the PDFs I examine as I embed them and build a list of which fonts are used. I can combine width arrays and first/last char listings to determine which glyphs are required if the font is to be embedded as a subset. - How to append some additional PDF objects after the last page is emitted but before the PDF document trailer and final xref table(s) are written out. For anyone else looking at this now or later: It's possible to allocate a PDFObject and request that it be written out at the end of the document. PDFDocument.outputTrailer(...) writes objects added to the trailer list. Those objects were allocated via the factory where they were given an object ID, but were then passed to addTrailerObject(...) to request that they be written out at the end of document production. 
If I ever start producing my own combined font subsets from the original subset fonts in the input PDFs, this is probably how I'd insert the combined font subset object. If I'm restricting font combining to fonts where fop has an original font file and using fop's font subsystem the above would require too much duplication and make it hard to avoid embedding fonts twice (once for form xobjects, once for main content). Instead I need to mark a font as used in fop's FontInfo for the rendering run so fop writes it out, and I need to obtain the font object's PDF object ID so I can write forward references to it in the XObject forms' resource dictionaries. The problem here is that fop doesn't assign fonts an object ID until very late in writing. The first reference to font objects is from the resource dictionary, and fop only writes one of those - it is shared between all pages and written out just before the trailer. Since fonts are written out with the resources dictionary and don't usually need object IDs until the resources dictionary has to reference them there's no way to get their object IDs earlier in PDF production. This changes when we need to write private resource dictionaries for embedded form xobjects. I'm looking at forcing early embedding of fonts with direct makeFont(...) calls. This'll work so long as I'm happy embedding whole fonts, but will prevent fop from subsetting the font for its own use and prevent me from subsetting it for xobject forms. Alternately, I could defer the writing of the xobject form resource dictionaries till the end of the document so I didn't need to know the font object IDs early - but I'd still need a way to write them *after* the main fop resource dictionary. If I wanted to subset then I'd also need a hook for just before fonts were written out by fop to adjust the glyph width tables. I don't see any way around this without some kind of PDF renderer listener for image handlers etc to use. 
I'll try to put together a proof of concept that embeds whole fonts if the font is found in a pdf form xobject, de-duplicating references so all pdf form xobjects that use that font reference the same one. Fop will use the same font since it knows about it and has stored it in the used fonts map, so the only problem is that the whole font is embedded rather than a subset. Anyone working on the same thing, please feel free to drop me a note. -- Craig Ringer
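For anyone wanting a starting point, the WeakHashMap approach described in this message can be sketched as below. To stay self-contained it uses a plain Object where the real code would use the FOUserAgent obtained from the RenderingContext; PerRunStateSketch, RunState and forRun are hypothetical names, not part of FOP's API.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class PerRunStateSketch {

    /** Hypothetical per-render data an image handler might accumulate. */
    static final class RunState {
        int imagesSeen;
    }

    // Keys are held weakly, so an entry can be dropped once the key
    // (the user agent for a rendering run) is no longer referenced elsewhere.
    private static final Map<Object, RunState> STATE = new WeakHashMap<Object, RunState>();

    /** Returns the state for this run, creating it on first use. */
    static RunState forRun(Object userAgent) {
        // WeakHashMap is not thread-safe, so lookups and insertions are synchronized.
        synchronized (STATE) {
            RunState state = STATE.get(userAgent);
            if (state == null) {
                state = new RunState();
                STATE.put(userAgent, state);
            }
            return state;
        }
    }

    public static void main(String[] args) {
        Object run1 = new Object(); // stand-in for one run's FOUserAgent
        Object run2 = new Object(); // stand-in for a second, concurrent run
        forRun(run1).imagesSeen++;
        forRun(run1).imagesSeen++;
        forRun(run2).imagesSeen++;
        System.out.println(forRun(run1).imagesSeen + " " + forRun(run2).imagesSeen);
    }
}
```

One caveat that applies to any WeakHashMap: the stored value must not hold a strong reference back to its key, otherwise the entry is never released.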
Difference between RenderingContext and RendererContext
Hi all While reading over the pdf-image extension and fop code, I'm having a bit of an interesting time figuring out the difference between a few things and was hoping for a very brief pointer. I'm not quite sure what differentiates org.apache.fop.render.RendererContext from org.apache.fop.render.RenderingContext. The JavaDoc comments don't really differentiate them and they look quite similar. This isn't helped by the fact that the pdf-image extension provides two image handlers - one implements PDFImageHandler and takes a RendererContext, while the other implements ImageHandler and takes a RenderingContext. They seem to do much the same thing. Is this a case of old-backwards-compat-code meets new-code? If so, which should I target for future work? *headscratch* -- Craig Ringer
Document and page callbacks for image handlers
Hi I'm interested in implementing merging of duplicate subset fonts in Jeremias's fop-pdf-image extension. I'm working with fop 1.0, and I'm having trouble with two things: - A clean way to associate data that's private to the image processing plugin with a particular rendering run so I can access it across multiple invocations of the plugin; and - How to append some additional PDF objects after the last page is emitted but before the PDF document trailer and final xref table(s) are written out. I think I've figured out how to associate private image plugin data with the document being rendered (using the RenderContext as a key to a static WeakHashMap in the plugin) but it seems like a pretty ugly approach, so I'm hoping there's a better way I'm missing. More problematic is that I need to emit additional PDF objects after the last page - or at least the last image - has been rendered. I can't see any obvious callbacks to fop image plugins to notify them when a new document is started, a new page started, a page finished or a new document finished. Fop seems to have an event system but it seems to be oriented toward trapping error conditions and problems rather than doing extra work at certain processing phases. In particular, PDFEventProducer doesn't seem to be useful for this. Any advice on how to hook document completion after the last page is written but before the pdf trailer and other closing pdf structure are written, so I can write out some indirect objects I've only written indirect references to so far? Sorry if these are somewhat stupid questions. I'm very new to fop's codebase and I'm still getting my head around it. -- Craig Ringer
Re: OpenType font library [was: Re: How much work is needed for FOP to support OpenType fonts?]
On 19/01/11 16:35, Simon Pepping wrote: I take this discussion to express my worries that FOP needs to create its own support for fonts, among which Open Type Fonts. FOP's core task is the layout and printing of FO files. If FOP could rely on good font libraries, that would make our code base so much smaller and our development tasks so much easier. If I am not mistaken, Firefox does a fairly good job at representing Indic scripts. Do they use a generally available library? There are several libraries for complex scripts, including the commonly-used libbidi and pango libraries. All the widely used ones that I know of are C or C++ libraries. While Java can use C and C++ libraries, a Java Native Interface (JNI) layer must be written. Further, JNI code and the libraries it uses must be compiled separately for each supported platform and architecture, making packaging and deployment of the Java code a ***PAIN*** unless all the native code parts are shipped as part of the JDK/JRE. Even allowing for the issues with complex/bidi libraries, Apache FOP must also handle the OpenType font format itself, including support for font subsetting and embedding. That's way outside the scope of complex script and bidi libraries. While library code exists to help with OpenType handling, I'm not aware of any libraries, even C or C++ ones, that provide useful, fairly abstracted facilities for subsetting and embedding without tying them in to related PDF libraries. While Java itself can use OpenType fonts, it doesn't expose the details of its OpenType parser etc to Java applications. In any case, it may use the platform's font support rather than bundling its own. Java apps need to provide their own OpenType format handling if they want to do more than just use the fonts with Graphics2D and the other Java rendering APIs, because there's no way to get to the guts of the fonts loaded by the JVM. 
Ideally, Apache FOP could be built on top of: - A low-level Java font format and parsing library that can identify fonts, enumerate tables and glyphs, detect features, etc. - A low-level Java PDF library that handles the PDF document structure, xref and indirect object management, PDF data structure representation, direct-to-disk streaming of big images into PDF object streams, etc etc. - A Java library that uses both of the above to provide features for PDF embedding of OpenType and TrueType fonts. Unfortunately, AFAIK *NONE* of them exist, or at least are used, at present. Fop seems to have its own PDF output code and own font handling code. I don't see any obviously advantageous 3rd party replacements for the font handling code, and most of the 3rd party PDF engines (like iText) appear to be a bit limited when you want to insert your own low-level PDF content stream data, objects, etc to implement features not supported directly by the PDF library you're using. I was looking into this a little myself while checking to see how hard it'd be to implement /DeviceCMYK and /ICCBased colour in FOP. Because of the way FOP stores and manages colour internally, the answer appears to be "very" at present, especially if you want to support PDF/X requirements and handle CMYK passthrough. -- System Network Administrator POST Newspapers
Re: color issues [was: OpenType font library]
On 19/01/11 19:13, Jeremias Maerki wrote: Craig, you might want to try out the color branches for which I've just started to vote to merge it into trunk. The color branch adds Named Color (separation, spot color) support and CIE Lab support. However, ICC and device CMYK colors should already work in FOP Trunk/1.0. Could you maybe elaborate on your problems there? Hmm. I seem to have missed the cmyk() function in fop, and less excusably the rgb-icc() function in XSL-FO proper. Maybe it was CMYK and ICC tagged images I was having issues with? I'll look into it again, it's been ages since I was looking at that, and I was on fop 0.95 when investigating that stuff so it may have changed since then anyway. -- System Network Administrator POST Newspapers
Re: offo in maven [was: Re: DO NOT REPLY [Bug 49881] [PATCH] add maven build support]
On 9/09/2010 3:00 PM, Simon Pepping wrote: I found offo in maven central: http://repo1.maven.org/maven2/net/sf/offo/fop-hyph/1.2/. I did not put it there. Hmm. That makes me officially blind. Thanks :-) Ah well, it served as a useful example of the methods. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: DO NOT REPLY [Bug 49881] New: [PATCH] add maven build support
On 7/09/2010 1:52 PM, Jeremias Maerki wrote: Well, Ivy has one fundamental problem in common with Maven that many regard as a great feature: the repository. Numerous times, I couldn't get a Maven build to complete successfully because some artifact was temporarily or permanently unavailable. First: I'd like to note that none of the following is meant to sound like some kind of ra ra ra you should use maven and only maven, maven is the truth and the light. It's just a tool, and like all tools has things it's good for and things it's not so good for. That said, I've never had issues with remote repositories - I routinely use sonatype nexus (jboss) repos, Central, java.net, and a couple of private repositories. I guess it helps that once files are fetched by maven and cached in the local repository, that's it. Unless you change a dependency's version or use snapshot versions, there's no more network access. There's always the option of doing the same thing you currently do with ant - bundle copies of the dependencies in shipping versions or maintain a separate 3rd pty dependencies repo under version control. I guess I don't really see the difference. Here I keep a common repo under version control, but that's mostly to save download time on big files, and is exactly the same thing I do for non-Maven resources like JDK snapshots. It would insulate me a bit from transient failures in remote repositories, though. (I do wish that Maven would print a warning and use the last-downloaded -SNAPSHOT version if it didn't have network access and snapshot updates were enabled, though. It's the only area where connectivity requirements do cause me issues.) And how many times did a Maven/Ivy build download half the Internet just to build a small project? Generally only if it's misconfigured, or that small project uses plugins/libraries with a lot of dependencies. In the latter case, you're going to need to get them one way or the other. 
My Eclipse's Maven and Ivy plug-ins are long uninstalled because of the trouble they caused. Aaah. I don't use Eclipse - and given the nature of my experiences with it when I've tried using it for something, I wouldn't be surprised by problems. I use NetBeans for most work, and the command line where convenient. I don't suppose you were relying on any SNAPSHOT version plugins or libraries? Because if you were and you had snapshot updates enabled (the default - unfortunately IMO) then I can certainly see it seeming like it wants to download the internet whenever you run a build. Another problem of an external repository is the lack of license management. ASF projects have clear requirements what kinds of dependencies are allowed. If you can't control transitive dependencies based on a license policy you're bound to run into a problem there. Now that can be a problem. Again, though, I'm not sure how different it is to a 3rd party library you use bundling libraries of unknown licensing as dependencies. Either way, you have to check. Release Maven artifacts won't change dependencies without a version change, and you have to do that kind of checking whenever you update anything, maven-based or not. I can check out (or extract) FOP and build at least a basic version locally with no outside connection. I like that and would like it to stay that way. The same is true with Maven. It doesn't have to try to download the Internet, nor does it need 'net access for builds. I routinely do (re) builds on my laptop while disconnected. I have the required artifacts in my local ~/.m2 repository already, and that's all I need. If I was using an Ant project I'd have to have obtained the required dependencies to put on the classpath somehow; same deal. Whether I populate my ~/.m2 from Internet repositories, or check out a private pre-populated maven repo from version control, I still have to obtain it somehow. 
That said, I do find it annoying that the initial Maven download doesn't include most of the core plugins, and therefore fetches them when you first do a build. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: DO NOT REPLY [Bug 49881] New: [PATCH] add maven build support
On 7/09/2010 4:40 PM, Jeremias Maerki wrote: I guess we're in a religious dispute here, like PC vs. Mac. So we can't expect to reach a consensus. Well, certainly a discussion of preference. I know it gets religious for some Java folks, but myself I don't mind too much so long as nobody tries to force their choice on me. I can use Maven without having to care what others use or force it on them. I'm only weighing in on this discussion to say that I'd like the option for maven builds if it doesn't get in anyone else's way, and address some possible misunderstandings about maven. I like dealing with maven in projects because for me it is a known quantity and imposes some consistency on projects that I personally like. OTOH, I manage ok if a project doesn't use maven, at least so long as I don't have to wrangle the guts of its build system. Anyway, I won't stand in the way if something is added to FOP that can help some users. [snip] just because Maven can't include a simple JAR that is not in a repository. Not strictly true. One option is to use <scope>system</scope> with an explicit path to the jar. Maven doesn't have a wild-card "include everything under lib/" though, and using system scope to fudge in local dependencies is a bit of a hack. http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#System_Dependencies Usually what you'd do if you have a jar you want to use - but no repo or pom for it - is drop the jar you want to use into your local ~/.m2/ (or wherever you keep your local repository, ie download cache) then declare a dependency on it in your pom.xml. This is within a repository but it's only your local repo, it doesn't involve any network access or anything except putting a file in a particular place. Maven will look for the dependency in a location defined by the repo layout. So if I declared <dependency> <groupId>local</groupId> <artifactId>somejar</artifactId> <version>2.2</version> </dependency> ... 
then it'd look for local/somejar/2.2/somejar-2.2.jar within my local repository. If I put the jar where it should be found, no problem. I don't personally find that to be any worse than dropping everything in lib/ ... and I find it makes it a LOT easier in the long run to let mvn take care of the mess of secondary (transitive) dependencies involved in using things like Hibernate. (OK, so maven does whine annoyingly about not being able to find the pom.xml for the artifact, which bugs me - but it works fine nonetheless). But I consider Maven viral as we're seeing here. Due to its inflexibility, projects are almost forced to adopt it to keep everyone happy, I can't speak for the obsolete Maven 1.x, but that's not true of 2.x. To keep everyone happy it *does* help to publish artifacts to a maven repository (be it Central or somewhere else) but there's no need to get Maven anywhere near your builds if you don't want to, and there's no need for the people maintaining the project and doing the development work to have anything to do with pushing project releases to maven central. If you *do* want to create and push maven artifacts yourself but don't want to use Maven in builds, a Maven artifact can be created with the cp command and a text editor, or with an Ant task to spit out a suitable generated pom.xml. No biggie. You can use Maven builds with jars not created or managed with Maven, you can use Ant to produce Maven artifacts, and you can use Ant to consume Maven-produced artifacts. It doesn't really force anything on you at all. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
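As a rough illustration of "a location defined by the repo layout": in Maven 2's default layout a jar sits at groupId (with dots turned into directory separators), then artifactId, then version, then artifactId-version.jar. The helper below is a hypothetical sketch for working out where to drop a jar by hand; it is not part of Maven's API and it ignores classifiers and non-jar packaging.

```java
public class MavenLayoutSketch {

    /** Builds the repository-relative path of a plain jar artifact (default layout assumed). */
    static String artifactPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/')
                + "/" + artifactId
                + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // The dependency declared in the message above.
        System.out.println(artifactPath("local", "somejar", "2.2"));
        // The offo hyphenation artifact mentioned elsewhere in this archive.
        System.out.println(artifactPath("net.sf.offo", "fop-hyph", "1.2"));
    }
}
```

The resulting path is relative to the repository root, for example ~/.m2/repository for the default local repository.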
Re: DO NOT REPLY [Bug 49881] New: [PATCH] add maven build support
On 7/09/2010 7:22 PM, Benson Margulies wrote: I've never seen a message to one of the mailing lists complaining that connectivity issues were making people miserable. Why? You need connectivity to update from svn. Then you need connectivity to run a build. ... and to get any libraries or other dependencies if you don't already have them locally. Just like with Maven. BTW, I suspect many people who have trouble with Maven's apparent net access requirements don't know about mvn dependency:go-offline and mvn -o for offline operation that doesn't try to check snapshot repos etc. mvn -o is kind of hard to miss, but people seem to anyway; the go-offline goal is rather less obvious but really handy. Meh. I'd like to see maven support in fop, but I'm not working with fop's code much at all so it's hardly something I can claim any say in. Maybe I should bash together an ant task to spit out Maven artifacts after a build, though, to make it easier to use fop's existing build tools to integrate with maven. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Font Glyph?
On 27/07/10 08:03, Glenn Adams wrote: Let's see if I have any luck obtaining the last resort font for direct inclusion in FOP. The URW free fonts may have a suitable license. They're distributed with GhostScript among other things. As they cover the base set of PostScript fonts, they could be ideal. The BitStream Vera family might also be useful, though they don't have the same metrics and appearance as common base fonts. -- Craig Ringer
Re: Intermediate Format (IF) with placeholder / marker
On 13/07/10 19:01, Adam Kovacs wrote: Hi there, I would need an XML element from any namespace which I can use in the XSL and is not interpreted by FOP and goes into the Intermediate Format (IF) output as it is. I want to use this element as placeholder in the FOP_IF, and in a second step I can modify the FOP Intermediate File for my needs. I've been doing this using the area tree (AT) output format, using named blocks as suggested already. It seems to me like the IF is much less useful for modifying before output, because it doesn't seem to preserve identifiers, and omits quite a bit of information you might want. In my case I found that there was no XML level way to tell what blocks were contained in which columns of a multi-column flow; I'd have to do it with geometry using the IF. In the AT output, it's obvious from the XML nesting - each column is a flow within a span for the page's content. So... consider using the area tree output instead. It might be a better fit for your needs. -- Craig Ringer
Potential contract project
Hi folks I'm posting to -dev because I've hit a wall with FOP that I'm interested in paying someone to knock down. For details on the issue, see the thread "Distributing vertical space in a column while repeating column headings" in -users, starting with message-id 4c1f97b7.9040...@postnewspapers.com.au . Archive thread begins here: http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-users/201006.mbox/%3c4c1f97b7.9040...@postnewspapers.com.au%3e There are several possible approaches to solve the issue I'm having, but the simplest looks like implementing vertical justification of space within one-column tables, by respecting and applying elastic space-before on blocks within table cells in one-column tables. An alternative option would be to add conditional output to blocks, so that a certain block could be suppressed (or only shown) if it was the first object after a break. But I suspect that'd be a lot more complicated, and would require extensions to the standard, whereas vertical justification of elastic space within tables would not. OTOH, it'd be really handy. In the end, any method that'll let me repeat headings for sections of flowed listings when those sections are broken across columns on a multi-column page, AND vertically distribute space within those columns, would do. Right now, with fop I can do one or the other but not both. If you feel you know the layout engine well enough to tackle this, think it's something reasonable to do, and can do it well enough that your changes can be committed into fop trunk, I'd be interested to hear from you with an estimate of how much work it'd take you over how long, what approach you'd take, and how much you'd want to charge for the work. Be realistic about time/effort, not optimistic. The rights to any changes would be licensed under the same terms as fop itself, with copyright held by the author, and I would want to seek inclusion of the changes into fop's trunk. 
In fact, that'd have to be a condition of satisfactory completion, since I *really* don't want to be in the job of maintaining an external patch against complex core bits of fop! Anyone interested? -- Craig Ringer System and Network Administrator POST Newspapers