DO NOT REPLY [Bug 40271] [PATCH] auto table layout -- dirty draft
https://issues.apache.org/bugzilla/show_bug.cgi?id=40271

carsten.pfeif...@gebit.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carsten.pfeif...@gebit.de
Re: FOP and large documents: out of memory
Hi Stephan,

I’m not sure I would invest any energy into improving the CachedRenderPagesModel (-conserve option). It doesn’t look like the right approach to me, and as you noticed, it doesn’t even work out of the box currently.

Why store the Area Tree on disk? Why not directly render it into the final output format? If the latter supports out-of-order pages, then that’s great; otherwise we may as well store the final pages and order them later on when the document is complete, instead of storing them in a half-finished area tree format.

As to pages that hold unresolved references, and so obviously can’t be rendered yet: there usually aren’t so many of them that the area tree solution would be vastly superior to a final-format one in terms of memory consumption. Those could be kept in memory until all the references they hold are resolved.

Also, the handling of forward references is currently less than optimal. The resolution is made in the area tree instead of looping back to the layout engine. ATM, a page reference is rendered using a placeholder string (‘MMM’), and that placeholder is later replaced with the actual value (e.g., ‘5’). This is fine for constructs like tables of contents, but may produce ugly results if the page-number-citation is inside a paragraph, ruining the even spacing. What’s the point of implementing a high-quality line-breaking algorithm if its output is spoiled by poor handling of page citations?

I think the two-pass approach is the best long-term solution, although obviously less trivial. One challenge is to detect a possible infinite loop. For example: the referenced item is at the beginning of page IX; the reference is updated to IX, which takes less room than MMM, so the document is re-laid out and the referenced item moves to page VIII; the reference must be updated again, the document is laid out again, and the referenced item ends up on page IX again. And again, and again...
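The looping hazard described above can be sketched as follows. This is a minimal illustration under stated assumptions, not FOP code: ReferenceFixpoint, Result, resolve and the layout function (which maps assumed page numbers to the page numbers that a re-layout actually produces) are all hypothetical names. The idea is simply to remember every page assignment already tried and to stop as soon as one repeats.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical helper (not part of FOP): repeats layout until page references settle. */
final class ReferenceFixpoint {

    /** Outcome: the last page assignment, whether it is stable, and the passes used. */
    static final class Result {
        final Map<String, Integer> pages;
        final boolean stable;
        final int passes;

        Result(Map<String, Integer> pages, boolean stable, int passes) {
            this.pages = pages;
            this.stable = stable;
            this.passes = passes;
        }
    }

    /**
     * @param layout       assumed id -> page map in, actual id -> page map out (hypothetical)
     * @param initialGuess page numbers assumed for the first pass
     * @param maxPasses    hard upper bound on the number of layout runs
     */
    static Result resolve(UnaryOperator<Map<String, Integer>> layout,
                          Map<String, Integer> initialGuess, int maxPasses) {
        Set<Map<String, Integer>> seen = new HashSet<>();
        Map<String, Integer> assumed = Map.copyOf(initialGuess);
        for (int pass = 1; pass <= maxPasses; pass++) {
            seen.add(assumed);
            Map<String, Integer> actual = Map.copyOf(layout.apply(assumed));
            if (actual.equals(assumed)) {
                return new Result(actual, true, pass);   // fixed point reached
            }
            if (seen.contains(actual)) {
                return new Result(actual, false, pass);  // assignment repeats: oscillation
            }
            assumed = actual;
        }
        return new Result(assumed, false, maxPasses);    // gave up without converging
    }
}
```

What to do once an oscillation is detected (for instance, keeping the wider of the two candidate renderings) is a separate design decision that this sketch leaves open.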
One possible workaround for your use case is to generate your document once, with a dummy TOC and just “Page X”, into the intermediate format; parse it to get the total number of pages and the page numbers for each element of the TOC; then re-generate it with hardcoded values for the page references.

HTH,
Vincent

Stephan Thesing wrote:

Hello,

as is well-known, FOP can run out of heap memory when large documents are processed (http://xmlgraphics.apache.org/fop/0.95/running.html#memory).

I have the situation that the documents I have to process mandate a footer on each page that contains a "page X of Y" element, and a TOC at the beginning of the document, i.e. FOP cannot lay out the pages until all referenced page-citations are known, which is after the last page of the document.

When page content is quite complicated (e.g. 2000 pages mostly full of tables), the heap space does not suffice to hold all pages until all references can be resolved, thus FOP aborts with out-of-memory.

Since increasing the heap space does not always work (3 GB heap space was required in one example), I need a better solution for this.

1. -conserve option

One alternative would be the -conserve option, which serializes the pages to disk and reloads them as needed. Although slow, this definitely would be a solution, if it worked, which it doesn't: our documents include graphics (SVG, PNG), and the serialization with -conserve throws an exception, because some class in Batik is not serializable (e.g. SVGOMAnimatedString IIRR), thus the page is missing, causing FOP to abort later. Thus, Batik would have to be fixed for this.

2. Two passes

Since the pages are kept because of unresolved references, one could do the same as e.g. LaTeX always did: process the document twice. In a first run, pages are discarded after layout; only the references for page-citations are kept and at the end reused for the second pass (when all pages for the citations are finally known).
For the second run, these id-refs are initially loaded and no pages have to be kept. This would require more changes in FOP (and should definitely be made optional, obviously).

I would appreciate any comments or other suggestions!

Best regards
Stephan
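The two-pass idea from the quoted message can be sketched as follows. This is a toy model under stated assumptions, not FOP's layout engine: TwoPassSketch, Block, collectPages and render are hypothetical names, and a "page" is just a fixed number of lines. The first pass keeps only the id-to-page table; the second pass can then emit every reference immediately because the table and the page total are already known.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-pass sketch (not FOP code). */
final class TwoPassSketch {

    /** A piece of content with an id and a height in lines (toy model). */
    static final class Block {
        final String id;
        final int lines;

        Block(String id, int lines) {
            this.id = id;
            this.lines = lines;
        }
    }

    /** Pass 1: paginate and keep only id -> page number; no page content is retained. */
    static Map<String, Integer> collectPages(List<Block> blocks, int linesPerPage) {
        Map<String, Integer> pageOf = new LinkedHashMap<>();
        int page = 1;
        int used = 0;
        for (Block b : blocks) {
            if (used > 0 && used + b.lines > linesPerPage) {
                page++;      // block does not fit: start a new page
                used = 0;
            }
            pageOf.put(b.id, page);
            used += b.lines;
        }
        return pageOf;
    }

    /**
     * Pass 2: produce output with all references already resolved. A real second
     * pass would run the layout again; here only the resolved text is produced.
     */
    static List<String> render(List<Block> blocks, Map<String, Integer> pageOf) {
        int total = 0;
        for (int p : pageOf.values()) {
            total = Math.max(total, p);
        }
        List<String> out = new ArrayList<>();
        for (Block b : blocks) {
            out.add(b.id + ": page " + pageOf.get(b.id) + " of " + total);
        }
        return out;
    }
}
```

The sketch assumes that the second layout produces the same page breaks as the first. When a resolved reference is wider or narrower than its placeholder, that assumption can fail, which is the looping problem raised in the reply above.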
Re: change bar status
Stephan Thesing wrote:

Dear all, sorry for the long delay in answering.

Hi Stephan, thanks for looking into this! Change bars would be a useful feature addition to FOP.

Having had some time in the last days to actually continue working on the change bar stuff:

+ parsing and validating the fo:change-bar-begin and fo:change-bar-end elements works (including properties)
+ attaching, to every fo: element between fo:change-bar-begin and fo:change-bar-end elements, the vector of change bars that affect that element is implemented

If I understand the FO standard right, then for each area generated by an FO object under the influence of a change bar, an area (with correct border-style/width/color... settings) has to be created as an xsl-absolute area and placed as determined by the change bar style relative to the column or page edge. For areas generated in the body-region, these additional areas are children of the flow area; for after/before/start/end they are children of the respective area region.

If I understand the FOP code right, the areas are actually constructed by the various layout managers in addAreas().

That is my understanding too.

Now, when creating the change-bar areas, one could, for every area thus constructed, add a change bar area at the correct position in the area tree. But it is desirable to merge change bar areas, because many areas will simply be the same or be adjacent to each other down the page side and should thus be represented as a single area representing the union of these areas (as far as possible). Another point is that the change-bar areas have a z-Buffer trait that decides visibility of overlapping change bars: I am not sure how to handle that yet.

Desirable yes, but I'm not sure it is essential. I would try to get it working without merging first, then evaluate whether the solution is acceptable without merging, and if so tackle merging as a later project. We can certainly help with the evaluation if you post into a bugzilla entry.
Would it be best to really add the change bar areas when the originating areas are constructed and merge on the fly, or would it be better to search for all change bar generating areas when the page is finished and then to perform the merge?

Currently missing altogether are unit tests for the code I have added or changed.

In short, we are getting somewhere.

Thanks for the update. It is encouraging.

Chris
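For the "merge when the page is finished" variant asked about above, the merge itself could look roughly like the following. This is a hypothetical sketch, not FOP code: ChangeBarMerge and Span are invented names, a bar is reduced to its vertical extent in integer units, and it is assumed that all spans passed in belong to the same change bar (same style, same offset from the edge). Overlapping or touching spans are combined into their union.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not FOP code): unions adjacent change-bar extents. */
final class ChangeBarMerge {

    /** Vertical extent of one change-bar area, top <= bottom, in integer units. */
    static final class Span {
        final int top;
        final int bottom;

        Span(int top, int bottom) {
            this.top = top;
            this.bottom = bottom;
        }
    }

    /** Returns the spans sorted by top, with overlapping or touching spans merged. */
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.top));
        List<Span> out = new ArrayList<>();
        for (Span s : sorted) {
            int lastIndex = out.size() - 1;
            if (lastIndex >= 0 && s.top <= out.get(lastIndex).bottom) {
                // Overlaps or touches the previous span: extend it.
                Span last = out.remove(lastIndex);
                out.add(new Span(last.top, Math.max(last.bottom, s.bottom)));
            } else {
                out.add(s);
            }
        }
        return out;
    }
}
```

Merging on the fly would need the same comparison, but against areas already placed in the tree; collecting first and merging once per page keeps the logic in one place.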
Re: svn commit: r898845 - /xmlgraphics/fop/trunk/test/accessibility/pdf/
Hi Simon,

Simon Pepping wrote:

On Wed, Jan 13, 2010 at 05:17:03PM -, vhenneb...@apache.org wrote:

Author: vhennebert
Date: Wed Jan 13 17:17:01 2010
New Revision: 898845
URL: http://svn.apache.org/viewvc?rev=898845&view=rev
Log: Updated reference accessible PDF files. Old ones had "Apache FOP Version SVN branches/Temp_Accessibility" as Creator and Producer values. New ones have "Apache FOP Version SVN trunk". This was causing spurious differences when testing PDF accessibility.

Where is the version property set?

In FOUserAgent, the producer field is set to "Apache FOP Version " + Version.getVersion() and returned by the getProducer() method. That method is called, among others, in PDFRenderingUtil.setupPDFDocument. Something similar is done (I suppose, haven’t checked) for Creator.

Is that what you asked for?

Vincent
Re: FOP and large documents: out of memory
On 13 Jan 2010, at 22:37, Stephan Thesing wrote:

On 13 Jan 2010, at 21:27, Stephan Thesing wrote:

...

Our documents include graphics (SVG, PNG), and the serialization with -conserve throws an exception, because some class in Batik is not serializable (e.g. SVGOMAnimatedString IIRR), thus the page is missing, causing FOP to abort later. Thus, Batik would have to be fixed for this.

I think FOP can be 'fixed' for this too. If that is really the only class that is causing trouble, then FOP could make a serializable subclass for it, and use that in the area tree, instead of Batik's default non-serializable implementation. Unless Batik really needs it, why fix it there?

I don't think that can work, as that class is used in elements nested in classes of Batik that represent the SVG. I.e., FOP never instantiates it, but the Batik code does somewhere along

OK, I see... Just noticed that my idea for 'subclassing' is probably not entirely what I meant...

Suppose, for the sake of the argument, that String is not serializable, but we'd need it for some reason and the Java vendor does not want to alter their implementation. What could be done is to store only the info needed to create a new String upon deserialization: serialize the char-array, and re-instantiate the String from it. I was thinking something similar should be possible here, but if it is really that far out of FOP's control, then never mind.

Regards

Andreas

Andreas Delmelle
mailto:andreas.delmelle.AT.telenet.be
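The "store only what is needed to rebuild it" idea from the message above can be sketched with Java's standard serialization hooks. Everything here is hypothetical and unrelated to Batik's real classes: ForeignText stands in for a third-party class that is not Serializable, and TextHolder is a holder that one controls. The holder marks the field transient and writes or reads the underlying char array itself. As noted in the thread, this only helps where the code doing the serialization owns the field that holds the object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical sketch: serializing a holder of a non-serializable object. */
final class SerializationSketch {

    /** Stand-in for a third-party class that does not implement Serializable. */
    static final class ForeignText {
        private final char[] chars;

        ForeignText(char[] chars) {
            this.chars = chars.clone();
        }

        char[] toChars() {
            return chars.clone();
        }
    }

    /** Holder under our control: stores only the data needed to rebuild the object. */
    static final class TextHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient ForeignText text;   // skipped by default serialization

        TextHolder(ForeignText text) {
            this.text = text;
        }

        ForeignText get() {
            return text;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeObject(text.toChars());   // char[] is serializable
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            text = new ForeignText((char[]) in.readObject());   // re-instantiate
        }
    }

    /** Serializes and deserializes the holder in memory. */
    static TextHolder roundTrip(TextHolder holder) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (TextHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```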
Re: ConcurrentModificationException error
On 10 Dec 2009, at 03:24, Anil Pinto wrote:

Hi Anil

(Didn't see a response for this one come in, so far on fop-us...@... Apologies if the reply comes a bit late.)

We have FOP (0.95) embedded in a multithreaded environment to create many PDFs almost simultaneously. We have been using this configuration for 6 months plus now. I noticed the following trace for the first time and it caught my attention, as I thought we have followed all the multithreaded requirements required by FOP.

It is pointing to a bug in FOP, due to a slight oversight in making use of java.awt.color.ICC_Profile, IIC.

java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at sun.awt.color.ProfileDeferralMgr.activateProfiles(ProfileDeferralMgr.java:75)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:756)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:996)

Checking the Javadocs, there is no mention anywhere of the multi-thread (un)safety of ICC_Profile or the call to getInstance(). So, I think we can only safely assume that this means it is unsafe.

    at org.apache.fop.pdf.PDFICCBasedColorSpace.setupsRGBColorProfile(PDFICCBasedColorSpace.java:140)

Seeing that it is a static method calling another static method, the chances of anything bad happening are very slim, but you stumbled upon it anyway. :( Seems a perfect example of a race condition, though: you mean this is the first time in all those 6 months that this error occurred? Very slim indeed, then!

As for the good news (I hope I am correct about this): FOP can solve this easily, either by making setupsRGBColorProfile() a synchronized method, or by performing only the call to ICC_Profile.getInstance() in a synchronized block. My preference goes in the direction of the latter, as that limits the synchronization overhead to the single call into the AWT library, which is causing the issue.
The rest of the method appears safe for concurrent runs, at first glance. The (minor) downside is that we would have to introduce a new static final to synchronize the calls on.

Very quick patch below (vs current trunk; don't know if it can be applied to 0.95 without small changes...).

HTH!

Regards,

Andreas

---
Index: src/java/org/apache/fop/pdf/PDFICCBasedColorSpace.java
===================================================================
--- src/java/org/apache/fop/pdf/PDFICCBasedColorSpace.java (revision 679326)
+++ src/java/org/apache/fop/pdf/PDFICCBasedColorSpace.java Wed Jan 13 20:29:07 CET 2010
@@ -34,6 +34,8 @@
     private PDFICCStream iccStream;
     private String explicitName;
 
+    private static final Object _S = new Object();
+
     /**
      * Constructs a the ICCBased color space with an explicit name (ex. DefaultRGB).
      * @param explicitName an explicit name or null if a name should be generated
@@ -137,7 +139,9 @@
         InputStream in = PDFDocument.class.getResourceAsStream("sRGB Color Space Profile.icm");
         if (in != null) {
             try {
+                synchronized (_S) {
-                profile = ICC_Profile.getInstance(in);
+                    profile = ICC_Profile.getInstance(in);
+                }
             } catch (IOException ioe) {
                 throw new RuntimeException(
                         "Unexpected IOException loading the sRGB profile: " + ioe.getMessage());