DO NOT REPLY [Bug 40271] [PATCH] auto table layout -- dirty draft

2010-01-14 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=40271

carsten.pfeif...@gebit.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carsten.pfeif...@gebit.de



Re: FOP and large documents: out of memory

2010-01-14 Thread Vincent Hennebert
Hi Stephan,

I’m not sure I would invest any energy into improving the
CachedRenderPagesModel (-conserve option). It doesn’t look like the
right approach to me, and like you noticed it doesn’t even work out of
the box currently.

Why store the Area Tree on disk? Why not directly render it into the
final output format? If the latter supports out-of-order pages, then
that’s great; otherwise we may as well store the final pages and order
them later on when the document is complete, instead of storing them in
a half-finished area tree format.

As to pages that hold unresolved references, and so obviously can’t be
rendered yet: there usually aren’t so many of them that the area tree
solution would be vastly superior to a final-format one in terms of
memory consumption. Those pages could be kept in memory until all the
references they hold are resolved.

Also, the handling of forward references is currently less than optimal.
The resolution is made in the area tree instead of looping back to the
layout engine. ATM, a page-reference is rendered using a placeholder
string (‘MMM’), and that placeholder is later replaced with the actual
value (e.g., ‘5’). This is fine for constructs like tables of contents,
but may produce ugly results if the page-number-citation is inside
a paragraph, ruining the even spacing. What’s the point of implementing
a high-quality line-breaking algorithm if its output is spoiled by
a poor handling of page citations?

I think the two-pass approach is the best long-term solution, although
obviously less trivial. One challenge is to detect a possible infinite
loop. For example: the referenced item is at the beginning of page IX,
the reference is updated to IX, which takes less room than MMM, so the
document is re-laid out and the referenced item moves to page VIII; the
reference must be updated again, the document is laid out again and the
referenced item ends up on page IX again. And again, and again...
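
A minimal sketch of how such an oscillation could be detected, assuming each
layout pass can be modelled as a function from the page number currently
printed in the citation to the page the referenced item actually lands on.
The class ReferenceFixpoint, the method resolve() and the layout function are
hypothetical names for illustration only; nothing like this exists in FOP.

```java
import java.util.HashSet;
import java.util.OptionalInt;
import java.util.Set;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: repeat layout passes until the page reference is stable. */
class ReferenceFixpoint {

    /**
     * @param layout       stand-in for one full layout pass (an assumption, not a FOP API):
     *                     maps the page number printed in the citation to the page
     *                     the referenced item ends up on
     * @param initialGuess value printed during the first pass (e.g. 0 for a placeholder)
     * @param maxPasses    hard upper bound on the number of passes
     * @return the stable page number, or empty if a cycle was detected
     *         or the pass budget was exhausted
     */
    static OptionalInt resolve(IntUnaryOperator layout, int initialGuess, int maxPasses) {
        Set<Integer> tried = new HashSet<>();
        int printed = initialGuess;
        for (int pass = 0; pass < maxPasses; pass++) {
            tried.add(printed);
            int actual = layout.applyAsInt(printed);
            if (actual == printed) {
                return OptionalInt.of(printed);   // citation matches reality: done
            }
            if (tried.contains(actual)) {
                return OptionalInt.empty();       // value already tried: we are looping
            }
            printed = actual;
        }
        return OptionalInt.empty();               // gave up after maxPasses
    }

    public static void main(String[] args) {
        // Models the IX/VIII example: printing 9 moves the item to 8, printing 8 moves it to 9.
        IntUnaryOperator oscillating = p -> p == 9 ? 8 : 9;
        System.out.println(resolve(oscillating, 0, 10).isPresent());
    }
}
```

What to do once a cycle is reported (for instance, keeping the layout that
reserves the wider of the two strings) is a separate policy decision that the
sketch deliberately leaves open.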


One possible workaround for your use case is to generate your document
once with a dummy TOC and just “Page X” into the intermediate format,
parse it to get the total number of pages and the page numbers for each
element of the TOC, and then re-generate it with hardcoded values for
page references.

HTH,
Vincent


Stephan Thesing wrote:
 Hello,
 
 as is well-known, FOP can run out of heap memory, when large documents
 are processed (http://xmlgraphics.apache.org/fop/0.95/running.html#memory).
 
 I have the situation that the documents I have to process mandate a footer on 
 each page that contains a page X of Y element and a TOC at the
 beginning of the document, i.e. FOP cannot layout the pages until all
 referenced page-citations are known, which is after the last page of the 
 document.
 
 When page content is quite complicated (e.g. 2000 pages mostly full with 
 tables), the heap space does not suffice to hold all pages until all 
 references can be resolved, thus FOP aborts with out-of-memory.
 
 Since increasing the heap space does not always work (3 GB heap space was 
 required in one example), I need a better solution for this.
 
 1. -conserve option
 One alternative would be the -conserve option, which serializes the pages 
 to disk and reloads them as needed.
 Although slow, this definitely would be a solution, if it worked, which it 
 doesn't:
  Our documents include graphics (SVG, PNG), and the serialization with 
 -conserve throws an exception, because some class in Batik is not 
 serializable (e.g. SVGOMAnimatedString IIRC), thus the page is missing, 
 causing FOP to abort later.
 Thus, Batik would have to be fixed for this.
 
 2. Two passes
 Since the pages are kept because of unresolved references, one could do the
 same as e.g. LaTeX always did: process the document twice.
 In a first run, pages are discarded after layout, only the references for 
 page-citations are kept and at the end reused for the second pass
 (when all pages for the citations are finally known).
 For the second run, these id-refs are initially loaded and no pages have
 to be kept.
 This would require more changes in FOP (and should definitely be made 
 optional obviously).
 
 
 
 I would appreciate any comments or other suggestions !
 
 
 Best regards
   Stephan


Re: change bar status

2010-01-14 Thread Chris Bowditch

Stephan Thesing wrote:

Dear all,

sorry for the long delay in answering.


Hi Stephan,

thanks for looking into this! Change bars would be a useful feature 
addition to FOP.




Having had some time in the last days to actually continue working
on the change bar stuff:

 + parsing and validating the fo:change-bar-begin and fo:change-bar-end 
 elements works (including properties)

 + attaching to all fo: elements that are between fo:change-bar-begin
  and fo:change-bar-end elements the vector of change bars that
  affect the element is implemented

If I understand the FO standard right, then for each area generated
by a FO object under the influence of a change bar, an area (with 
correct border-style/width/color... settings) has to be created as 
an xsl-absolute area and placed as determined by the change bar style
relative to the column or page edge.
For areas generated in the body-region, these additional areas are
children of the flow area, for after/before/start/end they are children
of the respective area region.

If I understand the FOP code right, the areas are actually constructed by
the various layout managers in addAreas().


That is my understanding too.



Now, when creating the change-bar areas, one could for every area thus 
constructed, add a change bar area at the correct position in the area tree.
But, it is desirable to merge change bar areas, because many areas
will simply be the same or be adjacent to each other down the page side
and should thus be represented as a single area representing the union
of these areas (as far as possible).
Another point is that the change-bar areas have a z-Buffer trait that
decides visibility of overlapping change bars: I am not sure how to handle that 
yet.


Desirable yes, but I'm not sure it is essential. I would try to get it 
working without merging first and then evaluate if the solution is 
acceptable without merging and if so tackle merging as a later project. 
We can certainly help with the evaluation if you post into a bugzilla entry.
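
For the later merging step, a minimal sketch of the "union of adjacent areas"
idea, assuming every change bar area on a page side can be reduced to a
vertical extent and that all bars being merged share the same style and
offset. The class ChangeBarMerger and its Span type are hypothetical and not
part of FOP; the z-ordering question raised above is not addressed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: merge touching or overlapping change bar extents. */
class ChangeBarMerger {

    /** Vertical extent of one change bar area, measured from the top of the page. */
    static final class Span {
        final int top;
        final int bottom;

        Span(int top, int bottom) {
            this.top = top;
            this.bottom = bottom;
        }
    }

    /** Returns the union of the given spans as a sorted list of disjoint spans. */
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.top));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && s.top <= merged.get(lastIndex).bottom) {
                // Touches or overlaps the previous bar: extend it downwards if needed.
                Span last = merged.get(lastIndex);
                merged.set(lastIndex, new Span(last.top, Math.max(last.bottom, s.bottom)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Span> input = new ArrayList<>();
        input.add(new Span(0, 10));
        input.add(new Span(10, 25));
        input.add(new Span(40, 50));
        for (Span s : merge(input)) {
            System.out.println(s.top + ".." + s.bottom);
        }
    }
}
```

A merge of this kind works the same way whether it runs on the fly or once per
finished page; running it once per page only needs the list of extents
collected while the areas are added.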




Would it be best to really add the change bar areas when the originating areas 
are constructed and merge on the fly or would it be better to
search for all change bar generating areas when the page is finished and then 
to perform the merge?


Currently missing altogether are unit tests for the code I have added
or changed.

In short, we are getting somewhere.


Thanks for the update. It is encouraging.

Chris


Re: svn commit: r898845 - /xmlgraphics/fop/trunk/test/accessibility/pdf/

2010-01-14 Thread Vincent Hennebert
Hi Simon,

Simon Pepping wrote:
 On Wed, Jan 13, 2010 at 05:17:03PM -, vhenneb...@apache.org wrote:
 Author: vhennebert
 Date: Wed Jan 13 17:17:01 2010
 New Revision: 898845

 URL: http://svn.apache.org/viewvc?rev=898845&view=rev
 Log:
 Updated reference accessible PDF files. Old ones had "Apache FOP Version SVN 
 branches/Temp_Accessibility" as Creator and Producer values. New ones have 
 "Apache FOP Version SVN trunk". This was causing spurious differences when 
 testing PDF accessibility.
  
 Where is the version property set?

In FOUserAgent, the producer field is set to "Apache FOP Version "
+ Version.getVersion() and returned by the getProducer() method; that
method is called, among others, in PDFRenderingUtil.setupPDFDocument.
Something similar is done (I suppose, haven’t checked) for Creator.

Is that what you asked for?

Vincent


Re: FOP and large documents: out of memory

2010-01-14 Thread Andreas Delmelle
On 13 Jan 2010, at 22:37, Stephan Thesing wrote:

 
 On 13 Jan 2010, at 21:27, Stephan Thesing wrote:
 ...
 Our documents include graphics (SVG, PNG), and the serialization with
 -conserve throws an exception, because some class in Batik is not
 serializable (e.g. SVGOMAnimatedString IIRC), thus the page is missing, 
 causing
 FOP to abort later.
 Thus, Batik would have to be fixed for this.
 
 I think FOP can be 'fixed' for this too. If that is really the only class
 that is causing trouble, then FOP could make a serializable subclass for
 it, and use that in the area tree, instead of Batik's default
 non-serializable implementation. Unless Batik really needs it, why fix it 
 there?
 
 I don't think that can work, as that class is used in elements nested in 
 classes of Batik that represent the SVG.
 
 I.e., FOP never instantiates it, but the Batik code does somewhere along

OK, I see...

Just noticed that my idea for 'subclassing' is probably not entirely what I 
meant...
Suppose, for the sake of the argument, that String is not serializable, but 
we'd need it for some reason and the Java vendor does not want to alter their 
implementation. What could be done, is store only the info needed to create a 
new String upon deserialization. Serialize the char-array, and re-instantiate 
the String instead.

I was thinking something similar should be possible here, but if it is really 
that far out of FOP's control, then never mind.
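
A minimal sketch of that idea, assuming the non-serializable object sits in a
field that our own code controls. Opaque stands in for the third-party class
and OpaqueHolder for the wrapper; both names are hypothetical. As noted above,
this does not help when the object is buried inside another library's own
object graph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical sketch: serialize only the data needed to rebuild an object. */
class SerializableHolderDemo {

    /** Stand-in for a third-party class that does not implement Serializable. */
    static final class Opaque {
        private final String value;

        Opaque(String value) {
            this.value = value;
        }

        String value() {
            return value;
        }
    }

    /** Serializable wrapper that stores the char array and re-creates the delegate. */
    static final class OpaqueHolder implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default serialization skips this non-serializable field
        private transient Opaque delegate;

        OpaqueHolder(Opaque delegate) {
            this.delegate = delegate;
        }

        Opaque get() {
            return delegate;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeObject(delegate.value().toCharArray());
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            delegate = new Opaque(new String((char[]) in.readObject()));
        }
    }

    /** Serializes the holder to a byte array and reads it back. */
    static OpaqueHolder roundTrip(OpaqueHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (OpaqueHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        OpaqueHolder copy = roundTrip(new OpaqueHolder(new Opaque("example")));
        System.out.println(copy.get().value());
    }
}
```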


Regards

Andreas

Andreas Delmelle
mailto:andreas.delmelle.AT.telenet.be
---



Re: ConcurrentModificationException error

2010-01-14 Thread Andreas Delmelle
On 10 Dec 2009, at 03:24, Anil Pinto wrote:

Hi Anil

(Didn't see a response for this one come in, so far on fop-us...@... Apologies 
if the reply comes a bit late.)

 We have FOP (0.95) embedded in a multithreaded environment to create many 
 PDFs almost simultaneously.
  
 We have been using this configuration for 6 months plus now. I noticed the 
 following trace for the first time and it caught my attention, as I thought 
 we have followed all the multithreaded requirements required by FOP.

It points to a bug in FOP, due to a slight oversight in making use of 
java.awt.color.ICC_Profile, IIRC.

  
 java.util.ConcurrentModificationException
  at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
  at java.util.AbstractList$Itr.next(AbstractList.java:343)
  at 
 sun.awt.color.ProfileDeferralMgr.activateProfiles(ProfileDeferralMgr.java:75)
  at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:756)
  at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:996)

Checking the Javadocs, there is no mention anywhere of the multi-thread 
(un)safety of ICC_Profile or the call to getInstance(). So, I think we can only 
safely assume that this means it is unsafe.

  at 
 org.apache.fop.pdf.PDFICCBasedColorSpace.setupsRGBColorProfile(PDFICCBasedColorSpace.java:140)

Seeing that it is a static method calling another static method, the chances of 
anything bad happening are very slim, but you stumbled upon it nonetheless. :(
It seems a perfect example of a race condition, though: you mean this is the first 
time in all those 6 months that this error occurred? Very slim indeed, then!

As for the good news (I hope I am correct about this):
FOP can solve this easily, either by making setupsRGBColorProfile() a synchronized 
method, or by performing only the call to ICC_Profile.getInstance() in a 
synchronized block. My preference goes in the direction of the latter, as that 
limits the synchronization overhead to the single call into the AWT library, 
which is causing the issue. The rest of the method appears safe for concurrent 
runs, at first glance. 
The (minor) downside is that we would have to introduce a new static final to 
synchronize the calls on. 

Very quick patch below (vs current trunk; don't know if it can be applied to 
0.95 without small changes...).


HTH!

Regards,

Andreas

---
Index: src/java/org/apache/fop/pdf/PDFICCBasedColorSpace.java
===================================================================
--- src/java/org/apache/fop/pdf/PDFICCBasedColorSpace.java  (revision 679326)
+++ src/java/org/apache/fop/pdf/PDFICCBasedColorSpace.java  Wed Jan 13 20:29:07 CET 2010
@@ -34,6 +34,8 @@
     private PDFICCStream iccStream;
     private String explicitName;
 
+    private static final Object _S = new Object();
+
     /**
      * Constructs a the ICCBased color space with an explicit name (ex. DefaultRGB).
      * @param explicitName an explicit name or null if a name should be generated
@@ -137,7 +139,9 @@
         InputStream in = PDFDocument.class.getResourceAsStream("sRGB Color Space Profile.icm");
         if (in != null) {
             try {
+                synchronized (_S) {
-                profile = ICC_Profile.getInstance(in);
+                    profile = ICC_Profile.getInstance(in);
+                }
             } catch (IOException ioe) {
                 throw new RuntimeException(
                         "Unexpected IOException loading the sRGB profile: " + ioe.getMessage());
---