Re-architecturing Tagged PDF

2011-09-06 Thread Vincent Hennebert
Hi All,

As can be seen in Bugzilla #50852 [1], the current implementation of
tagged PDF has fundamental limitations that prevent some features from
working. Among other things, an empty table cell does not show up in the
structure tree, so a screen reader will effectively shift the following
cells by one column.

Also, building a structure tree using a preliminary XSLT does not scale
well, and is not even necessary since the structure tree is almost
readily available in the form of the FO tree.

Of course it’s possible to warp the current code into something that
more or less does the job, but, besides not solving the performance
issue, that would make the code too difficult to understand and maintain.

I’d like to work with Peter on re-architecturing the tagged PDF code.
The broad idea is to integrate the construction of the structure tree
into the current processing pipeline (XSL-FO -> FO tree -> Layout
Managers -> Area Tree -> Rendering, etc.).

We will work on a temporary branch forked off Trunk so that interested
parties can follow our progress. Once the work is done we will call for
a vote to merge it back to Trunk.

Any comments or suggestions are welcome.
Thanks,
Vincent


[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=50852
See also comment #21
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852#c21
And mailing list: http://markmail.org/message/mn7jdbxmjdq7ey52


RE: How to translate characters?

2011-09-06 Thread Eric Douglas

Aha!
Your code was almost right.
Now, I can't put a square into the XML or I get this error.
SystemId Unknown; Line #188; Column #5; An invalid XML character
(Unicode: 0x0) was found in the element content of the document.

So I stick with putting the string value &#x25a1; into the text object
I'm using to create various outputs.
For a PDF I get the text value, write it to XML, run it through the
transform to get FO, run it through FOP to get a PDF, then load it into
pdfbox to print.
The XML file contains that string.  The FO file contains the square
character.  I'm guessing the Transformer itself knows nothing about this
but something in the SAX handler converts it.
For a print preview, where I just need to display a single Unicode value,
the simplest way is to hardcode it.
Your String declaration was on the right track except you need a value
container.

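drawText = displayText.getText();
// The numeric character references as they appear literally in the text
String CHARCHECKBOX = "&#x25a1;";
String CHARSMILEY = "&#x263A;";
// The actual characters, built from their Unicode code points
String UNICODECHARCHECKBOX = new String(new int[]{0x25a1}, 0, 1);
String UNICODECHARSMILEY = new String(new int[]{0x263A}, 0, 1);
// replace() matches literally; replaceAll() would treat the first argument as a regex
drawText = drawText.replace(CHARCHECKBOX, UNICODECHARCHECKBOX);
drawText = drawText.replace(CHARSMILEY, UNICODECHARSMILEY);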

I don't know an easy way to find and replace all possible such unicode
strings aside from creating a custom SAX handler and wrapping the text
in some default XML to recreate what the Transformer is doing.
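
One possible way to replace every such reference without a SAX handler is
a regular expression over the text. The following is only a sketch: the
class name NumericCharRefDecoder and the method decode are made up for
illustration, and it assumes the text contains only numeric references of
the form &#xHHHH; or &#DDDD; (named entities such as &amp; are left
alone).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of FOP or the JDK.
final class NumericCharRefDecoder {
    // Group 1 captures hexadecimal digits (&#x25A1;), group 2 decimal digits (&#9633;).
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    static String decode(String text) {
        Matcher m = NCR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // References outside the Unicode range are copied through unchanged.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this, drawText = NumericCharRefDecoder.decode(drawText); would stand
in for the individual replace calls above. Note that it does not check
whether the resulting character is legal in XML, so it is only suitable
for text that is displayed directly.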
 

-----Original Message-----
From: Christopher R. Maden [mailto:cr...@maden.org] 
Sent: Monday, September 05, 2011 9:32 PM
To: fop-dev@xmlgraphics.apache.org
Subject: Re: How to translate characters?

On 09/05/2011 06:31 PM, Eric Douglas wrote:
> So, I'm confused.  I put that exact string into an fo:inline.
> I'm running embedded code and I separated the XML-FO transform step
> from the FOP step.
> I'm just running a straight Oracle transform.
> My input is an XML file containing the string &#x25A1;  My transformer
> is created using an XSL file.
> My output is an FO file with that string converted to a square.
> I put that string into a Java String and get the value.  It comes out
> the same as it went in.
> How do I get the value of the String as a square?

Java is not XML.  Java has no native facility for interpreting an XML
numeric character reference, so you are getting exactly the expected
results.

I don't speak Java well, but try something like this:

String square = new String( [ 0x25a1 ], 0, 1 );

Then pass that string, square, to FOP.
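
To illustrate the point about Java versus XML, here is a small sketch
using the JDK's standard DOM parser. The class name CharRefDemo and the
method parseText are invented for this example. A plain Java String keeps
the eight characters of the reference as they are, while the same text
read through an XML parser comes back as the single square character.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of FOP.
final class CharRefDemo {
    // Parses a small XML document and returns the text content of its root element.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers do not need to declare checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Here "&#x25A1;".length() is 8, since Java sees only ordinary characters,
whereas CharRefDemo.parseText("<t>&#x25A1;</t>") yields a one-character
string containing U+25A1, because the parser resolves the reference.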

If that doesn't help, I'm done; I can read Java, but don't really write
it well.

~Chris
--
Chris Maden, text nerd  URL: http://crism.maden.org/
The present tendency and drift towards the Police State gives all
free Americans pause. - Alabama Supreme Court, 1955
(Pike v. Southern Bell Tel. & Telegraph, 81 So.2d 254)