Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-23 Thread Chris Bowditch

Vincent Hennebert wrote:
 Hi,

Hi Vincent,

 Work on PDF accessibility is basically done. There are still some tests
 to perform and maybe a few tweaks here and there, but the main
 functionality is in place.

Thanks for all your hard work getting this feature debugged and cleaned up.

 So I’d like to start a vote for merging the branch back to the Trunk:
 https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility
 
 The vote will last the usual 3 days but, since it’s a non-trivial new
 feature, if any committer would like more time to review it, feel free
 to say so and we can extend the vote to 1 week.
 
 Attached is the diff between the branch and the Trunk, if this is of any
 help.
 
 +1 from me.

I've done some local testing with the branch just now and it seems to
work, so +1 from me.

Chris

 Thanks,
 Vincent





Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-23 Thread Adrian Cumiskey

Hi Vincent,

I've taken a look this morning and it looks good from the testing I've
done - it really increases the PDF file size though! :)  +1 from me.


Adrian.




Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-23 Thread Vincent Hennebert
Hi,

Just a few clarifications:

Jeremias Maerki wrote:
 On 22.10.2009 21:15:40 Simon Pepping wrote:
<snip/>
 Can you summarize what the branch tries to achieve?
 
 I'll try. In short: it provides the Tagged PDF feature that some people
 have always wanted.
 
 Long story: Without the accessibility/document structure feature, FOP
 simply produces pages with visual content. Visually impaired people need
 tools like a screen reader to read the document to them. For that the reader
 needs to know which parts of a page are important and which are not, and
 in which order the elements should be read. It needs to know that a
 sentence continues on the next page without stumbling over the page
 footer in the middle of the sentence.

This is something that the branch doesn’t actually do yet... The
header/footer will be read at every new page, in the middle of the
sentence.
I don’t know yet how to fix that, and I’m not sure whether that should be
done blindly anyway. It could be imagined that in some elaborate layouts
the side-regions have content that the author wants to be read aloud.


<snip/>
 There's another side-effect to tagged PDF: It allows for better text
 extraction from the document. PDF even describes ways to make
 round-trips from XML -> PDF -> XML -> PDF if certain conditions are met.
 However, we don't do that.

Speaking of that, the current code doesn’t insert empty elements (like
<fo:block/>) into the structure tree. The corresponding StructElem
object /is/ created, but is not linked to its parent. Actually it’s
present in the PDF without being referred to by any other object.
I think this is inconsistent, and actually wrong, since it would cause
a loss of information possibly needed by a round-trip transformation.
I’m going to change that.
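The orphaned StructElem described above can be illustrated with a toy Java model. The `StructElem` class, `addKid` and `orphans` here are hypothetical stand-ins, not FOP's actual classes: an element is created for every block, but it only gets a parent when the renderer registers content for it, so an empty block never does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the behaviour described above (hypothetical classes, not FOP's API):
 * a structure element is created for every block, but it is only attached to
 * its parent when the renderer registers content for it.
 */
public class OrphanStructElemDemo {

    static final class StructElem {
        final String label;
        StructElem parent;                            // stays null until linked
        final List<Object> kids = new ArrayList<>();  // child elements or MCIDs

        StructElem(String label) { this.label = label; }

        void addKid(Object kid) {
            kids.add(kid);
            if (kid instanceof StructElem child) {
                child.parent = this;
            }
        }
    }

    /** Labels of every created element, other than the root, that never received a parent. */
    static List<String> orphans(List<StructElem> created, StructElem root) {
        List<String> result = new ArrayList<>();
        for (StructElem e : created) {
            if (e != root && e.parent == null) {
                result.add(e.label);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StructElem outer = new StructElem("outer block");
        StructElem inner = new StructElem("empty inner block"); // created, never drawn
        List<StructElem> created = List.of(outer, inner);

        // Only the two text drawing requests reach the renderer; each registers a
        // marked-content id (MCID) under the outer block.
        outer.addKid(0); // "Before the empty block."
        outer.addKid(1); // "After the empty block."

        System.out.println(outer.kids);              // [0, 1]
        System.out.println(orphans(created, outer)); // [empty inner block]
    }
}
```

The element still exists (it would still be written to the PDF), but nothing refers to it, which is exactly the loss of information a round-trip transformation would suffer from.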


<snip/>
 The vote will last the usual 3 days but, since it’s a non-trivial new
 feature, if any committer would like more time to review it, feel free
 to say so and we can extend the vote to 1 week.
 Can you make that 3 working days?
 
 Does that imply you don't work 7 days a week? ;-) Working days are what
 we usually apply here, don't we?

Errr... no. At least, it’s only by chance that all the votes I’ve launched
so far turned out to last 3 working days. I usually just wait until most
active committers have voted. Speaking of working days doesn’t make much
sense to me anyway, since not all committers work on FOP in their day
jobs. Some of them may actually be more active at weekends.

All that said, I’m happy to make the vote last longer, as Simon
requested, and to ensure that it lasts at least 3 working days from now
on.


Vincent


Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-23 Thread Jeremias Maerki
On 23.10.2009 13:14:36 Vincent Hennebert wrote:
 Hi,
 
 Just a few precisions:
 
 Jeremias Maerki wrote:
  On 22.10.2009 21:15:40 Simon Pepping wrote:
 <snip/>
  Can you summarize what the branch tries to achieve?
  
  I'll try. In short: it provides the Tagged PDF feature that some people
  have always wanted.
  
  Long story: Without the accessibility/document structure feature, FOP
  simply produces pages with visual content. Visually impaired people need
  tools like a screen reader to read the document to them. For that the reader
  needs to know which parts of a page are important and which are not, and
  in which order the elements should be read. It needs to know that a
  sentence continues on the next page without stumbling over the page
  footer in the middle of the sentence.
 
 This is something that the branch doesn’t actually do yet... The
 header/footer will be read at every new page, in the middle of the
 sentence.
 I don’t know yet how to fix that, and I’m not sure whether that should be
 done blindly anyway. It could be imagined that in some elaborate layouts
 the side-regions have content that the author wants to be read aloud.

Actually, I believe we already handle this quite nicely, and that there is
a bug in Acrobat's screen reader: it doesn't fully rely on the document
structure information, but rather reads through the tags in order on each
page, which is not what I would expect.
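To illustrate the difference between following the structure tree and reading tags page by page, here is a toy Java model. The page contents, the `Marked` record and the `STRUCTURE` list are hypothetical, and the model assumes that the footer text is tagged under its own structure element rather than under the paragraph. Real PDFs link structure elements to marked content through page references and MCIDs; this sketch simplifies that link to an owner name.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadingOrderDemo {

    /** A piece of marked content on a page, tagged with the structure element that owns it. */
    record Marked(String text, String owner) {}

    /** Hypothetical content streams: one list per page, in drawing order. */
    static final List<List<Marked>> PAGES = List.of(
            List.of(new Marked("The sentence starts here", "P1"),
                    new Marked("[footer, page 1]", "Footer1")),
            List.of(new Marked("and ends here.", "P1"),
                    new Marked("[footer, page 2]", "Footer2")));

    /** Hypothetical children of the structure tree root, in logical order. */
    static final List<String> STRUCTURE = List.of("P1", "Footer1", "Footer2");

    /** Reads each page in content-stream order: the footer interrupts the sentence. */
    static String readByPage() {
        List<String> out = new ArrayList<>();
        for (List<Marked> page : PAGES) {
            for (Marked m : page) {
                out.add(m.text());
            }
        }
        return String.join(" ", out);
    }

    /** Walks the structure tree: all of P1's content first, across page boundaries. */
    static String readByStructure() {
        List<String> out = new ArrayList<>();
        for (String elem : STRUCTURE) {
            for (List<Marked> page : PAGES) {
                for (Marked m : page) {
                    if (m.owner().equals(elem)) {
                        out.add(m.text());
                    }
                }
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(readByPage());
        System.out.println(readByStructure());
    }
}
```

Under these assumptions, page-order reading yields "The sentence starts here [footer, page 1] and ends here. [footer, page 2]", whereas structure-order reading keeps the sentence whole and reads the footers afterwards.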

I was just thinking: if PDFBox could be taught to interpret the document
structure information and feed the content to FreeTTS, you'd have a nice
open source PDF reader.

 <snip/>
  There's another side-effect to tagged PDF: It allows for better text
  extraction from the document. PDF even describes ways to make
  round-trips from XML -> PDF -> XML -> PDF if certain conditions are met.
  However, we don't do that.
 
 Speaking of that, the current code doesn’t insert empty elements (like
 <fo:block/>) into the structure tree. The corresponding StructElem
 object /is/ created, but is not linked to its parent. Actually it’s
 present in the PDF without being referred to by any other object.
 I think this is inconsistent, and actually wrong, since it would cause
 a loss of information possibly needed by a round-trip transformation.
 I’m going to change that.

Good catch.

<snip/>



Jeremias Maerki



Inserting Empty Elements Into the Structure Tree [was: Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk]

2009-10-23 Thread Vincent Hennebert
Vincent Hennebert wrote:
 Hi,
 
<snip/>
 There's another side-effect to tagged PDF: It allows for better text
 extraction from the document. PDF even describes ways to make
 round-trips from XML -> PDF -> XML -> PDF if certain conditions are met.
 However, we don't do that.
 
 Speaking of that, the current code doesn’t insert empty elements (like
 <fo:block/>) into the structure tree. The corresponding StructElem
 object /is/ created, but is not linked to its parent. Actually it’s
 present in the PDF without being referred to by any other object.
 I think this is inconsistent, and actually wrong, since it would cause
 a loss of information possibly needed by a round-trip transformation.
 I’m going to change that.

I mean, /at some point/ I’m going to change that...

This is easier said than done. Take the following example:
<fo:block>
  Before the empty block.
  <fo:block/>
  After the empty block.
</fo:block>

What basically happens currently is that two text drawing requests are
made to the PDF renderer. The renderer creates the appropriate PDF
stream and registers the pieces of text as children of the structure
element corresponding to the outer block. But nothing happens regarding
the inner empty block, since obviously there’s nothing to do.

The structure element for the inner empty block can’t be added to the
outer block’s children at creation time, otherwise the logical order
wouldn’t be followed.

From the quick look I had, this is a fundamental limitation of the
current approach. There’s no way to know at which place an empty element
must be inserted into the children list of its parent.

Probably the only way to solve this issue is to integrate the handling
of the logical structure into the whole processing chain, passing the
suitable information from the FO tree to the layout engine to the area
tree to the renderer. That is probably something that should have been
done from the beginning, but it is anything but trivial.
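The ordering problem can be reproduced in a toy Java model. The `FoNode`, `Text` and `Block` types and both methods are hypothetical, not FOP code. The first method links the empty block's structure element as soon as it is created and appends text references only when they are drawn; the second simply lists the outer block's children in document order, which is roughly what carrying the logical structure through the whole processing chain would make possible.

```java
import java.util.ArrayList;
import java.util.List;

public class LogicalOrderDemo {

    /** Hypothetical FO node: either a run of text or a block with children. */
    interface FoNode {}
    record Text(String content) implements FoNode {}
    record Block(List<FoNode> children) implements FoNode {}

    /** Eager linking: block elements are added at creation, text only when drawn. */
    static List<String> linkAtCreationTime(Block outer) {
        List<String> kids = new ArrayList<>();
        // Phase 1: structure elements for child blocks are created and linked up front.
        for (FoNode child : outer.children()) {
            if (child instanceof Block) {
                kids.add("StructElem(block)");
            }
        }
        // Phase 2: only text produces drawing requests, so marked-content
        // references are appended afterwards.
        int mcid = 0;
        for (FoNode child : outer.children()) {
            if (child instanceof Text) {
                kids.add("MCID " + mcid++);
            }
        }
        return kids;
    }

    /** Structure-aware linking: every child is listed in document order, drawn or not. */
    static List<String> linkInDocumentOrder(Block outer) {
        List<String> kids = new ArrayList<>();
        int mcid = 0;
        for (FoNode child : outer.children()) {
            kids.add(child instanceof Text ? "MCID " + mcid++ : "StructElem(block)");
        }
        return kids;
    }

    public static void main(String[] args) {
        Block outer = new Block(List.of(
                new Text("Before the empty block."),
                new Block(List.of()),               // the empty fo:block
                new Text("After the empty block.")));
        System.out.println(linkAtCreationTime(outer));  // [StructElem(block), MCID 0, MCID 1]
        System.out.println(linkInDocumentOrder(outer)); // [MCID 0, StructElem(block), MCID 1]
    }
}
```

In the toy model the difference is trivial because both text and blocks are visible in one place; in FOP the catch is precisely that the renderer only sees the drawn text, so the position of the empty block has to be carried down from the FO tree.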

Vincent


Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-23 Thread Simon Pepping
On Thu, Oct 22, 2009 at 10:36:47PM +0200, Jeremias Maerki wrote:
 On 22.10.2009 21:15:40 Simon Pepping wrote:
  On Thu, Oct 22, 2009 at 05:12:00PM +0100, Vincent Hennebert wrote:
   Hi,
   
   Work on PDF accessibility is basically done. There are still some tests
   to perform and maybe a few tweaks here and there, but the main
   functionality is in place.
   
   So I’d like to start a vote for merging the branch back to the Trunk:
   https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility
  
  Can you summarize what the branch tries to achieve?
 
 I'll try. In short: it provides the Tagged PDF feature that some people
 have always wanted.

Thanks. That was quite clear. I am not in a position to judge the
quality of the implementation. I welcome this addition to FOP.

I vote +1 to the merger of the Temp_Accessibility branch into trunk.

Simon 

-- 
Simon Pepping
home page: http://www.leverkruid.eu


Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-22 Thread Jeremias Maerki
I followed what you did in the branch and I like it, although I can't
really follow why accessibility needed to be backported to the old
PDFRenderer. So entirely +1 from me. And thanks for diving into this.
It's good to know that this knowledge has a broader foundation in the
project. And of course, FOP gets exciting new functionality.

On 22.10.2009 18:12:00 Vincent Hennebert wrote:
 Hi,
 
 Work on PDF accessibility is basically done. There are still some tests
 to perform and maybe a few tweaks here and there, but the main
 functionality is in place.
 
 So I’d like to start a vote for merging the branch back to the Trunk:
 https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility
 
 The vote will last the usual 3 days but, since it’s a non-trivial new
 feature, if any committer would like more time to review it, feel free
 to say so and we can extend the vote to 1 week.
 
 Attached is the diff between the branch and the Trunk, if this is of any
 help.
 
 +1 from me.
 
 Thanks,
 Vincent




Jeremias Maerki



Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-22 Thread Simon Pepping
On Thu, Oct 22, 2009 at 05:12:00PM +0100, Vincent Hennebert wrote:
 Hi,
 
 Work on PDF accessibility is basically done. There are still some tests
 to perform and maybe a few tweaks here and there, but the main
 functionality is in place.
 
 So I’d like to start a vote for merging the branch back to the Trunk:
 https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

Can you summarize what the branch tries to achieve?

 The vote will last the usual 3 days but, since it’s a non-trivial new
 feature, if any committer would like more time to review it, feel free
 to say so and we can extend the vote to 1 week.

Can you make that 3 working days?

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu


Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-22 Thread Jeremias Maerki
On 22.10.2009 21:15:40 Simon Pepping wrote:
 On Thu, Oct 22, 2009 at 05:12:00PM +0100, Vincent Hennebert wrote:
  Hi,
  
  Work on PDF accessibility is basically done. There are still some tests
  to perform and maybe a few tweaks here and there, but the main
  functionality is in place.
  
  So I’d like to start a vote for merging the branch back to the Trunk:
  https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility
 
 Can you summarize what the branch tries to achieve?

I'll try. In short: it provides the Tagged PDF feature that some people
have always wanted.

Long story: Without the accessibility/document structure feature, FOP
simply produces pages with visual content. Visually impaired people need
tools like a screen reader to read the document to them. For that the reader
needs to know which parts of a page are important and which are not, and
in which order the elements should be read. It needs to know that a
sentence continues on the next page without stumbling over the page
footer in the middle of the sentence. An image tells a blind person
nothing, so it needs a descriptive text that the screen reader will read
aloud in place of the image. There are laws in various countries that
require certain organizations to produce barrier-free documents. The
accessibility branch's main purpose is to help meet these
requirements.
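A minimal sketch of why images need a description: a hypothetical reader that speaks text as-is and substitutes each image's alternative text. The `Item`, `Text` and `Image` types and the `describe`/`speak` methods are invented for illustration; in a tagged PDF the description would typically be stored as the `/Alt` entry of the figure's structure element.

```java
import java.util.List;
import java.util.stream.Collectors;

public class AltTextDemo {

    /** Hypothetical content item: a run of text, or an image with an optional description. */
    interface Item {}
    record Text(String content) implements Item {}
    record Image(String altText) implements Item {} // altText may be null

    /** What a screen reader could say for one item. */
    static String describe(Item item) {
        if (item instanceof Text t) {
            return t.content();
        }
        Image img = (Image) item;
        // Without a description, the reader can only announce that an image is there.
        return img.altText() != null ? img.altText() : "[unlabelled image]";
    }

    static String speak(List<Item> items) {
        return items.stream().map(AltTextDemo::describe).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(speak(List.of(
                new Text("Quarterly results:"),
                new Image("Bar chart showing revenue rising each quarter."),
                new Image(null))));
    }
}
```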

PDF is the only format we support that has such features. If FOP
implemented PDFXML or XPS, we could later support accessibility there,
too, based on the work started here.

There's another side-effect to tagged PDF: It allows for better text
extraction from the document. PDF even describes ways to make
round-trips from XML -> PDF -> XML -> PDF if certain conditions are met.
However, we don't do that.

Finally, with tagged PDF it is possible to create PDF/A-1a conformant
documents in addition to the PDF/A-1b that we already support. That's
important for long-term archival of documents.

  The vote will last the usual 3 days but, since it’s a non-trivial new
  feature, if any committer would like more time to review it, feel free
  to say so and we can extend the vote to 1 week.
 
 Can you make that 3 working days?

Does that imply you don't work 7 days a week? ;-) Working days are what
we usually apply here, don't we?

 Simon
 
 -- 
 Simon Pepping
 home page: http://www.leverkruid.eu




Jeremias Maerki



Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk

2009-10-22 Thread The Web Maestro
Sounds like a lofty and honorable goal.

+1 from me.

Clay
-- 
the.webmaes...@gmail.com - http://ourlil.com/
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet