Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
Vincent Hennebert wrote:

Hi,

Hi Vincent,

Work on PDF accessibility is basically done. There are still some tests to perform and maybe a few tweaks here and there, but the main functionality is in place.

Thanks for all your hard work getting this feature debugged and cleaned up.

So I’d like to start a vote for merging the branch back to the Trunk: https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

The vote will last the usual 3 days but, since it’s a non-trivial new feature, if any committer would like more time to review it, feel free to say so and we can extend the vote to 1 week.

Attached is the diff between the branch and the Trunk, if this is of any help.

+1 from me.

I've done some local testing with the branch just now and it seems to work, so +1 from me.

Chris

Thanks, Vincent
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
Hi Vincent,

I've taken a look this morning and it looks good from the testing I've done, although it really increases the PDF file size! :)

+1 from me.

Adrian.

Chris Bowditch wrote:
Vincent Hennebert wrote:

Hi,

Hi Vincent,

Work on PDF accessibility is basically done. There are still some tests to perform and maybe a few tweaks here and there, but the main functionality is in place.

Thanks for all your hard work getting this feature debugged and cleaned up.

So I’d like to start a vote for merging the branch back to the Trunk: https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

The vote will last the usual 3 days but, since it’s a non-trivial new feature, if any committer would like more time to review it, feel free to say so and we can extend the vote to 1 week.

Attached is the diff between the branch and the Trunk, if this is of any help.

+1 from me.

I've done some local testing with the branch just now and it seems to work, so +1 from me.

Chris

Thanks, Vincent
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
Hi,

Just a few clarifications:

Jeremias Maerki wrote:
On 22.10.2009 21:15:40 Simon Pepping wrote:

<snip/>

Can you summarize what the branch tries to achieve?

I'll try. In short: it provides the Tagged PDF feature that some people have always wanted.

Long story: Without the accessibility/document structure feature, FOP simply produces pages with visual content. Visually impaired people need tools like a screen reader to read documents to them. For that the reader needs to know which parts of a page are important and which are not, and in which order the elements should be read. It needs to know that a sentence continues on the next page without stumbling over the page footer in the middle of the sentence.

This is something that the branch doesn’t actually do yet... The header/footer will be read at every new page, in the middle of the sentence. I don’t know yet how to fix that, and I’m not sure if that should be done blindly anyway. In some elaborate layouts, the side regions could conceivably have content that the author wants to be read aloud.

<snip/>

There's another side-effect to tagged PDF: It allows for better text extraction from the document. PDF even describes ways to make round-trips from XML -> PDF -> XML -> PDF if certain conditions are met. However, we don't do that.

Speaking of that, the current code doesn’t insert empty elements (like <fo:block/>) into the structure tree. The corresponding StructElem object /is/ created, but is not linked to its parent. Actually it’s present in the PDF without being referred to by any other object. I think this is inconsistent, and actually wrong since that would cause a loss of information possibly needed by a round-trip transformation. I’m going to change that.

<snip/>

The vote will last the usual 3 days but, since it’s a non-trivial new feature, if any committer would like more time to review it, feel free to say so and we can extend the vote to 1 week.

Can you make that 3 working days?

Does that imply you don't work 7 days a week? ;-) Working days are what we usually apply here, aren't they?

Errr... no. At least it’s just by chance if all the votes I’ve launched so far turned out to last 3 working days. I usually just wait until most active committers have voted. Speaking of working days doesn’t make much sense to me anyway since not all committers work on FOP in their day jobs. Some of them may actually be more active at weekends.

All that said, I’m happy to make the vote last longer as Simon requested, and to ensure that it lasts at least 3 working days from now on.

Vincent
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
On 23.10.2009 13:14:36 Vincent Hennebert wrote:

Hi,

Just a few clarifications:

Jeremias Maerki wrote:
On 22.10.2009 21:15:40 Simon Pepping wrote:

<snip/>

Can you summarize what the branch tries to achieve?

I'll try. In short: it provides the Tagged PDF feature that some people have always wanted.

Long story: Without the accessibility/document structure feature, FOP simply produces pages with visual content. Visually impaired people need tools like a screen reader to read documents to them. For that the reader needs to know which parts of a page are important and which are not, and in which order the elements should be read. It needs to know that a sentence continues on the next page without stumbling over the page footer in the middle of the sentence.

This is something that the branch doesn’t actually do yet... The header/footer will be read at every new page, in the middle of the sentence. I don’t know yet how to fix that, and I’m not sure if that should be done blindly anyway. In some elaborate layouts, the side regions could conceivably have content that the author wants to be read aloud.

Actually, I believe we already do it quite nicely, but there is a bug in Acrobat's screen reader: it doesn't fully rely on the document structure information but rather reads through the tags in their order on each page, which is not what I would expect. I was just thinking: if PDFBox could be taught to interpret the document structure information and feed the content to FreeTTS, you'd have a nice open source PDF reader.

<snip/>

There's another side-effect to tagged PDF: It allows for better text extraction from the document. PDF even describes ways to make round-trips from XML -> PDF -> XML -> PDF if certain conditions are met. However, we don't do that.

Speaking of that, the current code doesn’t insert empty elements (like <fo:block/>) into the structure tree. The corresponding StructElem object /is/ created, but is not linked to its parent. Actually it’s present in the PDF without being referred to by any other object. I think this is inconsistent, and actually wrong since that would cause a loss of information possibly needed by a round-trip transformation. I’m going to change that.

Good catch.

<snip/>

Jeremias Maerki
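To make the two reading strategies discussed in this message concrete, here is a minimal Python sketch. It is neither FOP nor Acrobat code: the page items, the element names, and both functions are hypothetical, and the structure order chosen here is only an assumption for illustration. It contrasts reading marked content in page order with reading it in the order given by the structure tree.

```python
from typing import List, Tuple

# Hypothetical marked-content items in the order they appear on the pages:
# (page number, owning structure element, text).
PAGE_CONTENT: List[Tuple[int, str, str]] = [
    (1, "body-paragraph", "A sentence that starts on page one"),
    (1, "footer-1", "Page 1"),
    (2, "header-2", "Report title"),
    (2, "body-paragraph", "and finishes on page two."),
]

# Assumed logical order of the structure elements (illustrative only).
STRUCTURE_ORDER: List[str] = ["body-paragraph", "footer-1", "header-2"]


def read_by_page(items: List[Tuple[int, str, str]]) -> List[str]:
    """Read content exactly as it is laid out, page after page."""
    return [text for _page, _elem, text in items]


def read_by_structure(items: List[Tuple[int, str, str]],
                      order: List[str]) -> List[str]:
    """Read content element by element, following the structure order."""
    spoken: List[str] = []
    for elem in order:
        # Gather every piece owned by this element, across all pages.
        spoken.extend(text for _page, owner, text in items if owner == elem)
    return spoken


if __name__ == "__main__":
    print(read_by_page(PAGE_CONTENT))
    print(read_by_structure(PAGE_CONTENT, STRUCTURE_ORDER))
```

With page-order reading, the footer and header land in the middle of the sentence; with structure-order reading, the two halves of the sentence stay together.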
Inserting Empty Elements Into the Structure Tree [was: Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk]
Vincent Hennebert wrote:

Hi,

<snip/>

There's another side-effect to tagged PDF: It allows for better text extraction from the document. PDF even describes ways to make round-trips from XML -> PDF -> XML -> PDF if certain conditions are met. However, we don't do that.

Speaking of that, the current code doesn’t insert empty elements (like <fo:block/>) into the structure tree. The corresponding StructElem object /is/ created, but is not linked to its parent. Actually it’s present in the PDF without being referred to by any other object. I think this is inconsistent, and actually wrong since that would cause a loss of information possibly needed by a round-trip transformation. I’m going to change that.

I mean, /at some point/ I’m going to change that... This is easier said than done. Take the following example:

    <fo:block>
      Before the empty block.
      <fo:block/>
      After the empty block.
    </fo:block>

What basically happens currently is that two text drawing requests are made to the PDF renderer. The renderer creates the appropriate PDF stream and registers the pieces of text as children of the structure element corresponding to the outer block. But nothing happens regarding the inner empty block, since obviously there’s nothing to draw.

The structure element for the inner empty block can’t be added to the outer block’s children at creation time, otherwise the logical order wouldn’t be followed: it would end up before the first piece of text instead of between the two.

From the quick look I had, this is a fundamental limitation of the current approach. There’s no way to know at which place an empty element must be inserted into the children list of its parent. The only way to solve this issue probably is to integrate the handling of the logical structure into the whole processing chain, passing the suitable information from the FO tree to the layout engine to the area tree to the renderer. This is probably something that should have been done from the beginning, but it is anything but trivial.

Vincent
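The ordering problem described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FOP's implementation: the StructElem class below is a hypothetical stand-in, and the two passes (create elements first, register drawn text later) are a simplified model of what the message describes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(eq=False)
class StructElem:
    """Hypothetical stand-in for a PDF structure element (not FOP's class)."""
    role: str
    kids: List[Union["StructElem", str]] = field(default_factory=list)


def build(link_at_creation: bool) -> Tuple[StructElem, StructElem]:
    """Simulate the two passes for the nested-block example."""
    # Pass 1: structure elements are created from the FO tree.
    outer = StructElem("outer-block")
    inner = StructElem("empty-inner-block")
    if link_at_creation:
        # Naive fix: attach the empty element as soon as it exists.
        outer.kids.append(inner)

    # Pass 2: the renderer registers text as it is drawn. The empty block
    # draws nothing, so no event ever marks its position.
    for text in ("Before the empty block.", "After the empty block."):
        outer.kids.append(text)
    return outer, inner


def describe(elem: StructElem) -> List[str]:
    """List an element's children, showing nested elements by role."""
    return [k if isinstance(k, str) else k.role for k in elem.kids]


def is_linked(parent: StructElem, child: StructElem) -> bool:
    """True if child appears among parent's direct children."""
    return any(k is child for k in parent.kids)


if __name__ == "__main__":
    outer, inner = build(link_at_creation=False)
    print(describe(outer), is_linked(outer, inner))
    outer, inner = build(link_at_creation=True)
    print(describe(outer), is_linked(outer, inner))
```

In the first run the empty element exists but is an orphan; in the second it is linked but sits ahead of both pieces of text rather than between them. Neither run yields the logical order, which is why position information would have to travel with the content through the whole chain.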
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
On Thu, Oct 22, 2009 at 10:36:47PM +0200, Jeremias Maerki wrote:
On 22.10.2009 21:15:40 Simon Pepping wrote:
On Thu, Oct 22, 2009 at 05:12:00PM +0100, Vincent Hennebert wrote:

Hi,

Work on PDF accessibility is basically done. There are still some tests to perform and maybe a few tweaks here and there, but the main functionality is in place.

So I’d like to start a vote for merging the branch back to the Trunk: https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

Can you summarize what the branch tries to achieve?

I'll try. In short: it provides the Tagged PDF feature that some people have always wanted.

Thanks. That was quite clear.

I am not in a position to judge the quality of the implementation. I welcome this addition to FOP. I vote +1 to the merger of the Temp_Accessibility branch into trunk.

Simon

--
Simon Pepping
home page: http://www.leverkruid.eu
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
I followed what you did in the branch and I like it, although I can't really follow why accessibility needed to be backported to the old PDFRenderer. So entirely +1 from me.

And thanks for diving into this. It's good to know that this knowledge has a broader foundation in the project. And of course, FOP gets exciting new functionality.

On 22.10.2009 18:12:00 Vincent Hennebert wrote:

Hi,

Work on PDF accessibility is basically done. There are still some tests to perform and maybe a few tweaks here and there, but the main functionality is in place.

So I’d like to start a vote for merging the branch back to the Trunk: https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

The vote will last the usual 3 days but, since it’s a non-trivial new feature, if any committer would like more time to review it, feel free to say so and we can extend the vote to 1 week.

Attached is the diff between the branch and the Trunk, if this is of any help.

+1 from me.

Thanks, Vincent

Jeremias Maerki
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
On Thu, Oct 22, 2009 at 05:12:00PM +0100, Vincent Hennebert wrote:

Hi,

Work on PDF accessibility is basically done. There are still some tests to perform and maybe a few tweaks here and there, but the main functionality is in place.

So I’d like to start a vote for merging the branch back to the Trunk: https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

Can you summarize what the branch tries to achieve?

The vote will last the usual 3 days but, since it’s a non-trivial new feature, if any committer would like more time to review it, feel free to say so and we can extend the vote to 1 week.

Can you make that 3 working days?

Simon

--
Simon Pepping
home page: http://www.leverkruid.eu
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
On 22.10.2009 21:15:40 Simon Pepping wrote:
On Thu, Oct 22, 2009 at 05:12:00PM +0100, Vincent Hennebert wrote:

Hi,

Work on PDF accessibility is basically done. There are still some tests to perform and maybe a few tweaks here and there, but the main functionality is in place.

So I’d like to start a vote for merging the branch back to the Trunk: https://svn.eu.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility

Can you summarize what the branch tries to achieve?

I'll try. In short: it provides the Tagged PDF feature that some people have always wanted.

Long story: Without the accessibility/document structure feature, FOP simply produces pages with visual content. Visually impaired people need tools like a screen reader to read documents to them. For that the reader needs to know which parts of a page are important and which are not, and in which order the elements should be read. It needs to know that a sentence continues on the next page without stumbling over the page footer in the middle of the sentence. An image tells a blind person nothing, so it needs a descriptive text that the screen reader will read aloud in place of the image.

There are laws in various countries that require certain organizations to produce barrier-free documents. The accessibility branch's main purpose is simply to help with these requirements. PDF is the only format we support that has such features. If FOP implemented PDFXML or XPS, we could later support accessibility there, too, based on the work started here.

There's another side-effect to tagged PDF: It allows for better text extraction from the document. PDF even describes ways to make round-trips from XML -> PDF -> XML -> PDF if certain conditions are met. However, we don't do that.

Finally, with tagged PDF it is possible to create PDF/A-1a conformant documents in addition to the PDF/A-1b that we already support. That's important for long-term archival of documents.

The vote will last the usual 3 days but, since it’s a non-trivial new feature, if any committer would like more time to review it, feel free to say so and we can extend the vote to 1 week.

Can you make that 3 working days?

Does that imply you don't work 7 days a week? ;-) Working days are what we usually apply here, aren't they?

Simon

--
Simon Pepping
home page: http://www.leverkruid.eu

Jeremias Maerki
Re: [VOTE] Merge the Temp_Accessibility Branch Back to Trunk
Sounds like a lofty and honorable goal. +1 from me. Clay -- the.webmaes...@gmail.com - http://ourlil.com/ My religion is simple. My religion is kindness. - HH The 14th Dalai Lama of Tibet