On Tue, 19 May 2009 11:04:49 -0700, Manuel Fiorelli <[email protected]> wrote:
I'm happy to see I am not the only one who finds this feature
useful. I saw that in your model every node is an annotation, which
makes it easy to implement the "textContent" property, which returns
the text contained in an Element.

Also, support for PDF (and other document formats) would be an
important addition...

Manuel Fiorelli

For PDF filtering, check out this open-source project: http://aperture.sourceforge.net

It handles PDF, HTML, XML, RTF, Office, OpenOffice, Corel, email, and iCal, and it also provides crawlers. It's built on other open-source libraries, such as POI and PDFBox, but adds the ability to produce XML with RDF elements. The RDF could be represented in the document model I proposed.

Greg
