Hi,

I'm a researcher at INESC-ID, currently working on an application that parses ISO standards (stored as PDF files) and stores their text in a database. This implies building some sort of tree of all the sections, subsections, and so on.
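As a rough illustration of the kind of tree meant here, the following is a minimal Python sketch that works on already-extracted plain text. It assumes that headings start with a dotted section number such as "4.2.1 Title", which is common in ISO standards but is not guaranteed; a body line that happens to start with a number would be misread as a heading. The names `Section`, `HEADING_RE`, and `build_tree` are hypothetical and do not belong to any PDF library.

```python
import re
from dataclasses import dataclass, field

# Assumption: a heading is a line starting with a dotted number, e.g. "4.2.1 Title".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Section:
    number: str
    title: str
    paragraphs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_tree(lines):
    """Build a section tree from plain-text lines using heading numbers."""
    root = Section(number="", title="(document)")
    stack = [(0, root)]  # (depth, section); the root sits at depth 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADING_RE.match(line)
        if match:
            number, title = match.groups()
            depth = number.count(".") + 1  # "4" -> 1, "4.2" -> 2, ...
            # Pop back to the nearest ancestor that is shallower than this heading.
            while stack[-1][0] >= depth:
                stack.pop()
            node = Section(number, title)
            stack[-1][1].children.append(node)
            stack.append((depth, node))
        else:
            # Non-heading text belongs to the most recently opened section.
            stack[-1][1].paragraphs.append(line)
    return root
```

Each `Section` could then be mapped to one database row, with the parent's key stored alongside it. Distinguishing headings by font or size rather than by numbering would be more robust, but that depends on what the extraction API exposes.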
I'm aware that PDF files don't reflect the structure of the text, so I was aiming for a different approach. Just being able to split the text into paragraphs would already be a massive help. Even better would be a way to distinguish between text styles, so as to tell normal text from headings and the like.

I've managed to extract plain text with your API, and with a lot of effort it would be possible to organize that plain text and give it some structure. However, I was wondering whether your API provides an easier way to do this, perhaps some sort of object iteration within a page?

Thanks for the help.

Best regards,
*João M. F. Cardoso*
MSc in Telecommunications and Informatics Engineering, INESC-ID
m:(+351) 916190940 | e:[email protected] | Skype: joao.m.f.cardoso

