Hi,
The -bbox option is great, but I've noticed two problems.
First, the <body> and <html> tags aren't closed, which causes a problem
when parsing the XML:
--- a/utils/pdftotext.cc
+++ b/utils/pdftotext.cc
@@ -361,6 +361,8 @@ int main(int argc, char *argv[]) {
}
fprintf(f, "</doc>\n");
}
+ fprintf(f, "</body>\n");
+ fprintf(f, "</html>\n");
fclose(f);
delete textOut;
} else {
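
To illustrate why the missing closing tags matter (this is not part of the
patch), here is a minimal sketch of the kind of well-formedness check an XML
parser applies. The function name tagsBalanced is hypothetical, and the check
is deliberately simplified: it ignores comments, CDATA sections, and '>'
characters inside attribute values, all of which a real parser handles.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if every opening tag in `xml` is closed
// by a matching closing tag in the correct order. Simplified on purpose; it
// does not handle comments, CDATA, or '>' inside attribute values.
bool tagsBalanced(const std::string &xml) {
  std::vector<std::string> open;  // stack of currently open element names
  std::string::size_type pos = 0;
  while ((pos = xml.find('<', pos)) != std::string::npos) {
    std::string::size_type end = xml.find('>', pos);
    if (end == std::string::npos) return false;  // unterminated tag
    std::string tag = xml.substr(pos + 1, end - pos - 1);
    pos = end + 1;
    if (tag.empty()) return false;
    if (tag[0] == '!' || tag[0] == '?') continue;  // DOCTYPE or declaration
    if (tag.back() == '/') continue;               // self-closing element
    if (tag[0] == '/') {
      // Closing tag: it must match the most recently opened element.
      if (open.empty() || open.back() != tag.substr(1)) return false;
      open.pop_back();
    } else {
      // Opening tag: keep only the element name, dropping any attributes.
      open.push_back(tag.substr(0, tag.find_first_of(" \t\r\n")));
    }
  }
  return open.empty();  // anything left open means the document is malformed
}
```

With made-up sample input, tagsBalanced("<html><body><doc></doc>") returns
false, while the same string followed by "</body></html>" returns true, which
is the situation the two added fprintf calls fix.
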
Second, although the program outputs the data correctly, I get a segmentation
fault. I'm not a C programmer, so I'm not sure how to debug this:
tom@tom:~/Desktop$ pdftotext -bbox thrift-20070401.pdf
Segmentation fault
Hope that helps.
--
Tom Gleason, PHP Developer
Exploring ResourceSpace at:
http://resourcespace.blogspot.com
ResourceSpace Support Services
https://www.buildadam.com/muse2