See answers inline:

- the error seems to be too general, essentially it always raises JAVA-EXCEPTION no matter what goes wrong (e.g. if the given input is not a valid pdf)
  I adapted the error message to be clearer and more specific.

- the java stack trace seems to be sent to standard error
  It goes to standard error.

- "Renders the each page of the PDF document as an image." => "Renders each page of the PDF document as an image."
  Done.

- the names of the private functions should also adhere to the code conventions: renderToImages => render-to-images
  Done.

- make xqdoc fails because the comments seem to contain invalid xml

    </home/mbrantner/zorba/build/URI_PATH/com/zorba-xquery/www/modules/project_xqdoc.xq>:142,9: user-defined error [err:UE004]: Error processing module zerr:ZXQD0002 -
    " This module provides funtionality to read the text from PDF documents and to render PDF documents to images. <a href="http://pdfbox.apache.org">Apache PDFBox</a> library is used to implement these functions. <br /> <br /> <b>Note:</b> Since this module has a Java library dependency a JVM required to be installed on the system. For Windows: jvm.dll is required on the system path ( usually located in "C:\Program Files\Java\jre6\bin\client". <b>Note:<b> For Debian based Linux distributions install PdfBox and FontBox packages: sudo apt-get install libpdfbox-java libfontbox-java ":
    can not parse as XML for xqdoc: loader parsing error: Opening and ending tag mismatch: b line 0 and root ; raised at /home/mbrantner/zorba/sandbox/src/runtime/errors_and_diagnostics/errors_and_diagnostics_impl.cpp:81

  Done.

- adapt the year in "Copyright 2006-2009 The FLWOR Foundation." in the .xq file (and some other files also)
  Done.

- would it make sense to return one string per page in the pdf instead of one big string?
  The API doesn't allow it, but I added two more options to insert a user-defined string at the start and end of each page.

- remove commented out code in read-pdf.cpp
  Done.

- valgrind shows tons of invalid writes. Why? Are they critical? Is there anything we can do?
  The JVM always shows up in valgrind, even if nothing is done with it. I was careful to free any memory I allocated.

- would it make sense to return the images in a streaming fashion (i.e. don't create all base64's in a vector)?
  No, because it's a push write of all images. And as discussed, optimizing away only a copy in some cases isn't worth the effort.

- encoding each image shouldn't be necessary and will probably be wasted effort because the images might be written to a file in their binary form
  Done.

--
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125338
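
For reference, the xqdoc failure quoted in this thread can be reproduced outside the build with a small well-formedness check. This is only a sketch in Python, not part of the module or of xqdoc: the helper name `is_well_formed` is hypothetical, and wrapping the comment in a `<root>` element is an assumption inferred from the "b line 0 and root" part of the error message. The two sample strings are shortened versions of the module comment.

```python
import xml.etree.ElementTree as ET


def is_well_formed(comment: str) -> bool:
    """Return True if the comment parses as XML once wrapped in a root element.

    Wrapping in <root> is an assumption about what xqdoc does, based on the
    'Opening and ending tag mismatch: b line 0 and root' error text.
    """
    try:
        ET.fromstring("<root>" + comment + "</root>")
        return True
    except ET.ParseError:
        return False


# Shortened form of the comment that made xqdoc fail: the second note
# opens <b> twice instead of closing it.
broken = (
    '<a href="http://pdfbox.apache.org">Apache PDFBox</a> library is used. <br /> '
    "<b>Note:</b> a JVM is required. "
    "<b>Note:<b> For Debian based Linux distributions install PdfBox."
)

# Same comment with the second <b> element closed properly.
fixed = broken.replace("<b>Note:<b>", "<b>Note:</b>")

print("broken comment well-formed:", is_well_formed(broken))
print("fixed comment well-formed:", is_well_formed(fixed))
```

Running a check like this over the module comments before `make xqdoc` would catch unbalanced tags early, under the assumption above about how the comment is wrapped.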