On Mon, 5 Dec 2011, Paul Pearcy wrote:
It appears that under the hood PDFBox can work with either a RandomAccessFile (http://pdfbox.apache.org/apidocs/org/apache/pdfbox/io/RandomAccessFile.html) or a RandomAccessBuffer (http://pdfbox.apache.org/apidocs/org/apache/pdfbox/io/RandomAccessBuffer.html), and that Tika uses RandomAccessBuffers for better performance. I'd like to sacrifice this performance for lower RAM usage.

Is this possible?

I think it should be a fairly simple change: test whether we have a TikaInputStream, then whether it is one backed by a File, and if so use the File constructor to PDFBox rather than the stream one.
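
To make the check concrete, here is a rough sketch of the decision only. It is not the actual Tika parser code. FileAwareStream is a hypothetical stand-in for TikaInputStream (which does offer hasFile() and getFile()), so that the sketch compiles without Tika or PDFBox on the classpath. PdfSourceChooser, Source and choose() are likewise made-up names, and the PDFBox call itself is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FilterInputStream;
import java.io.InputStream;

final class PdfSourceChooser {

    /** Which kind of PDFBox entry point the caller should use. */
    enum Source { FILE, STREAM }

    /**
     * Hypothetical stand-in for TikaInputStream: a stream that may
     * also know which File it was opened from.
     */
    static class FileAwareStream extends FilterInputStream {
        private final File file;

        FileAwareStream(InputStream in, File file) {
            super(in);
            this.file = file;
        }

        boolean hasFile() { return file != null; }

        File getFile() { return file; }
    }

    /**
     * Returns FILE only when the stream is file-aware and actually has
     * a backing file; any other stream falls back to STREAM.
     */
    static Source choose(InputStream in) {
        if (in instanceof FileAwareStream) {
            FileAwareStream fas = (FileAwareStream) in;
            if (fas.hasFile()) {
                return Source.FILE;
            }
        }
        return Source.STREAM;
    }

    public static void main(String[] args) {
        InputStream plain = new ByteArrayInputStream(new byte[0]);
        InputStream backed = new FileAwareStream(plain, new File("example.pdf"));
        System.out.println(choose(plain));
        System.out.println(choose(backed));
    }
}
```

In the real parser the FILE branch would hand getFile() to PDFBox, and the STREAM branch would keep doing what it does today.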

I don't know the PDFBox related code well though, so I'll wait for others to comment on the sanity of this... :)

Nick
