Yes, but does WriteDecodedDoc now work correctly, or does it still
produce that LZW error?
About the stream issue: the error status is somewhat misleading. It
should rather be a warning, because there is a "plan B", which is to
disregard the length parameter and to read the PDF until "endstream". If
that failed too, there would be a further error message, "Error
reading stream using length value". So I wonder whether there is another
problem. Sometimes people transfer PDF files in ASCII mode from an FTP
server. Could you try the text extraction feature of the PDFBox app 2.0?
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
command:
java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractText -nonSeq PDF.pdf
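To illustrate the "plan B" described above, here is a minimal sketch of the idea: trust the declared length only if "endstream" really follows it, otherwise scan forward for the keyword. This is not the actual PDFBox code; the class name StreamLengthFallback and all of its methods are hypothetical and exist only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; NOT the PDFBox implementation.
public class StreamLengthFallback {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.ISO_8859_1);

    // True if the bytes at pos are exactly the keyword "endstream".
    static boolean matches(byte[] data, int pos) {
        if (pos < 0 || pos + ENDSTREAM.length > data.length) {
            return false;
        }
        for (int i = 0; i < ENDSTREAM.length; i++) {
            if (data[pos + i] != ENDSTREAM[i]) {
                return false;
            }
        }
        return true;
    }

    // True if "endstream" follows pos, allowing CR/LF bytes in between.
    static boolean endstreamAt(byte[] data, int pos) {
        while (pos >= 0 && pos < data.length
                && (data[pos] == '\r' || data[pos] == '\n')) {
            pos++;
        }
        return matches(data, pos);
    }

    // First index >= from where "endstream" starts, or -1 if absent.
    static int indexOfEndstream(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + ENDSTREAM.length <= data.length; i++) {
            if (matches(data, i)) {
                return i;
            }
        }
        return -1;
    }

    // Returns the stream body that begins at start.
    public static byte[] readStream(byte[] data, int start, int declaredLength) {
        long end = (long) start + declaredLength;
        // Plan A: the declared length points right at "endstream".
        if (declaredLength >= 0 && end <= data.length
                && endstreamAt(data, (int) end)) {
            return Arrays.copyOfRange(data, start, (int) end);
        }
        // Plan B: ignore the length and search for the keyword instead.
        int found = indexOfEndstream(data, start);
        if (found < 0) {
            throw new IllegalStateException("no endstream keyword found");
        }
        // Drop one end-of-line marker (LF, CR or CRLF) before the keyword.
        int stop = found;
        if (stop > start && data[stop - 1] == '\n') {
            stop--;
        }
        if (stop > start && data[stop - 1] == '\r') {
            stop--;
        }
        return Arrays.copyOfRange(data, start, stop);
    }
}
```

Note that such a fallback only recovers the stream boundaries. If the file was altered in transit (for example LF turned into CRLF by an ASCII-mode transfer), the bytes inside the stream are still changed, so a decoder for compressed data could well fail afterwards.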
Tilman
On 28.04.2014 18:21, Jonas Karlsson wrote:
Hi Tilman,
I tried the 1.8.5-SNAPSHOT and got the same result as before: no text, and
this log message:
Apr 28, 2014 12:20:48 PM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength
SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream
_jonas
On Mon, Apr 28, 2014 at 11:04 AM, Tilman Hausherr <[email protected]> wrote:
There was a (recently fixed) bug in the LZW decoder. Please try the
current snapshot and tell us what happens:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.5-SNAPSHOT/
Tilman
On 28.04.2014 17:00, Jonas Karlsson wrote:
java.io.StreamCorruptedException: Error: data is null
at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:82)