Hi,

So you're using a "special" http client...

Anyway, here's what I just did with the 1.8.9 version:

        URL url = new URL("http://esa.un.org/unpd/wup/PressRelease/WUP2014_PressRelease.pdf");
        InputStream is = url.openStream();
        PDDocument doc = PDDocument.load(is);
        System.out.println("pages: " + doc.getNumberOfPages());

All output I get is

    pages: 2

Btw, the two "errors" you mention are warnings about malformed PDFs. However, your file really does contain a stream of length 66346, and I don't get that warning. This means that somehow you're not getting the exact file. Try saving what you're downloading with your "http client" and comparing it with what you download with a browser. Or try what I did and see if it works.
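
One way to do that comparison is a byte-by-byte check of the two saved copies. This is only a minimal sketch: the class name StreamCompare and the method firstDifference are made up for illustration and are not part of PDFBox.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of PDFBox.
final class StreamCompare {
    /**
     * Returns the offset of the first byte at which the two streams differ,
     * or -1 if both have the same length and the same content.
     * Both streams are closed when the method returns.
     */
    static long firstDifference(InputStream a, InputStream b) throws IOException {
        try (InputStream x = new BufferedInputStream(a);
             InputStream y = new BufferedInputStream(b)) {
            long offset = 0;
            while (true) {
                int p = x.read();
                int q = y.read();
                if (p != q) {
                    return offset; // also covers one stream ending early
                }
                if (p == -1) {
                    return -1;     // both streams ended at the same position
                }
                offset++;
            }
        }
    }
}
```

Called with two FileInputStreams (say, one for the copy saved by your client and one for the copy saved by the browser; both file names are up to you), a result other than -1 tells you where the two downloads start to diverge.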

What version are you using? 1.8.9 is the current one.

Tilman

On 13.12.2014 at 18:41, Walter Kehl wrote:
Hi John, Tilman,

thanks for the reply. Here is some additional information:

- the http client I am using to get the input stream already has a user
agent set. Also, I have already downloaded lots of PDF files with PDFBox
without ever having a problem.
- when I try to load the document remotely from the URL, I get the following
error messages:
   18:34:32 WARN  BaseParser           :: Specified stream length 66346 is
wrong. Fall back to reading stream until 'endstream'.
   18:34:35 WARN  XrefTrailerResolver  :: Did not found XRef object at
specified startxref position 0
- I have written the input stream directly to a file and it was a valid PDF.
I could load it both with an external tool and with PDFBox.

Yes, of course I could always download a file first to a temp file and then
load it into PDFBox. But I think the direct way is more elegant and faster.
I have also debugged a little bit into the code and to me it doesn't look
like PDFBox uses a temporary file, but rather reads directly from the input
stream.... but I might be wrong.
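
For reference, the temp-file route mentioned above might look roughly like this. It is a sketch only: TempSpool and spoolToTempFile are hypothetical names, and the PDFBox call appears only in a comment, assuming the PDDocument.load(File) overload of the 1.8.x line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox.
final class TempSpool {
    /** Copies the whole stream into a new temp file, closes the stream, and returns the file's path. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("download-", ".pdf");
        try (InputStream src = in) {
            // createTempFile already created the file, so it has to be overwritten
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
        // Assumed follow-up with PDFBox 1.8.x:
        //   PDDocument doc = PDDocument.load(tmp.toFile());
    }
}
```

The extra copy costs some I/O, but it also leaves a file on disk that can be inspected if the load goes wrong.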

Anyway, thanks for providing such good free software!

Best
Walter

-----Original Message-----
From: John Hewson [mailto:[email protected]]
Sent: Friday, 12 December 2014 18:57
To: [email protected]
Subject: Re: Downloadind a pdf file doesn't work

Good point, Tilman. Walter, try writing the InputStream to a File and
check that it's a valid PDF.
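
A very rough first check on the saved bytes, before opening the file in a viewer, could look like the sketch below. It only looks for the "%PDF-" header and an "%%EOF" marker near the end, so it can tell a truncated download or an HTML error page from a PDF, but it does not prove the file is valid. PdfSanity and looksLikePdf are made-up names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class PdfSanity {
    /** True if the data starts with "%PDF-" and has "%%EOF" within its last 1024 bytes. */
    static boolean looksLikePdf(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so offsets are preserved
        String text = new String(data, StandardCharsets.ISO_8859_1);
        boolean header = text.startsWith("%PDF-");
        int tailStart = Math.max(0, text.length() - 1024);
        boolean trailer = text.indexOf("%%EOF", tailStart) >= 0;
        return header && trailer;
    }
}
```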

-- John

On 12 Dec 2014, at 09:50, Tilman Hausherr <[email protected]> wrote:

This sounds more like an HTTP problem. Try setting a user agent like a
browser.
https://stackoverflow.com/questions/2529682/setting-user-agent-of-a-java-urlconnection
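
With plain java.net, setting the header could be sketched like this. BrowserLikeFetch and prepare are hypothetical names, and the user-agent string is whatever browser-like value you choose to pass in.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of PDFBox.
final class BrowserLikeFetch {
    /** Opens a connection object for the URL and sets the User-Agent request header on it. */
    static URLConnection prepare(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection(); // no network traffic happens yet
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }
}
```

Calling getInputStream() on the returned connection then performs the actual request, and that stream can be handed to PDDocument.load(...) as before.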

Tilman

On 12.12.2014 at 11:53, Walter Kehl wrote:
Hi all,

I have the following situation:

I am loading files from the internet with PdfBox, using the call

PDDocument document = PDDocument.load( inputStream );

So far it has worked nicely, but I have problems with this file:
http://esa.un.org/unpd/wup/PressRelease/WUP2014_PressRelease.pdf

After I load it, it is empty, and the call
document.getNumberOfPages() returns 0.

However, when I download the file manually and then load it into
PdfBox, everything is fine.

Any idea what could be happening? I am currently using PdfBox 1.8.5.

Thanks and Best Regards

Walter

