Hi John, Tilman, thanks for the reply. Here is some additional information:
- The HTTP client I am using to get the input stream already has a user agent set. Also, I have already downloaded lots of PDF files with PDFBox and there was never a problem.
- When I try to load the document remotely from the URL, I get the following log messages:
  18:34:32 WARN BaseParser :: Specified stream length 66346 is wrong. Fall back to reading stream until 'endstream'.
  18:34:35 WARN XrefTrailerResolver :: Did not found XRef object at specified startxref position 0
- I have written the input stream directly to a file and it was a valid PDF. I could load it both with an external tool and with PDFBox.

Yes, of course I could always download the file to a temp file first and then load it into PDFBox. But I think the direct way is more elegant and faster. I have also debugged a little into the code, and to me it doesn't look like PDFBox uses a temporary file; rather, it reads directly from the input stream... but I might be wrong.

Anyway, thanks for providing such good free software!

Best
Walter

-----Original Message-----
From: John Hewson [mailto:[email protected]]
Sent: Friday, 12 December 2014 18:57
To: [email protected]
Subject: Re: Downloading a pdf file doesn't work

Good point Tilman. Walter, try writing the InputStream to a File and check that it's a valid PDF.

-- John

> On 12 Dec 2014, at 09:50, Tilman Hausherr <[email protected]> wrote:
>
> This sounds more like an HTTP problem. Try setting a user agent like a browser's.
> https://stackoverflow.com/questions/2529682/setting-user-agent-of-a-java-urlconnection
>
> Tilman
>
> On 12.12.2014 at 11:53, Walter Kehl wrote:
>> Hi all,
>>
>> I have the following situation:
>>
>> I am loading files from the internet with PdfBox, using the call
>>
>> PDDocument document = PDDocument.load( inputStream );
>>
>> So far it has worked nicely, but I have problems with this file:
>> http://esa.un.org/unpd/wup/PressRelease/WUP2014_PressRelease.pdf
>>
>> After I load it, it is empty, and the call
>> document.getNumberOfPages() returns 0.
>>
>> However, when I download the file manually and then load it into
>> PdfBox, everything is fine.
>>
>> Any idea what could be happening? I am currently using PdfBox 1.8.5.
>>
>> Thanks and Best Regards
>>
>> Walter
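The fallback Walter mentions (save to a temp file first), Tilman's user-agent suggestion and John's "write it to a File and check it" can be combined in a short sketch that uses only JDK classes. The class name `PdfDownloadCheck`, its method names and the User-Agent string are hypothetical, chosen for illustration. The `%PDF-` header / `%%EOF` trailer test is only a rough heuristic, not real PDF validation, and this sketch does not claim to explain the PDFBox 1.8.5 behaviour seen in the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PdfDownloadCheck {

    // Hypothetical browser-like User-Agent string (Tilman's suggestion).
    private static final String USER_AGENT = "Mozilla/5.0 (compatible; PdfDownloadCheck)";

    /** Downloads the URL to a new temporary file, sending a User-Agent header. */
    public static Path downloadToTempFile(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        Path tmp = Files.createTempFile("download", ".pdf");
        try (InputStream in = conn.getInputStream()) {
            // Copy the complete response body to disk before any PDF parsing happens.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    /**
     * Rough sanity check (heuristic, not full validation): the data must start
     * with "%PDF-" and contain "%%EOF" somewhere in its last 1024 bytes.
     */
    public static boolean looksLikePdf(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < header.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[i] != header[i]) {
                return false;
            }
        }
        int tailStart = Math.max(0, data.length - 1024);
        // ISO-8859-1 maps every byte to one char, so binary content cannot break decoding.
        String tail = new String(data, tailStart, data.length - tailStart, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfDownloadCheck <url>");
            return;
        }
        Path file = downloadToTempFile(args[0]);
        byte[] data = Files.readAllBytes(file);
        System.out.println(file + ": " + data.length + " bytes, looks like PDF: " + looksLikePdf(data));
        // The saved file could then be handed to PDFBox, e.g. PDDocument.load(file.toFile()).
    }
}
```

Run it as `java PdfDownloadCheck <url>`. If the saved file passes the check and loads from disk but not from the network stream, that points away from the HTTP side and toward how the stream is consumed.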

