You got it right: the PDF is within a www page. Its URL is known and passed
as a variable (strURL) to the function. I also tried another approach to
get the content of the PDF rendered there, but that is not working either:
// Download the URL content to a local file, then parse that file
byte[] ba1 = new byte[1024];
int baLength;
FileOutputStream fos1 = new FileOutputStream("download.pdf");
URL url = new URL(strURL);
URLConnection urlConn = url.openConnection();
// Read from the connection opened above instead of opening a second one
InputStream is1 = urlConn.getInputStream();
while ((baLength = is1.read(ba1)) != -1) {
    fos1.write(ba1, 0, baLength);
}
fos1.flush();
fos1.close();
is1.close();
pdDoc = PDDocument.load("download.pdf");
parsedText = pdfStripper.getText(pdDoc);
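
One way to see what strURL actually returns is to look at the first bytes
of the download: a PDF file starts with the marker "%PDF-", while an HTML
wrapper page does not. The following is only a minimal sketch under that
assumption. The class PdfHeaderCheck and the method looksLikePdf are
hypothetical names for illustration and are not part of PDFBox; the sample
byte arrays in main stand in for real downloads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PdfHeaderCheck {

    // Hypothetical helper (not a PDFBox API): returns true if the stream
    // begins with the "%PDF-" marker. This is a strict check; some files
    // carry extra bytes before the marker and would be reported as false.
    public static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = new byte[5];
        int total = 0;
        int n;
        // read() may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends
        while (total < head.length
                && (n = in.read(head, total, head.length - total)) != -1) {
            total += n;
        }
        String start = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        return start.equals("%PDF-");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: what a real PDF and an HTML wrapper page begin with
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] html = "<html><body>viewer page</body></html>"
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(looksLikePdf(new ByteArrayInputStream(pdf)));
        System.out.println(looksLikePdf(new ByteArrayInputStream(html)));
    }
}
```

With the code above, the same check could be run on a FileInputStream for
"download.pdf". If it reports false, strURL is probably returning the
surrounding HTML page rather than the PDF itself, and the URL of the
embedded document would be needed instead.
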
On Fri, Aug 25, 2017 at 12:45 AM, Tilman Hausherr <[email protected]>
wrote:
> On 24.08.2017 at 19:27, Aalok Agrawal wrote:
>
>> I have written the following code:
>>
>> PDFTextStripper pdfStripper = null;
>> PDDocument pdDoc = null;
>> COSDocument cosDoc = null;
>> String parsedText = null;
>>
>> URL url = new URL(strURL);
>> BufferedInputStream file = new BufferedInputStream(url.openStream());
>> PDFParser parser = new PDFParser(file);
>>
>> parser.parse();
>> cosDoc = parser.getDocument();
>> pdfStripper = new PDFTextStripper();
>>
>> pdDoc = new PDDocument(cosDoc);
>> parsedText = pdfStripper.getText(pdDoc);
>>
>> But it is not fetching the content of the PDF embedded in the browser.
>>
>
> PDFBox can't communicate with your browser.
>
> url.openStream()
>
> means that the URL content is fetched.
>
> Could it be that the PDF is within a www page? I.e. HTML outside, and PDF
> in a smaller window / frame? Then you'd need to know that URL.
>
> Tilman
>
>
>
>> On Thu, Aug 24, 2017 at 9:08 PM, Gilad Denneboom <[email protected]> wrote:
>>
>>> If you don't know the file's URL or the path of the local temp file to
>>> which it is saved I don't see how you could do it.
>>>
>>> On Thu, Aug 24, 2017 at 4:08 PM, Aalok Agrawal <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on an application where a PDF is rendered in the browser.
>>>> There is no .pdf extension in the URL.
>>>>
>>>> I have to read the content of the PDF and check some text. Is there any
>>>> way to do that?
>>>>
>>>> Thanks
>>>> Aalok Agrawal
>>>>
>>>>
>