I described one approach in my first email and another in my last email. Neither of them works; that is what I meant to say. The URL is not public, so I can't share it. As you suggested, I opened download.pdf in a text editor and found that it is not a PDF but the login page of my site.
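In case it is useful, the same check can be done in code before handing the data to PDFBox. This is only a minimal sketch: the class name PdfHeaderCheck and the method looksLikePdf are my own illustrative names, not part of PDFBox, and the check only looks for the "%PDF" marker that Tilman mentioned at the very start of the data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a PDFBox class): tells a real PDF apart from
// something else, such as an HTML login page, by looking at the first bytes.
public class PdfHeaderCheck {

    // The marker that a PDF file is expected to start with.
    private static final byte[] PDF_MARKER = "%PDF".getBytes(StandardCharsets.US_ASCII);

    // Returns true only if the data begins with the "%PDF" marker.
    public static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MARKER.length) {
            return false;
        }
        byte[] head = Arrays.copyOf(data, PDF_MARKER.length);
        return Arrays.equals(head, PDF_MARKER);
    }

    public static void main(String[] args) {
        // In-memory samples standing in for the downloaded bytes.
        byte[] pdfLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] loginPage = "<html><body>Login</body></html>".getBytes(StandardCharsets.US_ASCII);

        System.out.println(looksLikePdf(pdfLike));   // prints "true"
        System.out.println(looksLikePdf(loginPage)); // prints "false"
    }
}
```

Running such a check on the downloaded bytes would show right away that the server returned the login page instead of the PDF.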
So I have to write code that passes the credentials, so that the request can get past authentication. Is there a way to pass credentials using the PDFBox API?

On Fri, Aug 25, 2017 at 9:41 PM, Tilman Hausherr <[email protected]> wrote:

> On 25.08.2017 at 11:45, Aalok Agrawal wrote:
>
>> You got it right, PDF is within a www page. And it's URL is known & passed
>> as a variable (strURL) to the function. Another approach which I tried to
>> get the content of pdf rendered there, but that is also not working -
>>
>
> Is the URL public and freely available? If yes, please mention it so I can
> test.
>
> "but that is also not working" - what does that mean? Do you get an error
> message, nothing, a JVM crash, a BSOD, ...?
>
> What is in that "download.pdf" file? Is this a PDF or is it not? Does it
> start with "%PDF" or not if you open the file with NOTEPAD++?
>
> If it isn't, then it means that your PDF has a different URL. You'll have
> to look at the html / javascript source code to find out what is going on.
>
> Tilman
>
>> byte[] ba1 = new byte[1024];
>> int baLength;
>> FileOutputStream fos1 = new FileOutputStream("download.pdf");
>> URL url = new URL(strURL);
>> URLConnection urlConn = url.openConnection();
>>
>> InputStream is1 = url.openStream();
>> while ((baLength = is1.read(ba1)) != -1) {
>> fos1.write(ba1, 0, baLength);
>> }
>> fos1.flush();
>> fos1.close();
>> is1.close();
>> pdDoc = PDDocument.load("download.pdf");
>> parsedText = pdfStripper.getText(pdDoc);
>>
>> On Fri, Aug 25, 2017 at 12:45 AM, Tilman Hausherr <[email protected]>
>> wrote:
>>
>> On 24.08.2017 at 19:27, Aalok Agrawal wrote:
>>>
>>> I have written following code -
>>>>
>>>> PDFTextStripper pdfStripper = null;
>>>> PDDocument pdDoc = null;
>>>> COSDocument cosDoc = null;
>>>> String parsedText = null;
>>>>
>>>> URL url = new URL(strURL);
>>>> BufferedInputStream file = new BufferedInputStream(url.openStream());
>>>> PDFParser parser = new PDFParser(file);
>>>>
>>>> parser.parse();
>>>> cosDoc = parser.getDocument();
>>>> pdfStripper = new PDFTextStripper();
>>>>
>>>> pdDoc = new PDDocument(cosDoc);
>>>> parsedText = pdfStripper.getText(pdDoc);
>>>>
>>>> But it is not fetching content of pdf embedded in browser.
>>>>
>>>> PDFBox can't communicate with your browser.
>>>
>>> url.openStream()
>>>
>>> means that the URL content is fetched.
>>>
>>> Could it be that the PDF is within a www page? I.e. HTML outside, and PDF
>>> in a smaller window / frame? Then you'd need to know that URL.
>>>
>>> Tilman
>>>
>>> On Thu, Aug 24, 2017 at 9:08 PM, Gilad Denneboom <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>> If you don't know the file's URL or the path of the local temp file to
>>>>
>>>>> which it is saved I don't see how you could do it.
>>>>>
>>>>> On Thu, Aug 24, 2017 at 4:08 PM, Aalok Agrawal <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> I am working on an application where pdf is getting rendered in
>>>>>> browser.
>>>>>> There is no pdf extension in URL.
>>>>>>
>>>>>> I have to read the content of the pdf & check some text. Is there any
>>>>>> way
>>>>>> to do that.
>>>>>>
>>>>>> Thanks
>>>>>> Aalok Agrawal
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]

