[ 
https://issues.apache.org/jira/browse/PDFBOX-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034691#comment-17034691
 ] 

Tilman Hausherr edited comment on PDFBOX-4769 at 2/11/20 6:21 PM:
------------------------------------------------------------------

Please attach the PDF. Also read this:
https://pdfbox.apache.org/2.0/faq.html#text-extraction

Do you get any text when using Adobe Reader?


> Problem pdf version 1.4
> -----------------------
>
>                 Key: PDFBOX-4769
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4769
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.17
>         Environment: java, maven, 
>            Reporter: NathanJ
>            Priority: Blocker
>
> Here is my problem. I have to read PDF files and decided to use PDFBox. I'm 
> using the following code to read my file line by line and perform some 
> actions on each line:
> import java.io.File;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.text.PDFTextStripper;
> import org.apache.pdfbox.text.PDFTextStripperByArea;
>
> File tempFile = new File("myPdfFile");
> try (PDDocument document = PDDocument.load(tempFile)) {
>     if (!document.isEncrypted()) {
>         // Note: sortByPosition is set on this stripper, but the text below
>         // is extracted with tStripper, which keeps its default settings.
>         PDFTextStripperByArea stripper = new PDFTextStripperByArea();
>         stripper.setSortByPosition(true);
>         PDFTextStripper tStripper = new PDFTextStripper();
>         String pdfFileInText = tStripper.getText(document);
>         // Split the extracted text into lines on \n or \r\n.
>         String[] lines = pdfFileInText.split("\\r?\\n");
>     }
> }
> For a PDF in format version 1.7, everything works well. But sometimes I have 
> to work with PDF version 1.4, and then there is a problem: 
> PDFTextStripper is unable to read the PDF, and my "pdfFileInText" gets the 
> value "\r\n\r\n" and nothing else. 
>  
> I didn't find any solution on the web.
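
The symptom described above (an extraction result of just "\r\n\r\n") can be detected before any per-line processing. The following is only a minimal sketch using plain Java strings. The class ExtractionCheck and its methods are hypothetical names, not part of PDFBox, and the input is assumed to be the string returned by PDFTextStripper.getText().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a PDFBox class): inspects the string that
// text extraction returned, without touching the PDF itself.
final class ExtractionCheck {

    // True when the text contains nothing but whitespace, e.g. "\r\n\r\n".
    static boolean isBlankExtraction(String text) {
        return text == null || text.trim().isEmpty();
    }

    // Splits on \n or \r\n (the same regex as in the report) and
    // drops lines that are empty or whitespace-only.
    static List<String> nonEmptyLines(String text) {
        List<String> result = new ArrayList<>();
        if (text == null) {
            return result;
        }
        for (String line : text.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Calling ExtractionCheck.isBlankExtraction(pdfFileInText) right after extraction would separate "the stripper returned no text at all" from "the line splitting went wrong", which is the distinction the question about Adobe Reader is probing.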


