Hello,

The Persian text in this PDF is drawn as vector graphics, not as text set in a font, so there are no characters to extract. Copying from Adobe Reader doesn't work either. You'll need OCR. Sorry!

Tilman

PS: the current version is 2.0.11.
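
One way to check this from the extraction side, sketched here under assumptions: count the Arabic-script code points (the script Persian is written in) in the string that PDFTextStripper returns, and print it through a UTF-8 stream so that the console encoding cannot turn real letters into "?". The class name ScriptCheck, the helper countArabicScript and the sample string are hypothetical and not part of PDFBox; only the JDK is used. A count of zero for a page that visibly shows Persian would be consistent with text drawn as graphics.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ScriptCheck {

    // Counts the code points that Unicode assigns to the Arabic script,
    // which includes the Persian letters.
    static long countArabicScript(String text) {
        return text.codePoints()
                   .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC)
                   .count();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical stand-in for the result of pdfStripper.getText(document)
        String text = args.length > 0 ? args[0] : "PDF \u0633\u0644\u0627\u0645";

        // Write UTF-8 explicitly instead of relying on the platform default encoding
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Arabic-script code points: " + countArabicScript(text));
        out.println(text);
    }
}
```

If the count is above zero but the console still shows "?", the problem is more likely the output encoding than the extraction itself.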

On 01.09.2018 at 07:54, Azadeh Fakhrzadeh wrote:
Dear Tilman, Thank you very much for your reply. I am using pdfbox-2.0.9.
Here is a link to a sample of the documents that I use:
http://www.filedropper.com/t2_1

Thanks
Azadeh

On Wed, Aug 29, 2018 at 8:56 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

I've used https://www.filedropper.com in the past. Please also tell me
which version you are using.

Tilman

On 29.08.2018 at 10:55, Azadeh Fakhrzadeh wrote:
Thank you, Tilman. Could you kindly provide a link where I can upload the
document?
I added icu4j-62-1.jar, icu4j-62-1-docs.jar and icu4j-62-1-src.jar to
the classpath, and here is my code:
package org.pdfBox.pdfBox1;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadingText {

    public static void main(String[] args) throws IOException {
        // Load the existing document; try-with-resources closes it
        // even if text extraction throws
        File file = new File("/test/t2.pdf");
        try (PDDocument document = PDDocument.load(file)) {
            // Instantiate the text stripper
            PDFTextStripper pdfStripper = new PDFTextStripper();

            // Extract the text of the whole document
            String text = pdfStripper.getText(document);

            System.out.println(text);
        }
    }
}



On Wed, Aug 29, 2018 at 11:33 AM Tilman Hausherr <thaush...@t-online.de>
wrote:

On 29.08.2018 at 08:13, Azadeh Fakhrzadeh wrote:
Hi,
I am trying to extract text from a Persian document using PDFBox. It
returns "?" for all Persian characters, but it works well with Latin
characters. How can I fix it? Any advice? Thanks

Hello,

Which PDFBox version are you using? What code are you using, or are you
using the command line utilities? Can you share the document (upload it
to a sharehoster, don't attach it to the post)?

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org


