Hello,
The Persian text is drawn as vector graphics, not as text from fonts.
Extracting it with Adobe Reader doesn't work either. You'll need OCR. Sorry!
Tilman
PS: the current version is 2.0.11.
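A rough way to spot this situation in code is a heuristic of our own, not a PDFBox feature. Persian is written in Arabic script, so you can count the Arabic-script code points in the string returned by `getText()`. If a page that visibly shows Persian yields none, the glyphs are probably drawn as paths, as in this file, and OCR of a rendered page image is the fallback. The class and method names below (`ScriptCheck`, `countArabicScript`, `looksLikeOcrNeeded`) are hypothetical. The only API used is the JDK's `Character.UnicodeScript`.

```java
public class ScriptCheck {

    // Counts code points whose Unicode script is ARABIC. Persian-specific
    // letters such as U+067E (pe) and U+06AF (gaf) are also classified as ARABIC.
    static long countArabicScript(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC)
                .count();
    }

    // Hypothetical heuristic: if no Arabic-script character was extracted at all,
    // the Persian text is likely not real text (e.g. vector paths) and needs OCR.
    static boolean looksLikeOcrNeeded(String extracted) {
        return countArabicScript(extracted) == 0;
    }

    public static void main(String[] args) {
        // "salam" in Persian followed by Latin text, written with escapes
        // so the source compiles regardless of the compiler's file encoding.
        String sample = "\u0633\u0644\u0627\u0645 PDFBox";
        System.out.println(countArabicScript(sample));          // 4
        System.out.println(looksLikeOcrNeeded("???? PDFBox"));  // true
    }
}
```

This only reads the extracted string. It cannot tell *why* nothing came out (vector paths, missing Unicode mappings, and so on), only that something did not.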
On 01.09.2018 at 07:54, Azadeh Fakhrzadeh wrote:
Dear Tilman, Thank you very much for your reply. I am using pdfbox-2.0.9.
Here is a link to a sample of the documents that I use:
http://www.filedropper.com/t2_1
Thanks
Azadeh
On Wed, Aug 29, 2018 at 8:56 PM Tilman Hausherr <thaush...@t-online.de>
wrote:
I've used https://www.filedropper.com in the past. Please also tell us
which version you are using.
Tilman
On 29.08.2018 at 10:55, Azadeh Fakhrzadeh wrote:
Thank you, Tilman. Could you kindly suggest a link where I can upload the
document?
I added icu4j-62-1.jar, icu4j-62-1-docs.jar and icu4j-62-1-src.jar to
the classpath, and here is my code:
package org.pdfBox.pdfBox1;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadingText {
    public static void main(String[] args) throws IOException {
        // Loading an existing document
        File file = new File("/test/t2.pdf");
        PDDocument document = PDDocument.load(file);
        // Instantiate PDFTextStripper class
        PDFTextStripper pdfStripper = new PDFTextStripper();
        // Retrieving text from PDF document
        String text = pdfStripper.getText(document);
        System.out.println(text);
        // Closing the document
        document.close();
    }
}
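A separate cause of "?" worth ruling out with code like the above: `System.out` encodes characters using the platform default charset. On a console or IDE that isn't set to UTF-8, correctly extracted Persian can print as "?". (In this thread the real cause turned out to be vector graphics, so this would not help with that particular file.) The sketch below writes the text through a UTF-8 `PrintStream` instead. It assumes Java 10+ for the `PrintStream(OutputStream, boolean, Charset)` constructor. `Utf8Output` and `utf8Stream` are hypothetical names, and the hard-coded string stands in for the result of `pdfStripper.getText(document)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Output {

    // Wraps a stream so every char is encoded as UTF-8 rather than with the
    // platform default charset, which may be unable to represent Persian.
    static PrintStream utf8Stream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by PDFTextStripper.getText(document);
        // escapes keep the source independent of the compiler's file encoding.
        String text = "\u0633\u0644\u0627\u0645 PDFBox";
        PrintStream out = utf8Stream(System.out);
        out.println(text);
        out.flush();
    }
}
```

The console itself must also display UTF-8 for the output to look right. Writing to a file with an explicit UTF-8 encoding avoids depending on the console entirely.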
On Wed, Aug 29, 2018 at 11:33 AM Tilman Hausherr <thaush...@t-online.de>
wrote:
On 29.08.2018 at 08:13, Azadeh Fakhrzadeh wrote:
Hi,
I am trying to extract text from a Persian document using PDFBox, but it
returns "?" for all Persian characters, although it works well with Latin
characters. How can I fix it? Any advice?
Thanks
Hello,
What PDFBox version are you using? What code are you using, or are you
using the command line utilities? Can you share the document (upload it
to a sharehoster, don't attach in post)?
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org