And if the PDF consists of images of text rather than actual text, you need to run it through OCR.

https://en.wikipedia.org/wiki/Optical_character_recognition

OCR's success depends on the quality, contrast, and so on of the source material. Either way, you will likely need to proofread the resulting text. There are various OCR tools in the repos.
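
As a rough sketch of what that can look like, assuming you pick pdftoppm (from poppler-utils) and tesseract as the tools (my choice for illustration, other combinations work too), a small Python wrapper might render each page to an image and then OCR it. The function names and the "page" file prefix are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def render_command(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page, named <prefix>-<page>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_command(image: Path, lang: str = "eng") -> list[str]:
    # tesseract takes an output base name and appends ".txt" itself
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_pdf(pdf: Path, workdir: Path) -> str:
    # Assumption: both tools are installed from your distro's repos
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(render_command(pdf, workdir / "page"), check=True)
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order
    pages = sorted(workdir.glob("page-*.png"))
    for image in pages:
        subprocess.run(ocr_command(image), check=True)
    return "\n".join(
        p.with_suffix(".txt").read_text(encoding="utf-8") for p in pages
    )
```

Calling ocr_pdf(Path("scan.pdf"), Path("ocr-work")) would return the recognised text of all pages as one string, which you still have to proofread. A higher dpi tends to help with small print, at the cost of speed.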
