Hi all, I just started using the pdftotext Python module to extract text from PDFs, and it really does look good, so thanks for your hard work.
The only issue I am having right now concerns extracting pricing information, such as the prices on a menu. Many restaurants don't use a dot to separate dollars and cents and instead set the cents in a slightly smaller font size. As a result, an item listed at $4.00 comes out as 400. Is there any way to detect such changes in font size or color and treat the differently styled runs as separate words? I am not sure whether this would be better supported on the Python side or directly within poppler. Thanks
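P.S. To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is only an illustration: it assumes a hypothetical list of (character, font size) pairs as input, which the pdftotext module does not expose as far as I can tell, and the function name and tolerance value are made up for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: one (character, font size in points) pair per glyph.
# This is an assumption for illustration, not something pdftotext returns today.
Glyph = Tuple[str, float]


def split_on_font_change(glyphs: List[Glyph], tolerance: float = 0.5) -> List[str]:
    """Group consecutive characters into words, starting a new word
    whenever the font size differs from the previous glyph by more
    than `tolerance` points."""
    words: List[str] = []
    current = ""
    prev_size: Optional[float] = None
    for char, size in glyphs:
        size_changed = prev_size is not None and abs(size - prev_size) > tolerance
        if size_changed and current:
            # Font size jumped: close the current word and start a new one.
            words.append(current)
            current = ""
        current += char
        prev_size = size
    if current:
        words.append(current)
    return words


# A menu price with "4" set at 12 pt and "00" set at 8 pt.
price = [("4", 12.0), ("0", 8.0), ("0", 8.0)]
print(split_on_font_change(price))
```

With something like this, the dollars and cents would come out as two separate words instead of being fused into 400, and the caller could decide how to join them.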
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
