I maintain the poppler bindings for the R programming language and receive a lot of bug reports about corrupted text extracted with poppler. Below is a minimal example that illustrates the problem:
    git clone https://github.com/jeroen/popplertest
    cd popplertest
    g++ -std=c++11 encoding.cpp -o encoding $(pkg-config --cflags --libs poppler-cpp)
    ./encoding hello.pdf

The output shows a lot of Chinese characters, which is incorrect (all text in the PDF is English). Back in March 2018, Suzuki Toshiya posted a patch with at least a partial solution: https://lists.freedesktop.org/archives/poppler/2018-March/012962.html . I hope we can revisit this.

_______________________________________________
poppler mailing list
https://lists.freedesktop.org/mailman/listinfo/poppler
