I maintain the poppler bindings for the R programming language and get
a lot of bug reports about corrupted text extracted with poppler.
Below is a minimal example that illustrates the problem:

  git clone https://github.com/jeroen/popplertest
  cd popplertest
  g++ -std=c++11 encoding.cpp -o encoding $(pkg-config --cflags --libs poppler-cpp)
  ./encoding hello.pdf

The output contains many Chinese characters, which is incorrect: all
text in the PDF is English.
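
This symptom is typical of 8-bit text being read as UTF-16, although
that is only my assumption about the cause and I have not confirmed it
in the poppler code. Two ASCII bytes paired into one big-endian code
unit usually land in the CJK range. A small standalone sketch of the
effect (the helper names are hypothetical and not part of poppler):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: pair up the bytes of an 8-bit string as if they
// were big-endian UTF-16 code units. A trailing odd byte is dropped.
std::vector<std::uint16_t> misread_as_utf16be(const std::string &bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto hi = static_cast<unsigned char>(bytes[i]);
        const auto lo = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}

// True if the code unit lies in the CJK Unified Ideographs block
// (U+4E00 through U+9FFF).
bool is_cjk_ideograph(std::uint16_t unit) {
    return unit >= 0x4E00 && unit <= 0x9FFF;
}
```

For example, misread_as_utf16be("hello!") gives the three code units
0x6865, 0x6C6C and 0x6F21, all of which are CJK ideographs.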

Back in March 2018, Suzuki Toshiya posted a patch with at least a
partial solution:
https://lists.freedesktop.org/archives/poppler/2018-March/012962.html
I hope we can revisit this.