I'm testing the new page::text_list() function, but I'm running into an old problem where converting the ustring to UTF-8 doesn't do what I expect:
    byte_array buf = x.to_utf8();
    std::string y(buf.begin(), buf.end());
    const char *str = y.c_str();

The resulting char * is not UTF-8. It contains random Chinese characters for PDF files with plain English ASCII text. I can work around the problem by using x.to_latin1(), which mostly gives the correct text, but obviously that doesn't work for non-English text.

I remember running into this before, for example when reading a toc_item->title() or document->info_key(): the conversion to UTF-8 also doesn't seem to work there.

Perhaps I am misunderstanding how this works. Is there some limitation on PDFs or ustrings that limits their ability to be converted to UTF-8? Somehow I am not getting this problem for ustrings from the page->text() method.

_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler
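
One way to narrow this down might be to look at the raw bytes before they are turned into a std::string. The following is only a minimal sketch and does not call poppler at all: it assumes byte_array is a std::vector<char>, and the helper names hex_dump, contains_nul and is_valid_utf8 are made up for illustration. A 00 after every ASCII byte in the dump would suggest the buffer holds UTF-16 rather than UTF-8, but that is a guess, not something confirmed here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: render bytes as space-separated lowercase hex.
std::string hex_dump(const std::vector<char>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Hypothetical helper: true if any byte is 0x00. Embedded NULs are unusual
// in UTF-8 text and would also cut the string short when read via c_str().
bool contains_nul(const std::vector<char>& bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\0') return true;
    }
    return false;
}

// Hypothetical helper: true if the bytes are well-formed UTF-8. Rejects
// stray continuation bytes, truncated sequences, overlong encodings,
// surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::vector<char>& bytes) {
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                     // invalid lead byte
        if (i + len > n) return false;         // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp > 0x10FFFF) return false;                 // out of Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

With poppler available, these helpers could be applied to the result of x.to_utf8() before building the std::string, and the hex output compared against the text that x.to_latin1() gives for the same ustring.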