FYI, the encoding problems still exist in the master branch today. I am
very interested in this patch by mpsuzuki; what can we do to move this
forward?

On Wed, Mar 28, 2018 at 2:26 PM, suzuki toshiya wrote:
> Dear Adam,
> Adam Reichold wrote:
>>> I see. Where is the appropriate place to add documentation for
>>> the poppler::ustring class itself?
>> Personally, I would suggest Doxygen comments in the public header.
> Thanks! I'm now trying to write them. I also found that the Doxygen
> comments for text_list need improvement.
> While checking the existing functions (to add documentation), I
> found a few inconsistencies regarding the BOM.
> * ustring::to_latin1(): this function does not use iconv(); it
> just casts each unsigned short code unit to char. A BOM cannot be
> converted to Latin-1, but the presence of a BOM is not checked. If
> the stored UTF-16 has a BOM, a broken 8-bit character is inserted
> at the beginning of the result.
> * ustring::from_latin1(): this function does not use iconv()
> either. No BOM is inserted at the beginning, so the result is a
> UTF-16 string without a BOM.
> * ustring::to_utf8(): whether the output has a BOM is decided by
> iconv().
> * ustring::from_utf8(): assumes that iconv() returns UTF-16 with a
> BOM.
> I will collect the Debian packages that depend on libpoppler-cpp
> and check how they use ustring objects. By my rough count there are
> fewer than 10, so checking all of them should not be too
> time-consuming. If some programs always skip the first UTF-16
> character (on the assumption that a ustring is always UTF-16 with a
> BOM), some discussion is needed.
> Regards,
> mpsuzuki
> _______________________________________________
> poppler mailing list