Hello mpsuzuki, Am 26.04.2018 um 19:03 schrieb suzuki toshiya: > Dear Adam, > > Thank you for review & comment, and sorry for lated response. > > Adam Reichold wrote: >> I like it. Some minor remarks: >> >> * Is this the only user of iconv within Poppler? If so, we could remove >> it from the build system as wel. > > OK. > >> * In "ustring::to_utf8": >> - "utf16_end" is only used to compute the size once, maybe just use >> "size()" to call "utf16ToUtf8" > > It is slightly different, it is used to compute the length without BOM. > Also the first UTF-16 character to be passed to utf16ToUtf8() is changed > to skip the BOM. To eliminate it, the code would be like this. This > is better? > > byte_array ustring::to_utf8() const > { > if (!size()) { > return byte_array(); > }
if (empty()) would be more direct. > const uint16_t* utf16_buff = data(); > const bool hasBOM = (utf16_buff[0] == 0xFEFF); Why not modify the pointer once? if (utf16_buff[0] == 0xFEFF) { ++uft16_buff; } > const int utf8_len = utf16CountUtf8Bytes(utf16_buff + (hasBOM ? 1 : 0)); > > byte_array ret(utf8_len + 1, '\0'); > utf8_len = utf16ToUtf8(utf16_buff + (hasBOM ? 1 : 0), > reinterpret_cast<char *>(ret.data()), > utf8_len + 1, > size() - (hasBOM ? 1 : 0)); > ret.resize(utf8_len); > > return ret; > } > >> - "utf16ToUtf8" returns how many bytes were written, so no need to >> call "std::strlen" again > > OK. > >> * In "ustring::to_latin1": >> - for readability, I'd suggest either iterating both "me" and "ret" >> using a parallel index "i" or via pointer arithmetic, but not mixing it > > how about this? > std::string ustring::to_latin1() const > { > if (!size()) { > return std::string(); > } > > const bool hasBOM = (*data() == 0xFEFF); > std::string ret(size() + (hasBOM ? 0 : 1), '\0'); > iterator itl_utf16 = begin() + (hasBOM ? 1 : 0); > iterator itl_ret = ret.begin(); > > for (; itl_utf16 < end(); itl_utf16++, itl_latin1++) { > *itl_ret = (char)*itl_utf16; > } Again, I would suggest just modifing a copy of data() to skip the first element if it is a BOM. Also iterators are idiomatically checked using !=, i.e. itl_utf16 != end(). Best regards, Adam. > return ret; > } > > >> * In "has_bom_utf8": >> - The spacing of the first "if" is a bit off >> - Instead of "if(foo) { return true; } return false;", just do "return >> foo;" > > OK. > >> * In "ustring::from_utf8": >> - Three times "skip BOM" is probably a bit on the noisy side ;-) > > OK. > >> - Why no additional call to "ret.resize" as for "to_utf8"? Or why the >> additional call there? > > ret.resize() added. > >> Just for my understanding, this also contains another important fix that >> is unrelated to the iconv usage? >> >> - u = data[i] & 0xff; >> + u = pdfDocEncoding[ *data & 0xff ]; > > Ah, yes. I had once explained in the beginning of this thread > for reworking, but slipped to mention it for the patch. > >> Maybe the changes to "detail::unicode_GooString_to_ustring" could be a >> separate commit? > > OK, it is separated to fix-cpp-frontend-endian-issue_20180427-0130-b. > Others are summarized in fix-cpp-frontend-endian-issue_20180427-0130-a. > > Regards, > mpsuzuki >
signature.asc
Description: OpenPGP digital signature
_______________________________________________ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler