On Thu, Jun 5, 2014 at 11:59 PM, Roy Smith <r...@panix.com> wrote:
> It turns out, we could have upgraded to a newer version of MySQL, which
> did handle astral characters correctly. But what we did was discard
> the records containing non-BMP data. Of course, that's a decision that
> can only be made when you understand the business requirements. In our
> case, discarding those four records had no impact on our business, so it
> made sense. For other people, not having the full dataset might have
> been a fatal problem.
> This was just one of many MySQL problems we ran into. Eventually, we
> decided it wasn't worth fighting with what was obviously a brain-dead
> system, and switched databases.
Point to note: It's not just "Avoid MySQL version x.y.z, it's buggy",
but "Make sure you're on a sufficiently new version of MySQL *and then
use these settings*". For instance, the MySQL "utf8" character set
(that's MySQL's term; collations are layered on top of it) supports
only the BMP, i.e. at most three bytes per character; you have to use
"utf8mb4", which is UTF-8 that's allowed to go as far as four bytes
per character.
What were they thinking?
What, were they thinking?
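To make the three-byte/four-byte distinction concrete, here's a rough
sketch (assuming Python 3.3 or later, where every str element is a full
code point) that checks whether a string contains anything outside the
BMP, and therefore anything that the old three-byte "utf8" can't hold.
The helper names needs_utf8mb4 and utf8_lengths are made up for
illustration; they're not part of any MySQL driver.

```python
def needs_utf8mb4(text):
    """Return True if any character in text lies outside the BMP.

    Code points above U+FFFF take four bytes in UTF-8, which is
    exactly what a three-byte-per-character encoding cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def utf8_lengths(text):
    """Pair each character with the byte length of its UTF-8 encoding."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


bmp_only = "caf\u00e9 \u20ac"   # e-acute (2 bytes) and euro sign (3 bytes)
astral = "snake \U0001F40D"     # U+1F40D is an astral (non-BMP) character

print(needs_utf8mb4(bmp_only))  # False
print(needs_utf8mb4(astral))    # True
print(max(n for _, n in utf8_lengths(astral)))  # 4
```

Something like this can be run over a dataset before loading it, to find
out how many records (four, in Roy's case) would actually be affected.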
I understand there's now an alias "utf8mb3" for the buggy utf8, with
some theory that some future version of MySQL might make utf8 become
an alias for utf8mb4. But when would you ever actually *demand* this
buggy behaviour? Why not just say "as of this version, utf8 is
identical to utf8mb4, which is a superset of the old utf8", and if
anything changes or breaks, just acknowledge that it used to be buggy?