In article <f935e85f-f86a-4821-86ab-3ab7e5e21...@googlegroups.com>,
Rustom Mody <rustompm...@gmail.com> wrote:
> On Thursday, June 5, 2014 12:12:06 AM UTC+5:30, Roy Smith wrote:
> > Yup. I wrote a while(*) back about the pain I was having importing some
> > data into a MySQL(**) database
> Here's my interpretation of that situation; I'd like to hear yours:
> Basic problem was that MySQL handled a strict subset of what the rest
> of the system (Python 2.7?) could handle.
Yes. This was not a Python issue. I was just responding to ChrisA's comment:
>>> Binding your program to BMP-only is nearly as dangerous as binding
>>> it to ASCII-only; potentially worse, because you can run an awful
>>> lot of artificial tests without remembering to stick in some astral
> Of course switching to postgres may be a sound choice on other fronts.
> But if that were not an option, and you only had these choices:
> - significantly complexify your MySQL data structures to handle 4 in
> 20 million cases
> - just detect and throw such cases out at the outset
> which would you take?
It turns out, we could have upgraded to a newer version of MySQL, which
did handle astral characters correctly. What we did instead was discard
the records containing non-BMP data. Of course, that's a decision that
can only be made when you understand the business requirements. In our
case, discarding those four records had no impact on our business, so it
made sense. For other people, not having the full dataset might have
been a fatal problem.
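For anyone wanting to try the detect-and-discard route, here is a minimal
sketch in Python. It is only an illustration, not necessarily what was
actually run: the names has_astral and split_records are hypothetical, and
it assumes Python 3.3 or later, where a str is a sequence of code points.
On a narrow Python 2.7 build, astral characters appear as surrogate pairs,
so this particular check would not catch them.

```python
def has_astral(text):
    """Return True if text contains any code point outside the BMP.

    Assumes Python 3.3+, where each element of a str is one code point.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def split_records(records):
    """Partition records into (kept, dropped) by presence of non-BMP data.

    'records' is assumed to be an iterable of str; real data would likely
    need a per-field check instead.
    """
    kept, dropped = [], []
    for rec in records:
        if has_astral(rec):
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped


if __name__ == "__main__":
    sample = ["plain ascii", "BMP only: \u4e2d\u6587", "astral: \U0001F600"]
    kept, dropped = split_records(sample)
    print(len(kept), "kept;", len(dropped), "dropped")
```

Returning the dropped records rather than silently skipping them makes it
possible to count them and check against the business requirements before
deciding they really can be thrown away.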
This was just one of many MySQL problems we ran into. Eventually, we
decided it wasn't worth fighting with what was obviously a brain-dead
system, and switched databases.