On Thursday, June 5, 2014 12:12:06 AM UTC+5:30, Roy Smith wrote:
> Chris Angelico wrote:
> > You can't ignore those. You might be able to say "Well, my program
> > will run slower if you throw these at it", but if you're going down
> > that route, you probably want the full FSR and the advantages it
> > confers on ASCII and Latin-1 strings. Binding your program to BMP-only
> > is nearly as dangerous as binding it to ASCII-only; potentially worse,
> > because you can run an awful lot of artificial tests without
> > remembering to stick in some astral characters.
> Yup. I wrote a while(*) back about the pain I was having importing some
> data into a MySQL(**) database which (unknown to me when I started) only
> handled BMP. It turns out in the entire dataset of 20-odd million
> records, there were exactly four that had astral characters. All of my
> tests worked. I didn't discover the problem until it blew up many hours
> into the "final" production import run.
> (*) Two years?
> (**) This was not the only pain point with MySQL. We eventually
> switched to Postgres.
Thanks, Roy, for bringing up that example; I was trying to recall
the details. I had forgotten about the MySQL angle, which adds a
different twist to it.
Here's my interpretation of that situation; I'd like to hear yours:
The basic problem was that MySQL could store only a strict subset of
the characters the rest of the system (Python 2.7?) could handle. As a
result, exceptions were thrown from deep within the system at a late
(and embarrassing) stage.
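Catching those four records before a long import run, rather than hours
into it, could be as simple as a pre-flight scan. The following is a
minimal sketch in Python 3, assuming the records are plain `str` values.
It relies on the fact that MySQL's older `utf8` character set stores at
most three bytes per character, so any character whose UTF-8 encoding
needs four bytes (everything outside the BMP, i.e. code points above
U+FFFF) will not fit. The function names are hypothetical.

```python
def needs_four_bytes(ch: str) -> bool:
    """True if ch lies outside the BMP, i.e. its UTF-8 encoding is 4 bytes."""
    return ord(ch) > 0xFFFF


def find_astral_records(records):
    """Return the indices of records containing any non-BMP character."""
    return [
        i for i, text in enumerate(records)
        if any(needs_four_bytes(ch) for ch in text)
    ]


if __name__ == "__main__":
    sample = ["plain ASCII", "caf\u00e9", "snowman \u2603", "grinning \U0001F600"]
    print(find_astral_records(sample))  # [3]
    # BMP characters need at most 3 bytes in UTF-8; astral ones need 4.
    print(len("\u2603".encode("utf-8")))      # 3
    print(len("\U0001F600".encode("utf-8")))  # 4
```

Running this over the whole dataset first would have flagged the problem
records in minutes instead of at the end of the "final" run.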
OTOH, let's say you could detect the 'error' (more correctly, the
'un-handle-able' input) at the borders of your system, say when the
user enters the data on a web form. Would you have a problem kicking
out those characters (in both senses!) with a curt:
"Can't deal with all this supra-galactic rubble!"?
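Rejecting such input at the border might look like the following
minimal sketch. It assumes Python 3.3 or later, where (thanks to the
FSR) each astral character is a single element of a `str` rather than a
surrogate pair, so a simple character-class regex finds it. The name
`validate_form_field` and the error wording are hypothetical and not
part of any web framework.

```python
import re

# Matches any code point outside the Basic Multilingual Plane (U+10000..U+10FFFF).
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")


def validate_form_field(value: str) -> str:
    """Return value unchanged, or raise ValueError if it contains astral characters."""
    match = ASTRAL.search(value)
    if match:
        raise ValueError(
            "Can't deal with character U+%04X at position %d"
            % (ord(match.group()), match.start())
        )
    return value
```

For example, `validate_form_field("caf\u00e9")` returns the string
unchanged, while `validate_form_field("hi \U0001F600")` raises
`ValueError` reporting U+1F600 at position 3.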
Of course, switching to Postgres may be a sound choice on other grounds.
But if that were not an option, and you had only these choices:
- significantly complicate your MySQL data structures to handle 4
  cases in 20 million
- just detect and throw out such cases at the outset
which would you take?
In any case, this is the choice I hear from the MicroPython folks,
who are explicitly seeking a cut-down version of Python.