On Thursday, June 5, 2014 12:12:06 AM UTC+5:30, Roy Smith wrote:
>  Chris Angelico  wrote:

> > You can't ignore those. You might be able to say "Well, my program
> > will run slower if you throw these at it", but if you're going down
> > that route, you probably want the full FSR and the advantages it
> > confers on ASCII and Latin-1 strings. Binding your program to BMP-only
> > is nearly as dangerous as binding it to ASCII-only; potentially worse,
> > because you can run an awful lot of artificial tests without
> > remembering to stick in some astral characters.

> Yup.  I wrote a while(*) back about the pain I was having importing some 
> data into a MySQL(**) database which (unknown to me when I started) only 
> handled BMP.  It turns out in the entire dataset of 20-odd million 
> records, there were exactly four that had astral characters.  All of my 
> tests worked.  I didn't discover the problem until it blew up many hours 
> into the "final" production import run.

> (*) Two years?

> (**) This was not the only pain point with MySQL.  We eventually 
> switched to Postgres.

Thanks Roy for bringing up that example - I was trying to recollect
the details.  I forgot about the MySQL angle which adds a different
twist to it.

Here's my interpretation of that situation; I'd like to hear yours:

The basic problem was that MySQL handled a strict subset of what the
rest of the system (Python 2.7?) could handle.  This meant that at a
late (and embarrassing) stage, exceptions were being thrown from deep
within the system.

OTOH, let's say you could detect the 'error' (more correctly, the
'un-handle-able') at the borders of your system, say when the user
enters the data on a web form. Would you have a problem kicking out
those characters (in both senses!) with a curt:

"Can't deal with all this supra-galactic rubble!" ?

Of course, switching to Postgres may be a sound choice on other fronts.
But if that were not an option, and you only had these choices:

- significantly complicate your MySQL data structures to handle 4
  cases in 20 million
- just detect such cases and throw them out at the outset

which would you take?
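
To make the second option concrete, here is a minimal sketch of what
"detect at the border" could look like. It assumes Python 3.3 or later,
where each element of a str is a full code point (a narrow 2.7 build
would see surrogate pairs instead). The names astral_chars, check_field
and split_records are made up for illustration; nothing here comes from
Roy's actual import code.

```python
def astral_chars(text):
    """Return the characters of text that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def check_field(text):
    """Border check for a single input: raise if text has astral characters."""
    bad = astral_chars(text)
    if bad:
        codes = ", ".join("U+%04X" % ord(ch) for ch in bad)
        raise ValueError("Can't deal with non-BMP characters: " + codes)
    return text


def split_records(records):
    """Pre-scan a batch: return (accepted, rejected) lists of strings."""
    accepted, rejected = [], []
    for record in records:
        if astral_chars(record):
            rejected.append(record)
        else:
            accepted.append(record)
    return accepted, rejected
```

Running split_records over the whole dataset before the import starts
would surface the handful of offending records up front, rather than
hours into the run.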

In any case, this is the choice I hear from the MicroPython folks,
who are explicitly seeking a cut-down version of Python.

-- 
https://mail.python.org/mailman/listinfo/python-list
