Re: Performance of int/long in Python 3

rusi Mon, 01 Apr 2013 06:18:19 -0700

On Apr 1, 5:15 pm, Roy Smith <[email protected]> wrote:
> In article <[email protected]>,
>  Steven D'Aprano <[email protected]> wrote:
>
> > [...]
> > >> OK, that leads to the next question.  Is there any way I can (in Python
> > >> 2.7) detect when a string is not entirely in the BMP?  If I could find
> > >> all the non-BMP characters, I could replace them with U+FFFD
> > >> (REPLACEMENT CHARACTER) and life would be good (enough).
>
> > Of course you can do this, but you should not. If your input data
> > includes character C, you should deal with character C and not just throw
> > it away unnecessarily. That would be rude, and in Python 3.3 it should be
> > unnecessary.
>
> The import job isn't done yet, but so far we've processed 116 million
> records and had to clean up four of them.  I can live with that.
> Sometimes practicality trumps correctness.


That works out to roughly 0.000003% (4 out of 116 million). Of course, I
assume it is US-only data. Still, it's good to know how skewed the
distribution is.
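
For the archive, here is a minimal sketch of the kind of cleanup Roy
describes: find every non-BMP character and swap in U+FFFD. The names
replace_non_bmp and _NON_BMP are made up for illustration, and this is not
necessarily how Roy did it. It assumes that on a narrow 2.7 build the
astral range fails to compile as a character class, in which case the
pattern falls back to matching surrogate pairs.

```python
import re

try:
    # Wide builds and Python 3.3+: each non-BMP code point is one character.
    _NON_BMP = re.compile(u'[\U00010000-\U0010FFFF]')
except re.error:
    # Assumption: narrow Python 2 builds store non-BMP characters as
    # surrogate pairs, so match a high surrogate followed by a low one.
    _NON_BMP = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def replace_non_bmp(text, replacement=u'\uFFFD'):
    """Return (cleaned_text, count) with non-BMP characters replaced."""
    return _NON_BMP.subn(replacement, text)


if __name__ == '__main__':
    cleaned, count = replace_non_bmp(u'ok \U0001F600 done')
    # count tells you whether the record needed cleaning at all.
    print(repr(cleaned), count)
```

Counting the records where count is nonzero gives the "four out of 116
million" kind of figure directly, so you can see how rare the problem is
before deciding whether replacing is acceptable.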
-- 
http://mail.python.org/mailman/listinfo/python-list
