Re: question on string object handling in Python 2.7.8

Dave Angel Wed, 24 Dec 2014 05:15:07 -0800

On 12/23/2014 08:28 PM, Dave Tian wrote:

Hi,


Hi, please do some things when you post new questions:

1) identify your Python version. In this case it makes a big difference, as in Python 2.x, the range function is the only thing that takes any noticeable time in this code.

2) when posting code, use cut 'n paste. You retyped the code, which could have caused typos, and in fact did, since your email editor (or newsgroup editor, or whatever) decided to use 'smart quotes' instead of single quotes. The Unicode characters shown in "Testing code" below include


   LEFT SINGLE QUOTATION MARK
and
   RIGHT SINGLE QUOTATION MARK

which are not valid Python syntax.
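
A quick way to spot such characters before posting is a few lines of Python. This is only a sketch: the helper names find_smart_quotes and fix_smart_quotes and the mapping table are my own illustration, not part of any library, and the table covers only the four most common curly quotes.

```python
import unicodedata

# Hypothetical mapping from common "smart" quotes to their ASCII equivalents.
SMART_TO_PLAIN = {
    u"\u2018": u"'",   # LEFT SINGLE QUOTATION MARK
    u"\u2019": u"'",   # RIGHT SINGLE QUOTATION MARK
    u"\u201c": u'"',   # LEFT DOUBLE QUOTATION MARK
    u"\u201d": u'"',   # RIGHT DOUBLE QUOTATION MARK
}

def find_smart_quotes(source):
    """Return (line, column, Unicode name) for every smart quote in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for col, ch in enumerate(line, 1):
            if ch in SMART_TO_PLAIN:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

def fix_smart_quotes(source):
    """Replace every smart quote in source with its plain ASCII quote."""
    for smart, plain in SMART_TO_PLAIN.items():
        source = source.replace(smart, plain)
    return source

pasted = u"a = \u2018h\u2019"       # statement A as it arrived in the email
for hit in find_smart_quotes(pasted):
    print(hit)
print(fix_smart_quotes(pasted))
```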

There are 2 statements:
A: a = ‘h’
B: b = ‘hh’

According to my understanding, A should be faster as characters would shortcut 
this 1-byte string ‘h’ without malloc;

Nope, there's no such promise in Python. If there were such an optimization, it might vary between one implementation of Python and another, and between one version and the next.

But it'd be very hard to implement such an optimization, since the C interface would then see it, and third party native libraries would have to have special coding for this one kind of object.

You're probably thinking of Java and C#, which have native data and boxed data (I don't recall just what each one calls it). Python, at least for the last 15 years or so, makes everything an object, which means there are no special cases for us to deal with.

B should be slower than A as characters does not work for 2-byte string ‘hh’, which triggers the malloc. However, when I put A/B into a big loop and try to measure the performance using cProfile, B seems always faster than A.

Testing code:
for i in range(0, 100000000):
        a = ‘h’ #or b = ‘hh’
Testing cmd: python -m cProfile test.py

So what is wrong here? B has one more malloc than A but is faster than A?

In my testing, sometimes A is quicker, and sometimes B is quicker. But of course there are many ways of testing it, and many versions to test it on. I put those statements (after fixing the quotes) into two functions, and called the two functions, letting profile tell me which was faster.
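
A comparison along those lines can be sketched as follows. Note the assumptions: this uses timeit rather than the profiler, the function names assign_a and assign_b are made up for illustration, and the loop count is far smaller than the 100000000 in the original test so that it finishes quickly. Which of the two comes out ahead may well change from run to run.

```python
import timeit

N = 100000  # assumed loop count, much smaller than the original test

def assign_a():
    # Statement A inside a function, so 'a' is a local variable.
    for i in range(N):
        a = 'h'

def assign_b():
    # Statement B inside a function, so 'b' is a local variable.
    for i in range(N):
        b = 'hh'

# Take the best of several repeats to reduce noise from other processes.
time_a = min(timeit.repeat(assign_a, number=10, repeat=5))
time_b = min(timeit.repeat(assign_b, number=10, repeat=5))

print("A: %.4f s" % time_a)
print("B: %.4f s" % time_b)
```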

Incidentally, just putting them in functions cut the time by approximately 50%, probably because local variable lookup in a function is much faster in CPython than access to variables in globals().
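
One way to see the difference, at least in CPython, is to look at where the compiler files the name a in each case. This is a small sketch, not a benchmark; the function name in_function and the shortened loop are only for illustration. At module level the name lands in co_names (resolved through a dictionary), while inside a function it lands in co_varnames (a fixed slot in the frame).

```python
# The same loop, compiled once as module-level code and once inside a function.
SRC = "for i in range(10):\n    a = 'h'\n"
module_code = compile(SRC, "<module-level>", "exec")

def in_function():
    for i in range(10):
        a = 'h'

func_code = in_function.__code__

# Names the module-level code looks up by dictionary key.
print("module co_names:     ", module_code.co_names)
# Names the function stores in fast local slots.
print("function co_varnames:", func_code.co_varnames)
# Names the function still has to look up by key (globals or builtins).
print("function co_names:   ", func_code.co_names)
```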

There are other things going on. In any recent CPython implementation, certain strings will be interned, which can both save memory and avoid the constant thrashing of malloc and free. So we might get different results by choosing a string which won't happen to get interned.
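
As a rough illustration of interning, the sketch below builds two equal strings at run time and then interns them explicitly. The intern function is a built-in in Python 2 and lives in sys in Python 3, hence the try/except; the variable names are arbitrary. Whether two equal strings share an object before being interned is an implementation detail, so the first identity check is only informative.

```python
import sys

try:
    intern_string = intern        # Python 2: intern() is a built-in
except NameError:
    intern_string = sys.intern    # Python 3: it moved to the sys module

# Build two equal strings at run time so they are not compile-time constants.
first = "".join(["h", "ello world"])
second = "".join(["h", "ello world"])

print(first == second)   # equal contents
print(first is second)   # identity before interning: implementation-dependent

# After interning, equal strings are guaranteed to be the same object.
first_interned = intern_string(first)
second_interned = intern_string(second)
print(first_interned is second_interned)
```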

It's hard to get excited over any of these differences, but it is fun to think about it.


--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list
