Thanks for the suggestion Kevin.

Turning off gc (gc.disable()) reduces the 99th percentile to just noise. On
further investigation, it appears that Python's garbage collector runs with
default thresholds of (700, 10, 10), meaning that a generation-0 collection
runs after every 700 net object allocations (allocations minus deallocations;
but for the sake of argument, if one of my thrift objects allocates around 700
objects on deserialization, that is effectively 700 allocations). After 10
generation-0 collections, generation 1 is collected, and after 10 generation-1
collections, generation 2 is collected. I still have to do more research and
testing to find the right numbers for our system, but having gc run at least
once per thrift deserialization is not helping matters!
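For reference, here is roughly what I'm experimenting with: raising the gen-0
threshold and suspending collection around the deserialization hot path. The
50000 threshold is just a placeholder I haven't tuned yet, and
deserialize_thrift() is a stand-in for our actual deserialization call:

    import gc

    # Option 1: raise the gen-0 threshold so collection runs far less often.
    # 50000 is an untuned placeholder, not a recommendation.
    gc.set_threshold(50000, 10, 10)

    # Option 2: suspend collection around the hot path and collect explicitly
    # at a point where a pause is acceptable.
    gc.disable()
    try:
        obj = deserialize_thrift(buf)  # stand-in for the real deserialization
    finally:
        gc.enable()
        gc.collect(0)  # pay the gen-0 cost at a time of our choosing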

For completeness, when I printed out gc stats, a collection of generation 0
took 1.5-3 ms, generation 1 took 3-5 ms, and generation 2 took around 20 ms on
my test machine, a c1.xlarge EC2 instance. So the worst case is a disgusting
3 + 5 + 20 = 28 ms delay in execution. Wow.
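If anyone wants to reproduce these numbers, gc.set_debug(gc.DEBUG_STATS) makes
the collector print timing for each collection it runs. A cruder alternative
(a sketch, not my exact test harness) is to time explicit per-generation
collections directly:

    import gc
    import time

    # Time a forced collection of each generation in isolation.
    for gen in (0, 1, 2):
        start = time.time()
        collected = gc.collect(gen)
        elapsed_ms = (time.time() - start) * 1000.0
        print("gen %d: collected %d objects in %.2f ms"
              % (gen, collected, elapsed_ms))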

--Ned

On Tue, Dec 7, 2010 at 11:11 PM, Kevin Clark <[email protected]> wrote:

> On Tue, Dec 7, 2010 at 11:06 PM, Ned Rockson <[email protected]> wrote:
> > Hi,
> >
> > I'm currently investigating a major performance issue in the 98th
> > percentile of deserialization of a thrift object in Python using the
> > fastbinary protocol. The objects aren't complicated but do contain a fair
> > number of children - 500 or more in these cases. Performance is important,
> > and in the average case it is okay, but 2% of the time the speed is too
> > slow.
> >
> > I ran a test that loops on fetching the object from memcache and
> > deserializing it, 500 times in a row. Statistics are tracked for a number
> > of different objects through this test. The output is consistently similar
> > to this:
> >
> > TOTAL TIME: [9.2217922210693359, 9.3901157379150391, 10.351181030273438,
> > 10.828018188476562, 11.240959167480469, 32.318115234375]
> > DESERIALIZATION TIME: [6.6769123077392578, 6.7799091339111328,
> > 7.8821182250976562, 8.3730220794677734, 8.6162090301513672,
> > 29.79588508605957]
> >
> > Where each element in the array specifies a different percentile - 25, 50,
> > 75, 90, 95, 99. What's clear is the 99th %ile is about 4x higher than the
> > median. When running cProfile, all of the time spent in deserialization is
> > spent in the read() method of ttypes for the different objects that are
> > being deserialized. I pasted an excerpt below.
> >
> >    18000    0.023    0.000    0.023    0.000 ttypes.py:1057(__init__)
> >    18000    0.052    0.000    0.052    0.000 ttypes.py:1162(__init__)
> >     4500    0.009    0.000    0.009    0.000 ttypes.py:1735(__init__)
> >     4500    0.006    0.000    0.006    0.000 ttypes.py:1929(__init__)
> >     9000    0.014    0.000    0.014    0.000 ttypes.py:2115(__init__)
> >      500    0.001    0.000    0.001    0.000 ttypes.py:2224(__init__)
> >      500    0.001    0.000    0.001    0.000 ttypes.py:2338(__init__)
> >   252000    0.356    0.000    0.356    0.000 ttypes.py:245(__init__)
> >    18000    0.030    0.000    0.030    0.000 ttypes.py:2495(__init__)
> >      500    0.001    0.000    0.001    0.000 ttypes.py:2600(__init__)
> >   252000    0.917    0.000    0.917    0.000 ttypes.py:371(__init__)
> >      500    0.002    0.000    0.002    0.000 ttypes.py:55(__init__)
> >    18000    0.020    0.000    0.020    0.000 ttypes.py:570(__init__)
> > ==>      500    0.004    0.000    3.733    0.007 ttypes.py:64(read)
> >    18000    0.030    0.000    0.030    0.000 ttypes.py:648(__init__)
> >    18000    0.040    0.000    0.040    0.000 ttypes.py:767(__init__)
> >
> >
> > I'm trying to come up with a way to make sense of this and having trouble,
> > so I'm hoping that someone else has seen an issue like this. Please respond
> > with any thoughts around potential work-arounds or other tests that could
> > be helpful.
> >
> > Thank you,
> > Ned
> >
> > PS I wasn't sure whether to send this to user or dev, so if this is the
> > wrong mailing list please let me know.
> >
>
> Is your test fetching the *same* object each time?
>
> If so, is it possible that gc is getting run during that big slowdown?
>
> --
> Kevin Clark
> http://glu.ttono.us
>
