Hi,

I'm currently investigating an issue involved in a major performance issue
in the 98th  percentile of deserialization of a thrift object in python
using the fastbinary protocol. The objects aren't complicated but do contain
a fair number of children - 500 or more in these cases. Performance is
important and in average case it is okay, but 2% of the time the speed is
too slow.

I ran a test that loops on Fetch object from memcache, Deserialize object
500 times in a row. Statistics are tracked for a number of different objects
through this test. The output is consistently similar to this:

TOTAL TIME: [9.2217922210693359, 9.3901157379150391, 10.351181030273438,
10.828018188476562, 11.240959167480469, 32.318115234375]
DESERIALIZATION TIME: [6.6769123077392578, 6.7799091339111328,
7.8821182250976562, 8.3730220794677734, 8.6162090301513672,
29.79588508605957]

Where each element in the array specifies a different percentile - 25, 50,
75, 90, 95, 99. What's clear is the 99th %ile is about 4x higher than the
median. When running cProfile, all of the time spent in deserialization is
spent in the read() method of ttypes for the different objects that are
being deserialized. I pasted an excerpt below.

    18000    0.023    0.000    0.023    0.000 ttypes.py:1057(__init__)
    18000    0.052    0.000    0.052    0.000 ttypes.py:1162(__init__)
     4500    0.009    0.000    0.009    0.000 ttypes.py:1735(__init__)
     4500    0.006    0.000    0.006    0.000 ttypes.py:1929(__init__)
     9000    0.014    0.000    0.014    0.000 ttypes.py:2115(__init__)
      500    0.001    0.000    0.001    0.000 ttypes.py:2224(__init__)
      500    0.001    0.000    0.001    0.000 ttypes.py:2338(__init__)
   252000    0.356    0.000    0.356    0.000 ttypes.py:245(__init__)
    18000    0.030    0.000    0.030    0.000 ttypes.py:2495(__init__)
      500    0.001    0.000    0.001    0.000 ttypes.py:2600(__init__)
   252000    0.917    0.000    0.917    0.000 ttypes.py:371(__init__)
      500    0.002    0.000    0.002    0.000 ttypes.py:55(__init__)
    18000    0.020    0.000    0.020    0.000 ttypes.py:570(__init__)
==>      500    0.004    0.000    3.733    0.007 ttypes.py:64(read)
    18000    0.030    0.000    0.030    0.000 ttypes.py:648(__init__)
    18000    0.040    0.000    0.040    0.000 ttypes.py:767(__init__)


I'm trying to come up with a way to make sense of this and having trouble,
so I'm hoping that someone else has seen an issue like this. Please respond
with any thoughts around potential work-arounds or other tests that could be
helpful.

Thank you,
Ned

PS I wasn't sure whether to send this to user or dev, so if this is the
wrong mailing list please let me know.

Reply via email to