Thanks for all your responses.

After dumping the client requests, we finally found the reason: we receive some bad requests and our server tries to deserialize them. These deserializations eventually fail, but each one can take 10 minutes or more, during which it consumes a lot of CPU time:
our-server/broken/deserialize_gt5_ms: (average=94619, count=214, *maximum=776506*, minimum=6, p50=85881, p90=115957, p95=128163, p99=258239, p999=776506, p9999=776506, sum=20248467)

In our case, we chose a simple solution: change the Thrift-generated code to throw an exception directly when it sees an unknown field during deserialization, instead of calling skip() on it.

PS. We will update Thrift to the latest version and check whether the problem still exists.

On Tue Jan 13 2015 at 3:16:21 PM Jens Geyer <[email protected]> wrote:

> > deserialize broken binary data
>
> In case of binary it *might* be the case, but OTOH it is unlikely that
> there is no other error occurring and the structure of the broken data
> follows the binary protocol so close to prevent any other errors.
>
> So far I fully agree with what has been said. Unless we have more
> information, the most likely case seems that some terribly outdated client
> tries to reach the server. However, it is still guesswork and the real
> cause still may be something else.
>
> Are you doing some logging at your servers? What do these logs say?
>
> Have fun,
> JensG
>
> From: [email protected]
> Date: Tue, 13 Jan 2015 03:02:46 +0000
> Subject: Re: When will TProtocolUtil.skip() be called
> To: [email protected]
>
> Hi Randy,
>
> Thanks for your reply.
> In my case, the server is guaranteed to be using the latest IDL, and we
> have not deleted any field when evolving our IDL.
> So may I say that the TProtocolUtil.skip is called because it is trying to
> deserialize some broken or malicious binaries?
> I also notice that the server spend a lot of time(7% of cpu time) calling
> TProtocolUtil.skip, is that an expected behaviour?
>
> Also, is it safe for us to update the server to 0.9.2, while leave the
> clients using 0.7.0?
> Thanks
>
> On Tue Jan 13 2015 at 2:09:34 AM Randy Abernethy <[email protected]> wrote:
> > Hello Mingmin Liu,
> >
> > There is at least one other case for skip.
> >
> > 3. new server sees data from old client and old client is still
> > transmitting a field which has been deleted in new server IDL.
> >
> > Also I would note that Apache Thrift 0.7.0 is several years old, the
> > current release is 0.9.2 and many improvements and subtle changes have
> > taken place since 0.7.0.
> >
> > Best,
> > Randy
> >
> > On Mon, Jan 12, 2015 at 12:55 AM, Mingmin Liu
> > <[email protected]> wrote:
> > > I am using thrift 0.7.0 on my server, and recently find sometimes my server
> > > busy calling TProtocolUtil.skip().
> > >
> > > stack trace:
> > >
> > > "New I/O worker #75" daemon prio=10 tid=0x00007f7804892000
> > > nid=0x51601 runnable [0x00007f76129e7000]
> > > java.lang.Thread.State: RUNNABLE
> > > at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:122)
> > > at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60)
> > > at com.My_Thrift_Object.read(My_Thrift_Object.java:958)
> > > at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:69)
> > >
> > > this happens when I try to deserialize the binary content from client.
> > >
> > > Is it true that TProtocolUtil.skip() will be called only when
> > > My_Thrift_Object see some field that it doesn't know ?
> > >
> > > if yes, when will this happen?
> > >
> > > I can come up with 2 conditions:
> > >
> > > 1. old server see data from new client and the new client has defined some
> > > new field in thrift.
> > > 2. the binary content is corrupted or someone maliciously build up it.
> > >
> > > is my understanding right?
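
For anyone finding this thread later, here is a rough sketch of the fail-fast idea described at the top of this message (throw on an unknown field instead of skipping it). It is plain Java with no Thrift dependency. The wire layout, the type tags, and the names StrictFieldReader, UnknownFieldException and readValue are all hypothetical stand-ins made up for illustration; this is not the Thrift-generated code or Thrift's API.

```java
import java.nio.ByteBuffer;

// Toy reader for a made-up format: repeated (type:byte, id:short, value)
// entries terminated by a single STOP byte. Illustrative only, not Thrift.
class StrictFieldReader {
    // Hypothetical type tags for this toy format.
    static final byte STOP = 0;
    static final byte I32 = 8;

    // The only field id this toy "schema" knows about.
    static final short KNOWN_FIELD_ID = 1;

    /** Thrown instead of skipping when a field is not in the schema. */
    static class UnknownFieldException extends RuntimeException {
        UnknownFieldException(short id, byte type) {
            super("unknown field id " + id + " (type " + type + ")");
        }
    }

    /** Returns the value of field 1, or 0 if the field is absent. */
    static int readValue(ByteBuffer in) {
        int value = 0;
        while (true) {
            byte type = in.get();      // truncated input raises BufferUnderflowException
            if (type == STOP) {
                return value;
            }
            short id = in.getShort();
            if (id == KNOWN_FIELD_ID && type == I32) {
                value = in.getInt();
            } else {
                // A lenient reader would skip the payload here. Failing fast
                // rejects a garbage request at the first unexpected field.
                throw new UnknownFieldException(id, type);
            }
        }
    }
}
```

The trade-off is that a strict reader like this gives up forward compatibility: a newer client that sends an additional field would be rejected too, so it only fits when the server is known to have the newest IDL, as in our case.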
