Please see below... Xuekun Hu wrote:
I think I have some idea about why "lock wait timeout" happened.I noticed that the executing time of the below query is continuing to grow with the test running. (SELECT st.tag as tag,st.refcount as count from SOCIALEVENTTAG as st order by st.refcount desc limit 50) order by tag asc; The query will do table scan which is time comsuming. And the query will be executed in every homepage and login operations, which are executed at high frequency.
Unless you turn off the cache, homepage and login are cached so this query is only run once in a fairly long while.
With testing continue to run, the SOCIALEVENTTAG table size is increasing (though stop increasing at 5000), the upper query executing time is increasing. Then it will cause the query "update SOCIALEVENTTAG set refcount = refcount + 1 where tag='$tag'" waiting the lock longer and longer. Then when "lock wait timeout", the transaction will rollback. And the situation is becoming worse and worse, the throughput will drop sharply, then the result will fail.
Again, that's true if cache is off. We know the tag cloud generation query is rather heavy. Also, there is no sensitivity to whether the tag cloud is actually up-to-date or not. With caching, the query is run only every 5 minutes. You should not even notice it.
I am thinking is the "SELECT ..." query executed too freqently? For some websites, as far as I know, they will create a seperate table to hold the top reference data, like one table to hold the top 50 refcount in this case. The table will be updated regurally, like every 10 min, instead of every homepage and login operations. The top refcount data is not real-time, but for websites, it don't matter. It is only for users to view. It can decrease the response time of the websites.
We use memcached to cache that info instead of a separate table. -Akara
