[rt-users] Large Queue size problems

Nick Geron Mon, 19 Apr 2010 16:41:15 -0700

Hi all,

Our company currently runs RT for customer support interactions as well as a central email abuse reporting system for customer IP blocks. Recently we set up a feedback system with a large hosted mail provider and we saw the level of incoming abuse/spam reports increase to tens of thousands a day. I have been trying to identify the source of an issue that essentially boils down to this: when our RT queues are 'large' (over 100K tickets), the UI struggles to complete operations or consumes all system resources.

To mitigate the issues, we have been using rt-shredder to cull the excess, but I have a backed-up DB to test with. What I have found is that on a particular type of search, the returned DB data set is so large that the apache process handling the request consumes almost all available memory on the RT host, leading to swapping and/or a nasty segfault.


Our setup involves three hosts:

1 dedicated Gentoo based DB host running MySQL 5.0 with InnoDB based tables. 2G RAM and 1 64-bit quad core Xeon running under VMware vSphere 4.

2 load balanced Gentoo based apache servers running RT 3.8.2 with the same proc/cpu specs as the DB host.


The magic search that overloads apache works as follows:

1) Click on our large queue from RT at a glance Quick Search. The queue in question contains 184744 new, 7 open and 7731 stalled tickets in my dumped database.

2) Click on any ticket on any of the returned pages.

Apache then consumes so much memory that we have to kill the process at best or restart the server at worst. In a browser, the ticket page often fails to load or may be partially completed before the host resources are exhausted.

The MySQL query also reveals that the last operation in this state returns a large chunk of data, and often pops up in the slow query log with an average execution time of 15 seconds. My first thought was that we had an issue with our database. However, several days of testing indicated that this problem was directly related to apache/mod_perl having to drink from a firehose.


Here's the entry that always logs to the slow query log:

# Time: 100419 16:58:07
# u...@host: rtadmin[rtadmin] @ rt-test[172.20.0.99]
# Query_time: 15  Lock_time: 0  Rows_sent: 184751  Rows_examined: 377948
use rt3;

SELECT main.* FROM Tickets main WHERE (main.Status != 'deleted') AND (main.Queue = '11' AND ( main.Status = 'new' OR main.Status = 'open' )) AND (main.EffectiveId = main.id) AND (main.Type = 'ticket') ORDER BY main.id ASC;
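
Note that Rows_sent (184751) equals the queue's 184744 new plus 7 open tickets, so this query hands every active ticket in the queue to the client in one result set. The following is a minimal sketch of that effect, not RT's code: it uses Python with an in-memory SQLite table as a scaled-down stand-in for the MySQL Tickets table. The reduced column set, the row counts, and the helper names fetch_all and fetch_page are assumptions made for illustration. It contrasts the unbounded query with a version that pages by id.

```python
import sqlite3

# Scaled-down stand-in for RT's Tickets table (assumption: only the
# columns the logged query touches; the real table has many more).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tickets (id INTEGER PRIMARY KEY, EffectiveId INTEGER, "
    "Queue INTEGER, Type TEXT, Status TEXT)"
)

# Hypothetical test data in queue 11: 1000 new, 7 open, 50 stalled.
statuses = ["new"] * 1000 + ["open"] * 7 + ["stalled"] * 50
conn.executemany(
    "INSERT INTO Tickets (EffectiveId, Queue, Type, Status) "
    "VALUES (?, 11, 'ticket', ?)",
    [(i, s) for i, s in enumerate(statuses, start=1)],
)

# Same filter shape as the slow-query-log entry.
WHERE = (
    "(main.Status != 'deleted') AND (main.Queue = 11 AND "
    "(main.Status = 'new' OR main.Status = 'open')) AND "
    "(main.EffectiveId = main.id) AND (main.Type = 'ticket')"
)


def fetch_all(conn):
    """Unbounded form: every matching row is loaded into client memory."""
    sql = f"SELECT main.* FROM Tickets main WHERE {WHERE} ORDER BY main.id ASC"
    return conn.execute(sql).fetchall()


def fetch_page(conn, after_id, page_size):
    """Paged form: at most page_size rows with id greater than after_id."""
    sql = (
        f"SELECT main.* FROM Tickets main WHERE {WHERE} "
        "AND main.id > ? ORDER BY main.id ASC LIMIT ?"
    )
    return conn.execute(sql, (after_id, page_size)).fetchall()


everything = fetch_all(conn)
first_page = fetch_page(conn, 0, 50)
second_page = fetch_page(conn, first_page[-1][0], 50)
print(len(everything), len(first_page), len(second_page))
```

The unbounded call returns all new and open rows, while each paged call holds only 50 rows at a time. Whether RT 3.8 can be made to issue the paged form for this code path is not something the sketch establishes.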

Interestingly, searching for the specific ticket via the main page search box brings up the typed-in ticket quickly and without incident. Another tidbit: this appears to involve some level of caching. If I follow the above steps, then kill the process and finally select another ticket NOT in the large queue (one off my own top 15 tickets), then the same behavior is observed. Also, I see queries in the MySQL query log that include data related to the previous search. I have performed a battery of tests stopping daemons, clearing mason cache, clearing browser cache and the like to figure all this out.

The one detail about our setup that I suspect plays a part here is that a previous admin wrote a series of email handling scripts that always rewrite the sender address before handing the email off to rt-mailgate. We suspected at one point that part of the issue was related to the query that looks up other tickets created by the sender. An 'explain' in MySQL did show that the volume of data was forcing an on-disk temp table and filesort, but I haven't directly correlated a slow DB operation to the consumption of memory on the apache side. That email handling apparatus is currently being replaced, but does having the same 'Creator' on 100K+ tickets sound like a really bad thing, or is this normal for large shops?
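
One way to see how concentrated the 'Creator' column is would be a per-creator ticket count. The following is a minimal sketch only, again in Python against an in-memory SQLite table rather than the MySQL database. The reduced schema, the user ids 42, 7 and 8, the row counts, and the helper name tickets_per_creator are all hypothetical.

```python
import sqlite3

# Reduced stand-in for the Tickets table (assumption: just the columns
# needed to count tickets per creator).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tickets (id INTEGER PRIMARY KEY, Creator INTEGER, Status TEXT)"
)

# Hypothetical data: user id 42 stands in for the rewritten sender address.
rows = [(42, "new")] * 900 + [(7, "new")] * 5 + [(8, "open")] * 3
conn.executemany("INSERT INTO Tickets (Creator, Status) VALUES (?, ?)", rows)


def tickets_per_creator(conn, limit=5):
    """Return (Creator, ticket count) pairs, largest count first."""
    sql = (
        "SELECT Creator, COUNT(*) AS n FROM Tickets "
        "WHERE Status != 'deleted' "
        "GROUP BY Creator ORDER BY n DESC LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


top = tickets_per_creator(conn)
# Fraction of the listed tickets that belong to the single busiest creator.
share = top[0][1] / sum(n for _, n in top)
print(top, round(share, 3))
```

A result dominated by one Creator would mean that any lookup of "other tickets by this creator" touches most of the queue, which fits the temp table and filesort seen in the 'explain' output but does not by itself prove the link to apache memory use.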

Can I get some feedback on how our system compares to others using RT? How many tickets do you collect in a day? What rough system specs are you running on? Is this normal for large volumes of tickets? Is the only answer ever more RAM?

Also, I updated the test rig to 3.8.7 from 3.8.2 today, including all DB upgrade operations, with no change.

Any assistance would be very much appreciated. Our current game plan is to build an archiving system to keep the queue numbers down, but at some point large queues may be the norm for us. Thanks.


-Nick Geron

Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com
