A small report "from the trenches" after a visit from someone running a 
directory-attack against one of our servers, and the effect it had.


I have a few MRTG graphs to show, but they are located at an image-host (no 
nasties - just a direct link to the images):

* last 2 hours, showing the effect after "fixing" the problem at ~11:55 - 
http://billedhost.dk/filer/1221728072cpu_load_last_2_hours.png 

* last 24 hours, showing the effect of the problem - 

* last week, showing what the normal load usually is (with a few spikes here 
and there) and what the directory-attack did - 


The history:


At around 17:49 Sunday evening someone began a directory-attack on one of our 
servers. The attack was logged like this in Apache's httpd-access.log:


aadi - - [14/Sep/2008:17:49:33 +0200] "GET /~aadi/ HTTP/1.1" 404 204 "-" "-"      "-" "-" "-"     "-"

aaliyah - - [14/Sep/2008:17:49:33 +0200] "GET /~aaliyah/ HTTP/1.1" 404 207 "-" "-"        "-" "-" "-"     "-"

aaralyn - - [14/Sep/2008:17:49:34 +0200] "GET /~aaralyn/ HTTP/1.1" 404 207 "-" "-"        "-" "-" "-"     "-"

aaron - - [14/Sep/2008:17:49:34 +0200] "GET /~aaron/ HTTP/1.1" 404 205 "-" "-"    "-" "-" "-"     "-"

abba - - [14/Sep/2008:17:49:34 +0200] "GET /~abba/ HTTP/1.1" 404 204 "-" "-"      "-" "-" "-"     "-"

abbie - - [14/Sep/2008:17:49:34 +0200] "GET /~abbie/ HTTP/1.1" 404 205 "-" "-"    "-" "-" "-"     "-"


(To help understand the fields, this is the logformat we use: "%V %h %l %u %t 
\"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \ \"%{Via}i\" \"%{Pragma}i\" 
\"%{X-Forwarded-For}i\" \ \"%{Cache-Control}i\"")
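
As a rough illustration of how such probes could be counted in an access log, here is a minimal Python sketch. It only relies on the request and status fields ("GET /~name/ HTTP/1.1" followed by 404), so it does not depend on the rest of the logformat. The function name count_probes and the third sample line (a normal 200 request) are made up for the example.

```python
import re
from collections import Counter

# Matches only the request and status part of a log line, e.g.
#   "GET /~aadi/ HTTP/1.1" 404
# so the remaining fields of the logformat do not matter here.
PROBE_RE = re.compile(r'"GET /~([^/\s"]+)/ HTTP/1\.[01]" (\d{3}) ')


def count_probes(lines):
    """Count 404 responses to /~username/ requests, keyed by username."""
    hits = Counter()
    for line in lines:
        m = PROBE_RE.search(line)
        if m and m.group(2) == "404":
            hits[m.group(1)] += 1
    return hits


sample = [
    'aadi - - [14/Sep/2008:17:49:33 +0200] "GET /~aadi/ HTTP/1.1" 404 204 "-" "-"',
    'abba - - [14/Sep/2008:17:49:34 +0200] "GET /~abba/ HTTP/1.1" 404 204 "-" "-"',
    # Invented non-probe request, included only for contrast.
    'x - - [14/Sep/2008:17:50:00 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "-"',
]
probes = count_probes(sample)
print(sum(probes.values()), "probe requests for", len(probes), "distinct names")
```

Feeding the real httpd-access.log through count_probes (for example via open(path) as the lines argument) would give the per-name totals.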


In total there were ~14,500 such attempts spread evenly across 5 vhosts - each 
one running its own Resin instance. An average of ~2,900 hits per vhost is 
really nothing, and would have gone unnoticed if the CPU load hadn't begun to 
rise above normal on Monday morning (anything above 1 for an extended period is 
abnormal for this server).


So on Monday I began looking at what had happened on the server. The individual 
Resin processes were normal - stacktraces of the running VMs showed nothing, and 
stopping/starting them did next to nothing (this is a production server, so 
there was precious little I could try without disrupting services for several 
thousand users).


The interesting thing was that, according to the OS (RHEL5.1 x64), it was 
Apache that was responsible for the high load - not Resin. Stopping the main 
Resin process did lower the load, but as soon as it was started again the load 
rose back to the abnormal level - and since we hadn't deployed a new version for 
a few weeks (and the stacktraces showed nothing of interest) we ruled out our 
own code as the culprit.


I began to look elsewhere. I ran a few rootkit detectors and checked against a 
known list of sha1sums to see if (hopefully not!) there was anything rotten, 
but found nothing.


Then I began to wonder - even though Apache's /server-status showed a rather 
normal load and nothing extraordinary (except 180-190% CPU usage), it must be 
possible to see what the individual Apache processes were doing.


I attached to a process with $ strace -p <pid> and began to look at the system 
calls, and I soon began to wonder about a lot of these calls:


open("/tmp/resintmp-DmLgMw", O_RDWR|O_CREAT|O_EXCL, 0600) = 44

write(44, "H\0\16check-intervalS\0\0015H\0\6cookieS\0"..., 16379) = 16379


write(44, "last-updateS\0\n1073741823h\0\0c\0\0e\0"..., 1818) = 1818

close(44)                               = 0

rename("/tmp/resintmp-DmLgMw", "/tmp/localhost_6856") = 0

unlink("/tmp/resintmp-DmLgMw")          = -1 ENOENT (No such file or directory)

stat("/tmp/localhost_6856", {st_mode=S_IFREG|0600, st_size=198379, ...}) = 0


open("/tmp/resintmp-K8ylXW", O_RDWR|O_CREAT|O_EXCL, 0600) = 44

write(44, "H\0\16check-intervalS\0\0015H\0\6cookieS\0"..., 16379) = 16379


write(44, "last-updateS\0\n1073741823h\0\0c\0\0e\0"..., 1818) = 1818

close(44)                               = 0

rename("/tmp/resintmp-K8ylXW", "/tmp/localhost_6856") = 0

unlink("/tmp/resintmp-K8ylXW")          = -1 ENOENT (No such file or directory)
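
To get a feel for how much data each of these write-and-rename cycles moves, a captured strace log could be summarised along these lines. This is only a sketch: the regular expressions are written against the excerpt above, and the function name rewrite_cycles is made up. Note that the excerpt shows only some of the write() calls of a cycle, so the total for the sample is lower than the real file size.

```python
import re

OPEN_RE = re.compile(r'^open\("(/tmp/resintmp-[^"]+)", .*\)\s*=\s*(\d+)$')
WRITE_RE = re.compile(r'^write\((\d+), .*\)\s*=\s*(\d+)$')
CLOSE_RE = re.compile(r'^close\((\d+)\)\s*=\s*0$')
RENAME_RE = re.compile(r'^rename\("(/tmp/resintmp-[^"]+)", "([^"]+)"\)\s*=\s*0$')


def rewrite_cycles(lines):
    """Return (target_path, bytes_written) for each temp file that was
    written and then renamed over a target file."""
    fd_to_tmp = {}   # open file descriptor -> temp file path
    written = {}     # temp file path -> bytes written so far
    cycles = []
    for raw in lines:
        line = raw.strip()
        m = OPEN_RE.match(line)
        if m:
            fd_to_tmp[m.group(2)] = m.group(1)
            written[m.group(1)] = 0
            continue
        m = WRITE_RE.match(line)
        if m:
            # Only count writes that go to a tracked temp file.
            if m.group(1) in fd_to_tmp:
                written[fd_to_tmp[m.group(1)]] += int(m.group(2))
            continue
        m = CLOSE_RE.match(line)
        if m:
            fd_to_tmp.pop(m.group(1), None)
            continue
        m = RENAME_RE.match(line)
        if m and m.group(1) in written:
            cycles.append((m.group(2), written.pop(m.group(1))))
    return cycles


sample = r'''
open("/tmp/resintmp-DmLgMw", O_RDWR|O_CREAT|O_EXCL, 0600) = 44
write(44, "H\0\16check-intervalS\0\0015H\0\6cookieS\0"..., 16379) = 16379
write(44, "last-updateS\0\n1073741823h\0\0c\0\0e\0"..., 1818) = 1818
close(44)                               = 0
rename("/tmp/resintmp-DmLgMw", "/tmp/localhost_6856") = 0
'''.splitlines()

cycles = rewrite_cycles(sample)
for target, size in cycles:
    print(target, size)
```

Counting the cycles per second for a single Apache process would show how often the whole file is being rewritten.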


Then I remembered seeing a post about "localhost_<srun port>" files on the 
mailing list from Vlad Artamonov (03 Aug 2008) and Scott Ferguson's reply on the 


Oh dear - I had some large localhost_<srun port> files:


-rw------- 1 apache apache 198379 Sep 16 11:22 /tmp/localhost_6856

-rw------- 1 apache apache 176417 Sep 16 11:22 /tmp/localhost_6862

-rw------- 1 apache apache       766 Sep 16 11:26 /tmp/localhost_6873

-rw------- 1 apache apache 152038 Sep 16 11:20 /tmp/localhost_6880

-rw------- 1 apache apache 139985 Sep 16 09:25 /tmp/localhost_6893

-rw------- 1 apache apache 140689 Sep 16 11:21 /tmp/localhost_6897


(The small one (localhost_6873) belongs to a site that's only reachable from 
specific IPs in the firewall and was never attacked, so I took that as a 
"normal" size)
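
A check like the following could flag such files before they become a problem. It is a minimal sketch: the function name oversized_cache_files and the 10,000 byte limit are assumptions chosen for the example (the limit simply sits between the "normal" 766 bytes and the ~140 KB files above), not values taken from Resin.

```python
import glob
import os


def oversized_cache_files(directory="/tmp", pattern="localhost_*",
                          limit_bytes=10_000):
    """Return (path, size) pairs for matching files larger than
    limit_bytes, largest first."""
    found = []
    for path in glob.glob(os.path.join(directory, pattern)):
        try:
            size = os.stat(path).st_size
        except OSError:
            # The file may have been renamed or removed in the meantime.
            continue
        if size > limit_bytes:
            found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


for path, size in oversized_cache_files():
    print(f"{size:>10}  {path}")
```

Run from cron or a monitoring check, this would have pointed at the files on Monday morning rather than after a round of strace.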


Looking at the contents of them (via less and strings) I could see a lot of what 
looked like leftover garbage from the directory attack we experienced Sunday. 


I then stopped Apache, removed the files and restarted Apache, and as can be 
seen on the "last 2 hours" graph this immediately lowered the load (I removed 
the files around 11:55), and my problem vanished!


This left me wondering. 


- Was there anything I could or should have done earlier to find the error 
(except to trace an Apache process as I did)?

- Was the behaviour of Resin's mod_caucho (from the pro version 3.0.24) as 
expected - i.e. that it kept a really large cache file that was rewritten on 
every request and persisted across restarts of both Apache and Resin?

- Can Resin detect when performance issues arise due to the large size, and 
possibly do something about it?

- Can I somehow configure how often this file is updated - the documentation on 
the <dependency-check-interval> tag Scott mentioned doesn't mention the effect 
on the localhost_<srun port> files?

- Shouldn't these files be reset or removed when a VM is shut down or started 
to ensure optimal performance?

- Is the default behaviour different in Resin 3.1.x?

- Is there a bug somewhere or somehow? 

- Did I do the "right thing" to remove the files, or should I have done 
something else entirely?



Jens Dueholm Christensen 
Rambøll Survey IT

resin-interest mailing list
