Hi Jens,

I can't address the most of questions you stated, but like an IT person of 
organization using Resin with Apache fronted and having the problem with busy 
Apache environments like you  I can add the following comments.

"Was there anything I could or should have done earlier to find the error 
(except to trace an Apache process as I did)?"

Since we have already localized the problem coming to localhost_<srun_port> we 
are now monitoring its size and if it becomes bigger more than 80KB, we remove 
it with Apache and Resin stopped which improves the situation. This is 
considered like a temporary solution while we are investigating the impact of 
dependency-check internal increase and consider other solutions to this issue. 

"Is the default behaviour different in Resin 3.1.x?"

Yes, for all pre 3.1.7a versions. Unfortunately, mod_caucho does not pass 
requests to Resin in 3.1.7 for me at all with working configuration file from 
previous versions of Resin 3.1. So I can say nothing about the performance of 
this version.

"Did I do the “right thing” to remove the files, or should I have done 
something else entirely?"

I can call it the only thing that helps us with these caches and we have no 
other drawbacks from the removals except downtimes while apache and resin 
restart. If developers could recommend a more better they to get this fixed, 
I'll be glad to follow it.

Best Regards,
Vlad Artamonov
> 
> Hi
> 
> 
> 
> 
> 
> 
> 
> A small report “from the trenches” after
> we’ve had a visit from someone doing a directory-attack on one of our
> servers, and the effect this had.
> 
> 
> 
> 
> 
> 
> 
> I have a few MRTG graphs to show, but they are located at
> an image-host (no nasties – just a direct link to the images):
> 
> 
> 
> ·         
> last 2 hours showing the effect
> after “fixing” the problem at ~11:55 - 
> http://billedhost.dk/filer/1221728072cpu_load_last_2_hours.png
> 
> 
> 
> ·         
> last 24 hours showing the effect of
> the problem - http://www.billedhost.dk/filer/1221728228cpu_load_last_day.png
> 
> 
> 
> ·         
> last week showing what the normal
> load usually is (with a few spikes here and there) and what the
> directory-attack did - 
> http://www.billedhost.dk/filer/1221728247cpu_load_last_week.png
> 
> 
> 
> 
> 
> 
> 
> The history:
> 
> 
> 
> 
> 
> 
> 
> At around 17:49 Sunday evening someone began a
> directory-attack on one of our servers. The attack was logged like this in
> Apache’s httpd-access.log:
> 
> 
> 
> 
> 
> 
> 
> aadi 24.249.18.233 - - [14/Sep/2008:17:49:33 +0200]
> "GET /~aadi/ HTTP/1.1" 404 204 "-"
> "-"      "-" "-"
> "-"     "-"
> 
> 
> 
> aaliyah 24.249.18.233 - - [14/Sep/2008:17:49:33 +0200]
> "GET /~aaliyah/ HTTP/1.1" 404 207 "-"
> "-"        "-"
> "-" "-"     "-"
> 
> 
> 
> aaralyn 24.249.18.233 - - [14/Sep/2008:17:49:34 +0200]
> "GET /~aaralyn/ HTTP/1.1" 404 207 "-"
> "-"        "-"
> "-" "-"     "-"
> 
> 
> 
> aaron 24.249.18.233 - - [14/Sep/2008:17:49:34 +0200]
> "GET /~aaron/ HTTP/1.1" 404 205 "-"
> "-"    "-" "-"
> "-"     "-"
> 
> 
> 
> abba 24.249.18.233 - - [14/Sep/2008:17:49:34 +0200]
> "GET /~abba/ HTTP/1.1" 404 204 "-"
> "-"      "-" "-"
> "-"     "-"
> 
> 
> 
> abbie 24.249.18.233 - - [14/Sep/2008:17:49:34 +0200]
> "GET /~abbie/ HTTP/1.1" 404 205 "-"
> "-"    "-" "-"
> "-"     "-"
> 
> 
> 
> 
> 
> 
> 
> (To help understand the fields, this is the logformat we
> use: "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\"
> \"%{User-agent}i\" \ \"%{Via}i\" \"%{Pragma}i\"
> \"%{X-Forwarded-For}i\" \ \"%{Cache-Control}i\"")
> 
> 
> 
> 
> 
> 
> 
> In total there was ~14.500 such tries spread equally
> across 5 vhosts – each one running their own Resin-instance. On average
> the ~2900 hits pr. vhost is really nothing, and would have gone unnoticed if
> the CPU load hadn’t begun to rise above normal on Monday morning
> (anything above 1 for an extended period is abnormal for this server).
> 
> 
> 
> 
> 
> 
> 
> So Monday I began looking at what happened on the server.
> The individual Resin processes were normal – stacktraces of the running
> VM’s showed nothing, and stopping/starting them did next to nothing (this
> is a production-server, so there was percious little I could try without
> disrupting services for several thousand users).
> 
> 
> 
> 
> 
> 
> 
> The interesting thing was that, according to the OS
> (RHEL5.1 x64), it was Apache who was responsible for the high load – not
> Resin. Stopping the main Resin process did lower the load, but as soon as it
> was started again the load rose back to the abormal level – and since we
> hadn’t deployed a new version for a few weeks (and the stacktraces showed
> nothing of interest) we ruled out our own code as the culprit.
> 
> 
> 
> 
> 
> 
> 
> I began to look elsewhere.. ran a few rootkit-dectors
> against a know list of sha1sum’s in an attempt to see if (and hopefully
> not!) there were anything rotten, but nothing.
> 
> 
> 
> 
> 
> 
> 
> Then I began to wonder – even though Apache’s
> /server-status showed a rather normal load and nothing extraordinary (except
> 180-190% CPU usage), it must be possible to see what the individual Apache
> processes were doing.
> 
> 
> 
> 
> 
> 
> 
> I attached to a process with $ strace –p <pid>
> and began to look at the system calls, and I soon began to wonder about a lot
> of these calls:
> 
> 
> 
> 
> 
> 
> 
> open("/tmp/resintmp-DmLgMw",
> O_RDWR|O_CREAT|O_EXCL, 0600) = 44
> 
> 
> 
> write(44,
> "H\0\16check-intervalS\0\0015H\0\6cookieS\0"..., 16379) = 16379
> 
> 
> 
> …
> 
> 
> 
> write(44,
> "last-updateS\0\n1073741823h\0\0c\0\0e\0"..., 1818) = 1818
> 
> 
> 
> close(44)                              
> = 0
> 
> 
> 
> rename("/tmp/resintmp-DmLgMw",
> "/tmp/localhost_6856") = 0
> 
> 
> 
> unlink("/tmp/resintmp-DmLgMw")         
> = -1 ENOENT (No such file or directory)
> 
> 
> 
> stat("/tmp/localhost_6856",
> {st_mode=S_IFREG|0600, st_size=198379, ...}) = 0
> 
> 
> 
> …
> 
> 
> 
> open("/tmp/resintmp-K8ylXW",
> O_RDWR|O_CREAT|O_EXCL, 0600) = 44
> 
> 
> 
> write(44,
> "H\0\16check-intervalS\0\0015H\0\6cookieS\0"..., 16379) = 16379
> 
> 
> 
> …
> 
> 
> 
> write(44,
> "last-updateS\0\n1073741823h\0\0c\0\0e\0"..., 1818) = 1818
> 
> 
> 
> close(44)                              
> = 0
> 
> 
> 
> rename("/tmp/resintmp-K8ylXW",
> "/tmp/localhost_6856") = 0
> 
> 
> 
> unlink("/tmp/resintmp-K8ylXW")         
> = -1 ENOENT (No such file or directory)
> 
> 
> 
> 
> 
> 
> 
> Then I remember seeing a post about
> “localhost_<srun port>” files on the mailinglist from Vlad
> Artamonov (03 Aug 2008) and Scott Ferguson’s reply on the 4th.
> 
> 
> 
> 
> 
> 
> 
> Oh dear – I had some large localhost_<srun
> port> files:
> 
> 
> 
> 
> 
> 
> 
> -rw------- 1 apache apache 198379 Sep 16 11:22
> /tmp/localhost_6856
> 
> 
> 
> -rw------- 1 apache apache 176417 Sep 16 11:22
> /tmp/localhost_6862
> 
> 
> 
> -rw------- 1 apache apache 
>      766 Sep 16 11:26 /tmp/localhost_6873
> 
> 
> 
> -rw------- 1 apache apache 152038 Sep 16 11:20
> /tmp/localhost_6880
> 
> 
> 
> -rw------- 1 apache apache 139985 Sep 16 09:25
> /tmp/localhost_6893
> 
> 
> 
> -rw------- 1 apache apache 140689 Sep 16 11:21
> /tmp/localhost_6897
> 
> 
> 
> 
> 
> 
> 
> (The smaller one (localhost_6873) is a site that’s
> only available from specific IP’s in the firewall and was never attacked,
> so I took that as a “normal” size)
> 
> 
> 
> 
> 
> 
> 
> Looking at the contents of them (via less and strings) I could
> see a lot what looked like leftover garbage from the directory attack we
> experienced Sunday.
> 
> 
> 
> 
> 
> 
> 
> I then stopped Apache, removed the files and restarted
> Apache, and as can be seen on the “last 2 hours” graph this
> immediately lowered the load (I removed the files around 11:55), and my 
> problem
> vanished!
> 
> 
> 
> 
> 
> 
> 
> This left me wondering.
> 
> 
> 
> 
> 
> 
> 
> - Was there anything I could or should have done earlier
> to find the error (except to trace an Apache process as I did)?
> 
> 
> 
> - Was Resin’s mod_caucho (from the pro version
> 3.0.24) behaviour as expected – ie. it kept a really large cache-file
> updated on every request and persisted over restartes of both Apache and 
> Resin?
> 
> 
> 
> - Can Resin detect when performance issues arrise due to
> the large size and possible do something?
> 
> 
> 
> - Can I somehow configure how often
> this file is updated – the documentation on the
> <dependency-check-interval> tag Scott mentioned doesn’t mention the
> effect on the localhost_<srun port> files?
> 
> 
> 
> - Shouldn’t these files be
> reset or removed when a VM is shut down or started to ensure optimal
> performance?
> 
> 
> 
> - Is the default behaviour
> different in Resin 3.1.x?
> 
> 
> 
> - Is there a bug somewhere or
> somehow?
> 
> 
> 
> - Did I do the “right
> thing” to remove the files, or should I have done something else
> entirely?
> 
> 
> 
> 
> 
> Regards,
> 
> 
> 
> Jens Dueholm Christensen 
> Rambøll Survey IT
> 
> _______________________________________________
> resin-interest mailing list
> resin-interest@caucho.com
> http://maillist.caucho.com/mailman/listinfo/resin-interest
> 
> 

-- 
Яндекс.Открытки на все случаи жизни http://cards.yandex.ru/


_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest

Reply via email to