Re: [CentOS] Anyway to ensure SSH availability?

2011-07-01 Thread Emmanuel Noobadmin
On 7/1/11, Les Mikesell  wrote:

> The principle is the same but the way to control it would be different.
>   Spamassassin is a perl program that uses a lot of memory and takes a
> lot of resources to start up.  If you run a lot of copies at once,
> expect the machine to crawl or die.

This I had experienced before, which is why the first thing I look at
usually is the mail processes.

>MimeDefang, being mostly perl
> itself, runs spamassassin in its own process and has a way to control
> the number of instances - and does it in a way that doesn't tie a big
> perl process to every sendmail instance.  Other systems might run the
> spamd background process and queue up the messages to scan.   The worst
> case is something that starts a new process for every received message
> and keeps the big perl/spamassassin process running for the duration -
> you might also see this with spamassassin runs happening in each user's
> .procmailrc.  One thing that might help is to make sure the spam/virus
> check operations happen in an order that starts with the least resource
> usage and the most likely checks to cause rejection so spamassassin
> might not have to run so much.

I do have greylisting and other checks in place to reject as much mail
as possible before spamd runs, so there's probably not much more I
could do on that side without learning to program Exim conf.


> The same principle applies there, especially if you have big cgi
> programs or mod_perl, mod_python, mod_php (etc.) modules that use a lot
> of resources.  You are probably running in pre-forking mode so those
> programs quickly stop sharing memory in the child processes (perl is
> particularly bad about this since variable reference counts are always
> being updated).  Even if you handle normal load, you might have a
> problem when a search engine indexer walks your links and fires off more
> copies than usual.  You can get an idea of how much of a problem you
> have here by looking at the RES size of the httpd processes in top. If
> they are big and fairly variable, you have some pages/modules/programs
> that consume a lot of memory.   You can limit the number of concurrent
> processes, and in some cases it might help to reduce their life
> (MaxRequestsPerChild).

I'll keep this in mind in case the current fix (no ballooning, higher
starting memory for the VM) doesn't hold up, though it appears to be
holding so far.

> Oh, one other thing... Do the web programs use mysql for anything?
> I've seen mysql do some really dumb things on a 3-table join, like make
> a temporary table containing all the join possibilities, sort it, then
> return the small number of rows you asked for with a LIMIT.  Maybe it is
> better these days but that used to happen even when there were indexes
> on the fields involved and if any of the tables were big it would take a
> huge amount of disk activity.

Most of the apps run off mysql; the likely culprit could be the
Wordpress corporate blog they have, since that probably invites all
kinds of spambots and whatnot. Definitely not our customized apps
since we basically have an audit trail of every single command issued
to the system and so although I don't have the relevant httpd logs due
to the logrotate error, I'm certain no cron jobs and nobody was
accessing it at those times.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread John Hinton
On 6/30/2011 4:53 PM, Robert Heller wrote:
>
>> Right now it doesn't look like a mail run, more like a httpd run
>> because it's starting to look like a large number of httpd threads was
>> spawned just before that.
> OK, there are probably settings for Apache to run fewer threads.
> Probably better have a "Server too busy" type of message than a wedged
> server.  (And most likely the extra httpd threads will just be spambots
> of some sort anyway -- who cares if they get tossed...)
>
With the launch of Living Social, we have had a few clients use that 
service and you will suddenly have all Apache instances running and the 
server acting very laggy to all but unresponsive. I have cut back on the 
total number of Apache instances due to these 'non-attacks' which are 
much like a DoS attack. It seems the first day is horrid, the second not 
so bad and it wanes from there.

This really raises a new question of what to do to handle such 
broadcast ads? We run very conservative server loads, but...

I don't recommend running it all the time, only when you need to catch 
something, but server status can be your friend. You can run a refresh 
in your browser... leave it running in a tab set to refresh like once 
every minute or five. It will show the instances of Apache and the files 
being accessed. Much faster than digging through logs in a Virt server 
environment. This feature is built into Apache, but is not on by 
default. Look at your httpd.conf file.
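
For anyone who would rather poll than keep a browser tab refreshing: mod_status can also serve a plain-text variant of the page (the ?auto form). Below is a minimal Python sketch that parses that text. The sample text is made up for illustration, and the field names BusyWorkers and IdleWorkers are what Apache 2.x reports as far as I know, so check your own output before relying on them.

```python
def parse_status(text):
    """Parse 'Key: value' lines from mod_status ?auto output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def busy_ratio(fields):
    """Fraction of workers that are busy, or None if the counts are missing."""
    try:
        busy = int(fields["BusyWorkers"])
        idle = int(fields["IdleWorkers"])
    except (KeyError, ValueError):
        return None
    total = busy + idle
    return busy / total if total else None


# Illustrative sample only, not captured from a real server.
SAMPLE = """Total Accesses: 1200
BusyWorkers: 45
IdleWorkers: 5
Scoreboard: WWWWW_____
"""

if __name__ == "__main__":
    print(busy_ratio(parse_status(SAMPLE)))
```

Against a live server you would fetch the status URL (assuming mod_status is enabled and restricted to trusted addresses) and pass the response body to parse_status().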

-- 
John Hinton
877-777-1407 ext 502
http://www.ew3d.com
Comprehensive Online Solutions



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Robert Heller
At Fri, 1 Jul 2011 01:39:19 +0800 CentOS mailing list  wrote:

> 
> On 6/30/11, Les Mikesell  wrote:
> > Again, fixable by not sharing the disk the guest uses with the disk the host
> > needs to load programs from... The disk head is always going to be in the
> > wrong place.
> 
> Well, let's just say my original recommendation specifications for
> this particular set was a HP ML110G6 with 8GB, 2x 250GB for the host
> and 2x1.5TB storage drives, told them the extra memory and drives
> could be bought OTS so they don't have to pay HP prices for that.
> 
> What I end up working with is a no-brand desktop quad core with a pair
> of 500GB... so the chances of convincing them to fork out extra for
> hardware isn't good. Unfortunately I'm stuck with making things work
> because managing the server was part of the contract sold with the
> apps.
> 
> 
> > But, odds are that the source of the problem is starting too many mail
> > delivery programs, especially if they, or the user's local procmail, starts
> > a spamassassin instance per message.  Look at the mail logs for a problem
> > time to see if you had a flurry of messages coming in.  Sendmail/MimeDefang
> > is fairly good at queuing the input and controlling the processes running
> > at once but even with that you may have to throttle the concurrent sendmail
> > processes.
> 
> Does it make a difference if I'm running Exim instead of sendmail/MimeDefang?

Probably not.  I suspect that Exim also has a throttling parameter setting.

> Right now it doesn't look like a mail run, more like a httpd run
> because it's starting to look like a large number of httpd threads was
> spawned just before that.

OK, there are probably settings for Apache to run fewer threads. 
Probably better have a "Server too busy" type of message than a wedged
server.  (And most likely the extra httpd threads will just be spambots
of some sort anyway -- who cares if they get tossed...)

> 
> Unfortunately, I also discovered that logrotate was wrongly configured
> and I only have daily logs. Fixed that, hopefully, and shall see if I
> get something better to work on if it strikes again.

-- 
Robert Heller -- 978-544-6933 / hel...@deepsoft.com
Deepwoods Software-- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


   


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Les Mikesell
On 6/30/2011 12:39 PM, Emmanuel Noobadmin wrote:
>
> Right now it doesn't look like a mail run, more like a httpd run
> because it's starting to look like a large number of httpd threads was
> spawned just before that.

Oh, one other thing... Do the web programs use mysql for anything? 
I've seen mysql do some really dumb things on a 3-table join, like make 
a temporary table containing all the join possibilities, sort it, then 
return the small number of rows you asked for with a LIMIT.  Maybe it is 
better these days but that used to happen even when there were indexes 
on the fields involved and if any of the tables were big it would take a 
huge amount of disk activity.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Les Mikesell
On 6/30/2011 12:39 PM, Emmanuel Noobadmin wrote:
>
>> But, odds are that the source of the problem is starting too many mail
>> delivery programs, especially if they, or the user's local procmail, starts a
>> spamassassin instance per message.  Look at the mail logs for a problem time
>> to see if you had a flurry of messages coming in.  Sendmail/MimeDefang is
>> fairly good at queuing the input and controlling the processes running at
>> once but even with that you may have to throttle the concurrent sendmail
>> processes.
>
> Does it make a difference if I'm running Exim instead of sendmail/MimeDefang?

The principle is the same but the way to control it would be different. 
  Spamassassin is a perl program that uses a lot of memory and takes a 
lot of resources to start up.  If you run a lot of copies at once, 
expect the machine to crawl or die.  MimeDefang, being mostly perl 
itself, runs spamassassin in its own process and has a way to control 
the number of instances - and does it in a way that doesn't tie a big 
perl process to every sendmail instance.  Other systems might run the 
spamd background process and queue up the messages to scan.   The worst 
case is something that starts a new process for every received message 
and keeps the big perl/spamassassin process running for the duration - 
you might also see this with spamassassin runs happening in each user's 
.procmailrc.  One thing that might help is to make sure the spam/virus 
check operations happen in an order that starts with the least resource 
usage and the most likely checks to cause rejection so spamassassin 
might not have to run so much.

> Right now it doesn't look like a mail run, more like a httpd run
> because it's starting to look like a large number of httpd threads was
> spawned just before that.

The same principle applies there, especially if you have big cgi 
programs or mod_perl, mod_python, mod_php (etc.) modules that use a lot 
of resources.  You are probably running in pre-forking mode so those 
programs quickly stop sharing memory in the child processes (perl is 
particularly bad about this since variable reference counts are always 
being updated).  Even if you handle normal load, you might have a 
problem when a search engine indexer walks your links and fires off more 
copies than usual.  You can get an idea of how much of a problem you 
have here by looking at the RES size of the httpd processes in top. If 
they are big and fairly variable, you have some pages/modules/programs 
that consume a lot of memory.   You can limit the number of concurrent 
processes, and in some cases it might help to reduce their life 
(MaxRequestsPerChild).
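
To put a rough number on "limit the number of concurrent processes", here is a minimal Python sketch that takes the resident sizes of the httpd children (for example the output of `ps -C httpd -o rss=`, one kB value per line) and estimates how many would fit in a memory budget. The function names, the sample sizes and the 600 MB budget are all hypothetical. RES counts shared pages in every process, so the estimate is on the pessimistic side.

```python
def parse_rss_kb(ps_output):
    """Turn `ps -C httpd -o rss=` output (one kB value per line) into ints."""
    return [int(tok) for tok in ps_output.split()]


def max_clients_estimate(rss_kb, budget_mb):
    """Estimate how many children fit in budget_mb, sizing by the largest one."""
    if not rss_kb:
        return 0
    worst_kb = max(rss_kb)
    return (budget_mb * 1024) // worst_kb


# Hypothetical numbers: four children between 30 MB and 60 MB resident.
SAMPLE_PS = "30720\n45056\n61440\n40960\n"

if __name__ == "__main__":
    sizes = parse_rss_kb(SAMPLE_PS)
    print(max(sizes) // 1024, "MB largest child")
    print(max_clients_estimate(sizes, budget_mb=600), "children fit in 600 MB")
```

Sizing by the largest child rather than the average is a deliberate choice here: the problem described above is the occasional big process, not the typical one.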


-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Emmanuel Noobadmin
On 6/30/11, Les Mikesell  wrote:
> Again, fixable by not sharing the disk the guest uses with the disk the host
> needs to load programs from... The disk head is always going to be in the
> wrong place.

Well, let's just say my original recommendation specifications for
this particular set was a HP ML110G6 with 8GB, 2x 250GB for the host
and 2x1.5TB storage drives, told them the extra memory and drives
could be bought OTS so they don't have to pay HP prices for that.

What I end up working with is a no-brand desktop quad core with a pair
of 500GB... so the chances of convincing them to fork out extra for
hardware isn't good. Unfortunately I'm stuck with making things work
because managing the server was part of the contract sold with the
apps.


> But, odds are that the source of the problem is starting too many mail
> delivery programs, especially if they, or the user's local procmail, starts a
> spamassassin instance per message.  Look at the mail logs for a problem time
> to see if you had a flurry of messages coming in.  Sendmail/MimeDefang is
> fairly good at queuing the input and controlling the processes running at
> once but even with that you may have to throttle the concurrent sendmail
> processes.

Does it make a difference if I'm running Exim instead of sendmail/MimeDefang?
Right now it doesn't look like a mail run, more like a httpd run
because it's starting to look like a large number of httpd threads was
spawned just before that.

Unfortunately, I also discovered that logrotate was wrongly configured
and I only have daily logs. Fixed that, hopefully, and shall see if I
get something better to work on if it strikes again.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Giovanni Tirloni
On Thu, Jun 30, 2011 at 4:38 AM, Alexander Dalloz wrote:

> On 30.06.2011 08:36, Steve Barnes wrote:
> >> Although it would really be interesting to me to see scheduler settings
> >> that would indeed allow something of a 'privileged' ssh or an OOB console
> >> that would be responsive even under a punishing load with lots of swapping,
> >> which is what the OP originally asked about.
> >
> > I'd be interested to hear thoughts on this. We have a small 1U test
> > server with 2 entry-level SATA drives that was brought to its knees twice
> > this week by an overzealous Java process. Load averages were up around 60+
> > and as a result, SSH access would timeout. I don't know if this behaviour is
> > typical across operating systems, but it's frustrating to find yourself
> > locked out of a server just because a single process went to town on the i/o
> > subsystem.
> >
> > Cheers
> >
> > Steve
>
> CentOS 6 will support cgroups, by which you can control cpu, memory and
> I/O.
>
> http://www.mjmwired.net/kernel/Documentation/cgroups.txt
>
> http://www.mjmwired.net/kernel/Documentation/cgroups/blkio-controller.txt


Just tried the disktop.stp script on a Linux 2.6.38 and it looks nice. The
possibilities! :)

http://sourceware.org/systemtap/examples/io/disktop.stp

-- 
Giovanni Tirloni


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread John Hodrien
On Thu, 30 Jun 2011, rai...@ultra-secure.de wrote:

>> Steve Barnes wrote:
>
> [...]
>
>> Or maybe having that core root tree on separate HDD and separate HDD
>> controller.
>>
>
>
> Unfortunately, all this does not matter at all.
> The problem is: sshd is swapped out and the system needs to swap-out
> something else first, before it can take sshd back in.

Reduce your chances of it being kicked out into swap as a result of i/o:

sysctl -q vm.swappiness=0

If that improves things, add an appropriate line into sysctl.conf.
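
The line in question would be `vm.swappiness = 0` in /etc/sysctl.conf. As a small illustration, here is a Python sketch that sets or replaces a key in sysctl.conf-style text. It works on a string only; the function name is made up, and writing the result back to the real file is left out on purpose.

```python
def set_sysctl_line(conf_text, key, value):
    """Return conf_text with `key = value` set, replacing an existing entry."""
    new_line = f"{key} = {value}"
    out, replaced = [], False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#") or stripped.startswith(";")
        name = stripped.split("=", 1)[0].strip()
        if not is_comment and "=" in stripped and name == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            # any later duplicate entries for the same key are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Hypothetical existing file contents.
SAMPLE_CONF = "# Kernel sysctl configuration\nnet.ipv4.ip_forward = 0\nvm.swappiness = 60\n"

if __name__ == "__main__":
    print(set_sysctl_line(SAMPLE_CONF, "vm.swappiness", 0), end="")
```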

jh


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Steven Tardy
On 06/29/11 14:50, Emmanuel Noobadmin wrote:
> I was having problems with the same server locking up to the point I
> can't even get in via SSH.

investigate instead of band-aiding...

1) syslog to a remote host.
remote syslogging rarely stops when the system is disk/iowait bound.

2) log diving.
is there anything in the logs around the time of the incidents?

large emails (100MB+ body, not attachment) can freak out some versions of
spamassassin...
  server load would reach 300+, which timed out SSH connections.
syslogs took time to wade through, but pinpointed the recurring issue.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Les Mikesell
On 6/30/11 6:11 AM, Emmanuel Noobadmin wrote:
> On 6/30/11, Simon Matter  wrote:
>> Hm, I thought the problem was I/O, not memory? If memory is not the
>> problem then it has nothing to do with swapping (more correctly paging).
>
> After looking through the various replies here and rechecking whatever
> logs I managed to get, it might in a way be related to swapping, not
> on the host which I am trying to get into but the guest.

Again, fixable by not sharing the disk the guest uses with the disk the host 
needs to load programs from... The disk head is always going to be in the wrong 
place.

But, odds are that the source of the problem is starting too many mail delivery 
programs, especially if they, or the user's local procmail, starts a 
spamassassin instance per message.  Look at the mail logs for a problem time to 
see if you had a flurry of messages coming in.  Sendmail/MimeDefang is fairly 
good at queuing the input and controlling the processes running at once but 
even with that you may have to throttle the concurrent sendmail processes.
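
One way to spot a flurry in the mail log is to count entries per minute and look at the busiest ones. A minimal Python sketch, assuming traditional syslog timestamps ("Jun 30 10:21:40 ...") as found in /var/log/maillog; the function names and sample lines are hypothetical.

```python
from collections import Counter


def per_minute_counts(lines):
    """Count syslog lines per minute using the leading 'Mon DD HH:MM' stamp."""
    counts = Counter()
    for line in lines:
        stamp = line[:12]  # e.g. 'Jun 30 10:21' from 'Jun 30 10:21:40 host ...'
        if len(stamp) == 12 and stamp[9] == ":":
            counts[stamp] += 1
    return counts


def busiest(lines, top=3):
    """Return the `top` busiest minutes as (stamp, count) pairs."""
    return per_minute_counts(lines).most_common(top)


# Made-up entries for illustration.
SAMPLE = [
    "Jun 30 10:20:58 mail sendmail[2101]: hypothetical entry",
    "Jun 30 10:21:03 mail sendmail[2102]: hypothetical entry",
    "Jun 30 10:21:17 mail sendmail[2103]: hypothetical entry",
    "Jun 30 10:21:40 mail sendmail[2104]: hypothetical entry",
]

if __name__ == "__main__":
    print(busiest(SAMPLE, top=1))
```

On a real box the lines would come from an open log file, e.g. `busiest(open("/var/log/maillog"), top=10)`. A log with a different timestamp layout would need a different slice.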

-- 
   Les Mikesell
 lesmikes...@gmail.com




Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Emmanuel Noobadmin
On 6/30/11, Simon Matter  wrote:
> Hm, I thought the problem was I/O, not memory? If memory is not the
> problem then it has nothing to do with swapping (more correctly paging).

After looking through the various replies here and rechecking whatever
logs I managed to get, it might in a way be related to swapping, not
on the host which I am trying to get into but the guest.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Emmanuel Noobadmin
On 6/30/11, rai...@ultra-secure.de  wrote:
> Unfortunately, all this does not matter at all.
> The problem is: sshd is swapped out and the system needs to swap-out
> something else first, before it can take sshd back in.

There appear to be some functions available to programs to lock their
process pages in memory, mlock and mlockall. But I can't seem to find
a command line equivalent that might be able to keep sshd locked into
memory.

In any case, I've run ionice and renice on sshd and will see if that helps.
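
On mlock/mlockall: as far as I know these calls only affect the process that makes them, and per the mlock(2) man page the locks are dropped on execve() and not inherited across fork(), which would explain why there is no simple wrapper command for pinning sshd. A minimal Python sketch of what the call looks like from inside a program follows. The function names are made up for illustration, and the MCL_* values are the usual Linux ones rather than anything portable.

```python
import ctypes

# Values from <sys/mman.h> on common Linux architectures (x86 included);
# treat them as an assumption on anything else.
MCL_CURRENT = 1
MCL_FUTURE = 2


def lock_all_memory():
    """Try to pin all current and future pages of *this* process in RAM.

    Returns (True, 0) on success or (False, errno) on failure, e.g. when
    RLIMIT_MEMLOCK is too low or the process lacks the needed privilege.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0:
        return True, 0
    return False, ctypes.get_errno()


def unlock_all_memory():
    """Undo lock_all_memory(); returns True on success."""
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.munlockall() == 0


if __name__ == "__main__":
    ok, err = lock_all_memory()
    print("locked" if ok else f"mlockall failed, errno {err}")
    unlock_all_memory()
```

Run as an ordinary user this will most likely report a failure, because the default locked-memory limit is small.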


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Emmanuel Noobadmin
On 6/30/11, Devin Reade  wrote:
> I don't recall you mentioning which VM solution you're using.

KVM :)

> Some problematic areas that I've seen when using VMs:
>
> + memory ballooning sometimes causes problems (I've not actually seen
>   it, but I've seen various warnings about having it enabled and
>   resultant flakiness, and I run with it disabled)

This might be one of the problems, because I just realized that while
the swap used is still pretty small at around 200MB, it's about 5x the
"normal" amount of about 40MB. Since I set an initial 1GB with an
upper limit of 1.5GB, I'd expect the amount of memory available to
be at least 1.5GB by the time swap usage goes up. However, this isn't
the case; the ballooning doesn't seem to be happening, so maybe that's
part of the problem: one of the VMs just wanted to use a bit more
memory for whatever reason, didn't get it, started hitting swap, and
the i/o went crazy.


> + I/O stacks not doing TCP segment offload correctly.  This is an ugly
>   one that bit me hard and took a while to track down.  It's happened in
>   both ESXi and Xen (and I'm not saying that KVM isn't affected, either).
>
>   The symptoms of this is things seem to be fine under low load, but as
>   network traffic starts to increase TCP sessions start stalling out
>   or dying.  I've seen it to the point where I can't even maintain an
>   ssh session long enough to get a login prompt.

This might be possible but at the moment I'll consider it unlikely
since the problem doesn't usually happen during low load periods i.e.
not when the users are connecting to the email or app service during
working hours.

So I'll KIV this first and see if simply setting the max/current
memory without relying on ballooning works.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Simon Matter
>> Steve Barnes wrote:
>
> [...]
>
>> Or maybe having that core root tree on separate HDD and separate HDD
>> controller.
>>
>
>
> Unfortunately, all this does not matter at all.
> The problem is: sshd is swapped out and the system needs to swap-out

Hm, I thought the problem was I/O, not memory? If memory is not the
problem then it has nothing to do with swapping (more correctly paging).

Simon



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Eero Volotinen
2011/6/30  :
>> Steve Barnes wrote:
>
> [...]
>
>> Or maybe having that core root tree on separate HDD and separate HDD
>> controller.
>>
>
>
> Unfortunately, all this does not matter at all.
> The problem is: sshd is swapped out and the system needs to swap-out
> something else first, before it can take sshd back in.

How about buying more memory and faster harddisks?

--
Eero


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread rainer
> Steve Barnes wrote:

[...]

> Or maybe having that core root tree on separate HDD and separate HDD
> controller.
>


Unfortunately, all this does not matter at all.
The problem is: sshd is swapped out and the system needs to swap-out
something else first, before it can take sshd back in.




Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Ljubomir Ljubojevic
Steve Barnes wrote:
> I'd be interested to hear thoughts on this. We have a small 1U test server 
> with 2 entry-level SATA drives that was brought to its knees twice this week 
> by an overzealous Java process. Load averages were up around 60+ and as a 
> result, SSH access would timeout. I don't know if this behaviour is typical 
> across operating systems, but it's frustrating to find yourself locked out of a 
> server just because a single process went to town on the i/o subsystem.
> 
That privileged SSH process should be spawned from a RAM disk to avoid 
being caught up by I/O problems.

Another solution would be to have the core root directory (without /var/log 
and similar) on a RAM disk (either as a cache or as duplicated files) so 
important processes are independent of I/O problems.

Or maybe have that core root tree on a separate HDD and a separate HDD 
controller.

Ljubomir


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Ljubomir Ljubojevic
Robert Heller wrote:
> If the machine is a public-facing smtp server, I would look first to see
> if you are getting the problem I was having.  Maybe looking at the
> maillog to see if the volume of incoming mail is just overwhelming the
> system. In which case you need to do things to keep sendmail from
> running too many processes, either by throttling the connection rate and/or
> by using the accessdb to discard or reject connections from known problem
> networks. 
> 

A very simple solution is to implement a Reverse DNS check. My Postfix mail 
server refuses to accept any mail from an FQDN without valid reverse DNS. I 
also use (or used?) greylisting and a few other measures, but Reverse DNS 
helped immensely in lowering the SPAM that comes to my mailboxes. I would 
say the reduction is some 70-80%.
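
To illustrate the logic of such a check (this is not Postfix configuration, just the idea), here is a minimal Python sketch of a forward-confirmed reverse DNS test: look up the PTR name for the connecting IP, then confirm that name resolves back to the same IP. Whether the setup described above also requires the forward match is an assumption. The function name is hypothetical, and the sketch only handles IPv4.

```python
import socket


def has_valid_reverse_dns(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """True if ip has a PTR name and that name resolves back to ip (IPv4)."""
    try:
        name = reverse(ip)[0]            # (hostname, aliases, addresses)
        addresses = forward(name)[2]     # (hostname, aliases, addresses)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return False
    return ip in addresses


if __name__ == "__main__":
    # Uses the real resolver, so the answer depends on your network.
    print(has_valid_reverse_dns("127.0.0.1"))
```

The two lookup functions are passed in as parameters so the check can be exercised with stand-in resolvers, without touching the network.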

Ljubomir


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-30 Thread Alexander Dalloz
On 30.06.2011 08:36, Steve Barnes wrote:
>> Although it would really be interesting to me to see scheduler settings that 
>> would indeed allow something of a 'privileged' ssh or an OOB console that 
>> would be responsive even under a punishing load with lots of swapping, which 
>> is what the OP originally asked about.
> 
> I'd be interested to hear thoughts on this. We have a small 1U test server 
> with 2 entry-level SATA drives that was brought to its knees twice this week 
> by an overzealous Java process. Load averages were up around 60+ and as a 
> result, SSH access would timeout. I don't know if this behaviour is typical 
> across operating systems, but it's frustrating to find yourself locked out of a 
> server just because a single process went to town on the i/o subsystem.
> 
> Cheers
> 
> Steve

CentOS 6 will support cgroups, by which you can control cpu, memory and I/O.

http://www.mjmwired.net/kernel/Documentation/cgroups.txt

http://www.mjmwired.net/kernel/Documentation/cgroups/blkio-controller.txt
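
As a rough idea of what the blkio controller looks like in use, here is a minimal Python sketch that creates a group, sets its proportional weight and moves a PID into it, using the blkio.weight and tasks files described in the documents linked above. The function name, the group name and the /cgroup/blkio mount point are assumptions, the weight only has an effect with an I/O scheduler that supports it (CFQ, per the linked text), and on a real system this needs root.

```python
import os
import tempfile


def limit_io(pid, group="lowprio", weight=100, root="/cgroup/blkio"):
    """Create a blkio cgroup under root, set its weight, and move pid into it."""
    path = os.path.join(root, group)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "blkio.weight"), "w") as f:
        f.write(f"{weight}\n")
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(f"{pid}\n")
    return path


if __name__ == "__main__":
    # Dry run against a scratch directory instead of a real cgroup mount.
    scratch = tempfile.mkdtemp()
    print(limit_io(os.getpid(), root=scratch))
```

The idea would be to put the runaway workload (the Java process in Steve's case) in a low-weight group so that sshd and a login shell still get a share of the disk.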

Alexander


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Steve Barnes
> Although it would really be interesting to me to see scheduler settings that 
> would indeed allow something of a 'privileged' ssh or an OOB console that 
> would be responsive even under a punishing load with lots of swapping, which 
> is what the OP originally asked about.

I should add, we have OOB management facilities available but even the console 
login was unresponsive. The one SSH login that was logged in at the time the 
trouble started wasn't capable of terminating any of the problematic processes 
or issuing a graceful reboot sequence. Pressing the power button and/or 
Ctrl+Alt+Delete would return "Shutdown: already in progress" (which I 
eventually gave up waiting for).

Cheers

Steve


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Steve Barnes
> Although it would really be interesting to me to see scheduler settings that 
> would indeed allow something of a 'privileged' ssh or an OOB console that 
> would be responsive even under a punishing load with lots of swapping, which 
> is what the OP originally asked about.

I'd be interested to hear thoughts on this. We have a small 1U test server with 
2 entry-level SATA drives that was brought to its knees twice this week by an 
overzealous Java process. Load averages were up around 60+ and as a result, SSH 
access would timeout. I don't know if this behaviour is typical across 
operating systems, but it's frustrating to find yourself locked out of a server 
just because a single process went to town on the i/o subsystem.

Cheers

Steve


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Lamar Owen
On Wednesday, June 29, 2011 05:20:26 PM Rainer Duffner wrote:
> On 29.06.2011 at 23:17, Lamar Owen wrote:
> > More expensive servers that would be suitable for virtualization
> > host use also tend to have better I/O subsystems and faster disks,
> > relative to a 'cheap' system with much poorer base I/O bandwidth.

> The OP clearly stated that he's probably not running a datacenter full  
> of DL580g7 servers...

Yeah, I saw that.  I was just addressing the I/O slowdown thing, where if you 
double the money you might very well get more than double the performance, and 
get two VM's running faster than on the cheaper hardware.  But it seems he's 
already doing some virt.

Just not enough detail to sort that out.  

Although it would really be interesting to me to see scheduler settings that 
would indeed allow something of a 'privileged' ssh or an OOB console that would 
be responsive even under a punishing load with lots of swapping, which is what 
the OP originally asked about.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Robert Heller
At Thu, 30 Jun 2011 05:31:05 +0800 CentOS mailing list  
wrote:

> 
> On 6/30/11, Giovanni Tirloni  wrote:
> > Linux includes I/O in how it calculates the load average so you're not
> > measuring CPU alone.
> 
> On the host, it's expected, I've got two qemu-kvm process loading up
> 100% cpu. Within the guest VM, top looks like this, high load but low
> cpu %.
> 
> top - 10:21:40 up 1 day, 59 min,  0 users,  load average: 16.72, 6.05, 2.29
> Tasks: 176 total,   1 running, 175 sleeping,   0 stopped,   0 zombie
> Cpu(s):  3.3%us,  1.2%sy,  1.2%ni, 91.2%id,  2.7%wa,  0.1%hi,  0.2%si,  0.0%st
> Mem:   1017392k total,   970564k used,    46828k free,     1436k buffers
> Swap:  2040244k total,   200572k used,  1839672k free,    30344k cached
> 
> > What does top show?
> > Any error messages in /var/log during the time the server is unresponsive?
> > Is network responsive? Latency normal too?
> 
> I think the network is responsive, pings work but nothing else does.
> No error messages in both host and guest. faillog, messages and dmesg
> give no clue. Which is why I figured I really need to be logged in,
> check and if necessary kill innocent processes one by one until I find
> the culprit when it's going crazy.

This looks a lot like my server looked/looks when being hit on by the
spambot(s). Lots of I/O, not much CPU.  Moving message bytes around,
etc.  What does maillog look like? There won't be errors, but how much
message traffic is there?
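
The "high load, low CPU" pattern in the quoted top output can be checked mechanically. Linux counts tasks waiting on I/O in the load average, so a long queue next to mostly idle CPUs points at blocked rather than busy processes. Below is a minimal Python sketch that pulls the numbers out of the two header lines quoted above. The function names, the single-CPU default and the thresholds are assumptions picked for illustration.

```python
import re


def parse_top_header(text):
    """Extract the 1-minute load average and CPU percentages from top's header."""
    load = re.search(r"load average:\s*([\d.]+)", text)
    cpu = dict(
        (name, float(pct))
        for pct, name in re.findall(r"([\d.]+)%(\w+)", text)
    )
    return (float(load.group(1)) if load else None), cpu


def looks_blocked_not_busy(load1, cpu, ncpus=1):
    """True when the load is high but the CPUs are mostly idle."""
    return load1 is not None and load1 > 2 * ncpus and cpu.get("id", 0.0) > 50.0


# The two header lines from the guest's top output quoted in this thread.
HEADER = (
    "top - 10:21:40 up 1 day, 59 min,  0 users,  load average: 16.72, 6.05, 2.29\n"
    "Cpu(s):  3.3%us,  1.2%sy,  1.2%ni, 91.2%id,  2.7%wa,  0.1%hi,  0.2%si,  0.0%st\n"
)

if __name__ == "__main__":
    load1, cpu = parse_top_header(HEADER)
    print(load1, cpu["id"], looks_blocked_not_busy(load1, cpu))
```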


-- 
Robert Heller -- 978-544-6933 / hel...@deepsoft.com
Deepwoods Software-- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments





Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Robert Heller
At Thu, 30 Jun 2011 05:12:12 +0800 CentOS mailing list  
wrote:

> 
> On 6/30/11, Robert Heller  wrote:
> > If the problem is excessive load because Sendmail / Mimedefang / spamd /
> > etc. is too busy handling tons of mail/spam being dumped on your server, you
> > might want to look at these sendmail options:
> 
> Mail was my first suspect because I had similar issues with exim/spamd
> locking up bad on another server. But usually that includes a high cpu
> % as well. Although this suspicion did help me pinpoint one of the
> causes, a script that periodically went through the email
> accounts/Maildirs and that was fixed from learning about ionice on the
> list.
> 
> For a while I thought problem solved, but these couple of days, it's
> acting up again and nothing's jumping out screaming "I'm the problem!"
> and not being able to SSH to see what's exactly going on is making it
> difficult.

I have discovered that my VPS (which is a Mail and Web server) would
become impossible to ssh into sometimes.  If I was patient enough,
slogin would eventually get me on the system.  Ps would show lots and
lots of sendmail, mimedefang, spamd, and clamav processes and insane
load average values.  I generally could manage to stop sendmail, and
the load average would begin to go down as the various mail related
processes wound down (once things became sane, I'd restart sendmail and
any crashed daemons).  I put in sendmail settings to throttle back on
accepting connections when things got excessively 'busy'.  This was NOT
anything running on my server, but caused by some overeager spambot
(or spambot farm), pushing a vast amount of spam at my server. This is
a 'random' event and does not seem to follow any sort of meaningful or
predictable schedule.  I guess being proactive with sendmail settings,
including the throttling setting and populating the accessdb with
DSL/Cable modem networks (DISCARD) and various other random troublesome
networks (REJECT) helps.  (The networks in the accessdb cut off lots of
connections without firing up mimedefang and crew.)  I also have the
SpamCop rule enabled as well.

If the machine is a public-facing smtp server, I would look first to see
if you are getting the problem I was having.  Maybe look at the
maillog to see if the volume of incoming mail is just overwhelming the
system.  In that case you need to do things to keep sendmail from
running too many processes, either by throttling the connection rate
and/or by using the accessdb to discard or reject connections from known
problem networks.
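
A rough way to check whether incoming volume is the culprit is to count
sendmail "from=" records per minute in the maillog.  The following is
only a minimal Python sketch: it assumes the usual syslog timestamp
prefix ("Jun 29 10:21:40 host ...") and one "from=" line per accepted
message, and the function names are made up for illustration.

```python
import collections
import re

# Syslog-style prefix, e.g. "Jun 29 10:21:40 host sendmail[1234]: ..."
# The captured group is the timestamp truncated to the minute.
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s")


def messages_per_minute(lines):
    """Count lines that look like sendmail 'from=' records, keyed by minute."""
    counts = collections.Counter()
    for line in lines:
        if "sendmail[" not in line or "from=" not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busiest(lines, top=10):
    """Return the `top` busiest minutes as (minute, count) pairs."""
    return messages_per_minute(lines).most_common(top)
```

Feeding it an open file, for example open("/var/log/maillog") (the
path is an assumption), and printing busiest(...) should show whether
the busy minutes line up with the times the box became unreachable.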


-- 
Robert Heller -- 978-544-6933 / hel...@deepsoft.com
Deepwoods Software-- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


  


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Devin Reade
--On Thursday, June 30, 2011 05:04:07 AM +0800 Emmanuel Noobadmin
 wrote:

> On 6/30/11, Les Mikesell  wrote:
>> The seriously on-the-cheap approach is to run a few virtual servers on
>> hardware slightly better than one of the individual servers would need.
> 
> Actually THAT is the fundamental problem ;)
> The physical server is frankly much more powerful than the two guest
> running on it. I have the same applications + public web/email running
> on old dual core machines with less memory than the guests.

I don't recall you mentioning which VM solution you're using.

Some problematic areas that I've seen when using VMs:

+ memory ballooning sometimes causes problems (I've not actually seen
  it, but I've seen various warnings about having it enabled and 
  resultant flakiness, and I run with it disabled)

+ I/O stacks not doing TCP segment offload correctly.  This is an ugly 
  one that bit me hard and took a while to track down.  It's happened in
  both ESXi and Xen (and I'm not saying that KVM isn't affected, either).

  The symptom is that things seem to be fine under low load, but as
  network traffic starts to increase, TCP sessions start stalling out
  or dying.  I've seen it to the point where I can't even maintain an
  ssh session long enough to get a login prompt.

  What it comes down to is that the top-level (virtual) OS decides to
  hand off TCP segmentation to the (virtual) NIC.  To make a long story
  short, between the guest OS, the virtual NICs, the virtual switches,
  the host OS, and the physical NICs, there exists a path (depending on
  versions and hardware) where everyone thinks somebody else is doing
  TCP segment handling, and nobody is.  So as I/O goes up or fragmentation
  occurs, the protocol goes into the toilet.  Sometimes you miss packets
  and sometimes the data is corrupt.

  Disabling tcp segment offload in both the host and guest avoids the 
  problem (forcing the OS to do it instead of the VM & physical layers).
  Be aware of reboots and update processes that want to reenable it ...
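
Disabling TCP segmentation offload is normally done with ethtool.  A
minimal Python sketch of a helper that could be run at boot on both
host and guest follows; the interface names and function names are
assumptions, and it defaults to a dry run that only returns the
commands it would execute.

```python
import subprocess


def tso_off_command(iface):
    """Build the ethtool invocation that turns off TCP segmentation offload."""
    return ["ethtool", "-K", iface, "tso", "off"]


def show_offload_command(iface):
    """Build the ethtool invocation that lists the current offload settings."""
    return ["ethtool", "-k", iface]


def disable_tso(ifaces, dry_run=True):
    """Disable TSO on each interface; with dry_run, only return the commands."""
    commands = [tso_off_command(iface) for iface in ifaces]
    if not dry_run:
        for command in commands:
            subprocess.check_call(command)  # needs root
    return commands
```

Since the setting does not survive a reboot, something like
disable_tso(["eth0"], dry_run=False) would have to be reapplied from
an init script, and checked again after updates.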

Devin



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Les Mikesell
On 6/29/2011 4:47 PM, Emmanuel Noobadmin wrote:
>
>> OK, but without knowing the cause, you already know the cure.   Make the
>> virtual servers not share physical disks - they will always want a
>> single head to be in different places at the same time.
>
> Same old problem: budget :D

If an extra SATA disk (or pair) is an issue I'd worry about whether your 
paycheck is going to bounce.  Head contention is the one thing you can't 
virtualize away, although adding a bunch of RAM can sometimes help.  And 
a virtual machine with its own raid set is still a pretty cheap machine.

> Also, I expect similar setups in the future so I need to be able to
> know why and not simply throw hardware at it since the amount of disk
> activity is relatively low. The curious part is that this doesn't
> appear to happen during expected heavy usage. It almost never occurs
> during working hours on a weekday ever-since I ionice the other
> script.

If you have sysstat installed, sar should tell you when the busy times 
occur.  Maybe you can match it up with a cron job or some email user's 
connection that might be downloading a bazillion messages.  Are you sure 
it isn't the daily build of the mlocate database running from 
cron.daily.  That's probably not pretty when running simultaneously on 
multiple vm that have lots of little files on the same physical disk.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Devin Reade
--On Thursday, June 30, 2011 04:15:07 AM +0800 Emmanuel Noobadmin
 wrote:

> Would ILO work on a server that's unresponsive due to heavy load?

ILO or any other OOB solution gives you the functionality of sitting
at the console.  So if the problem is one that would cause the console
to be unresponsive, you're still not going to be able to log in.
OTOH, if console is responsive but network-based access such as ssh
is not, then OOB may help.

As others have said, OOB is always a good idea.  When the money 
managers question the cost, compare it to the cost of servers 
becoming unavailable, someone having to travel to the site, etc.
For aftermarket stuff, in addition to what others have mentioned,
search the list archives for threads mentioning 'ipeps'.

Getting back to determining the source of the problem:

I would suggest that you might want to look at running sar.  It's
something that will collect various system statistics constantly,
so it's good for, when there's an event, going back after the fact
and showing what led up to it.  It may not tell you the actual
problem, but the statistics can help isolate the cause.

Be aware of Heisenberg, though.  You suspect that your problem is
I/O based.  Sar is going to increase your I/O in order to log the
stats to disk.  If you have sar sampling too often, you're going
to increase the amount your problem is happening (or make it happen
faster).  If you don't sample often enough, the lack of resolution of
the data can hide what is actually going on with the system.  (Grabbing
a number out of the air, you might be able to start sampling at once
per minute.)  If you can log sar stats to a different disk, it might
help.

Also be aware that when things grind to a halt, you're probably not 
going to get stats.  So what you see (after reboot) may just include
the *start* of the event.

sar is part of the sysstat rpm.
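
If even sar's logging is a concern, a much lighter sampler can record
just the load average and the cumulative iowait counter.  The sketch
below is Python and is not a replacement for sar; it only illustrates
the idea of a cheap once-a-minute sample written to a log on a
different disk.  The function names and the log location are
assumptions.

```python
import time


def parse_loadavg(text):
    """Parse /proc/loadavg content into (1min, 5min, 15min, runnable, total)."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return (float(fields[0]), float(fields[1]), float(fields[2]),
            int(runnable), int(total))


def iowait_jiffies(stat_text):
    """Return the cumulative iowait counter from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        # Field order: cpu user nice system idle iowait irq softirq ...
        if parts and parts[0] == "cpu":
            return int(parts[5])
    raise ValueError("no aggregate cpu line found")


def sample_once(log_path):
    """Append one timestamped sample to log_path (ideally on another disk)."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/stat") as f:
        iowait = iowait_jiffies(f.read())
    with open(log_path, "a") as out:
        out.write("%d %s iowait=%d\n" % (time.time(), load, iowait))
```

Calling sample_once() from cron once a minute keeps the write volume
to one short line per sample.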

Devin



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread m . roth
Brian Mathis wrote:
> On Wed, Jun 29, 2011 at 5:22 PM,   wrote:

>> Here's another one, that I got from another admin talking to VMware:
>> watch out just how many virtual CPUs you assign to each VM. If you've
>> assigned 4, it is actually going to sit there waiting until it gets 4
>> virtual CPUs.
>> As of '09, VMware was recommending assigning 2.
>
> This is no longer true [1], but it's still a good idea to only assign
> as many CPUs as you need.
>
> [1] Source: VMware Engineer at VMware Forum 2011.

Ah, thanks! Yeah, the problem was with overcommitting. Glad to hear that's
solved.

   mark



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Les Mikesell  wrote:
> OK, but without knowing the cause, you already know the cure.   Make the
> virtual servers not share physical disks - they will always want a
> single head to be in different places at the same time.

Same old problem: budget :D
Also, I expect similar setups in the future so I need to be able to
know why and not simply throw hardware at it since the amount of disk
activity is relatively low. The curious part is that this doesn't
appear to happen during expected heavy usage. It almost never occurs
during working hours on a weekday ever-since I ionice the other
script.

> And there is also probably some ugly stuff about how using files for virtual 
> disk

Unfortunately yes, this was one part I misread/misunderstood; I should
have gone with raw partitions. However, the real amount of I/O on these
isn't expected to be high, especially not during a lull hour like 1am
or on a Sunday.

> images and perhaps LVM on both the real and virtual side makes your disk
> blocks misaligned. Fixing that might help too.

No LVM on either side, kept unnecessary layers off the guest. And
manually fdisk'd the drive to ensure 4K alignment.
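
The alignment check itself is simple arithmetic: a partition is 4K
aligned when its starting byte offset is a multiple of 4096.  A small
Python sketch, assuming the start sector is read from sysfs (where it
is expressed in 512-byte units); the function names are illustrative.

```python
def is_4k_aligned(start_sector, sector_size=512):
    """True if the partition's starting byte offset is a multiple of 4096."""
    return (start_sector * sector_size) % 4096 == 0


def read_start_sector(disk, partition):
    """Read the start sector from sysfs, e.g. disk='sda', partition='sda1'."""
    with open("/sys/block/%s/%s/start" % (disk, partition)) as f:
        return int(f.read().strip())
```

A start sector of 63 (the old fdisk default) gives 63 * 512 = 32256
bytes, which is not a multiple of 4096, while a start at sector 2048
gives 1048576 bytes, which is.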

> What's the physical disk system?  I remember seeing something like that
> long ago where a raid controller had a large write cache that normally
> made it seem fast, but once in a while either filling it to a high-water
> mark or something else would trigger it to complete catch up before
> responding again - which could take several minutes with everything
> blocked.  And nothing else ever looked out of the ordinary.

Standard Intel-based board, onboard SATA controller with a pair of
SATA2 disks mirrored with mdadm. As I said, budget setup :D


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Brian Mathis
On Wed, Jun 29, 2011 at 5:22 PM,   wrote:
> Les Mikesell wrote:
>> On 6/29/2011 4:04 PM, Emmanuel Noobadmin wrote:
>>> On 6/30/11, Les Mikesell  wrote:
>>>> The seriously on-the-cheap approach is to run a few virtual servers on
>>>> hardware slightly better than one of the individual servers would need.
>>>
>>> Actually THAT is the fundamental problem ;)
>>> The physical server is frankly much more powerful than the two guest
>>> running on it. I have the same applications + public web/email running
>>> on old dual core machines with less memory than the guests.
> 
>> OK, but without knowing the cause, you already know the cure.   Make the
>> virtual servers not share physical disks - they will always want a
>> single head to be in different places at the same time.  And there is
>> also probably some ugly stuff about how using files for virtual disk
>> images and perhaps LVM on both the real and virtual side makes your disk
>> blocks misaligned. Fixing that might help too.
>
> Here's another one, that I got from another admin talking to VMware: watch
> out just how many virtual CPUs you assign to each VM. If you've assigned
> 4, it is actually going to sit there waiting until it gets 4 virtual CPUs.
> As of '09, VMware was recommending assigning 2.
>
>        mark


This is no longer true [1], but it's still a good idea to only assign
as many CPUs as you need.

[1] Source: VMware Engineer at VMware Forum 2011.

-☙ Brian Mathis ❧-


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Les Mikesell
On 6/29/2011 4:22 PM, m.r...@5-cent.us wrote:
>
> Here's another one, that I got from another admin talking to VMware: watch
> out just how many virtual CPUs you assign to each VM. If you've assigned
> 4, it is actually going to sit there waiting until it gets 4 virtual CPUs.
> As of '09, VMware was recommending assigning 2.

That is, of course, when you have overcommitted them...  If you have
specified a certain number, you don't get a timeslice until that number
are available at once - which sort of makes sense.

--
Les Mikesell
 lesmikes...@gmail.com



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, m.r...@5-cent.us  wrote:
> Here's another one, that I got from another admin talking to VMware: watch
> out just how many virtual CPUs you assign to each VM. If you've assigned
> 4, it is actually going to sit there waiting until it gets 4 virtual CPUs.
> As of '09, VMware was recommending assigning 2.

That was one of the first things I was careful about when setting this
one up. The guests got 1 and 2 cores, leaving 1 spare core to the
host. I manually pinned the guests to specific cores as well to avoid
any potential issues from spinlocking. But that's apparently not helping.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Giovanni Tirloni  wrote:
> Linux includes I/O in how it calculates the load average so you're not
> measuring CPU alone.

On the host, it's expected, I've got two qemu-kvm process loading up
100% cpu. Within the guest VM, top looks like this, high load but low
cpu %.

top - 10:21:40 up 1 day, 59 min,  0 users,  load average: 16.72, 6.05, 2.29
Tasks: 176 total,   1 running, 175 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.3%us,  1.2%sy,  1.2%ni, 91.2%id,  2.7%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:   1017392k total,   970564k used,    46828k free,     1436k buffers
Swap:  2040244k total,   200572k used,  1839672k free,    30344k cached

> What does top show?
> Any error messages in /var/log during the time the server is unresponsive?
> Is network responsive? Latency normal too?

I think the network is responsive: pings work but nothing else does.
No error messages on either host or guest. faillog, messages and dmesg
give no clue. Which is why I figured I really need to be logged in,
check and if necessary kill innocent processes one by one until I find
the culprit when it's going crazy.
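
High load with an idle CPU usually means tasks stuck in
uninterruptible sleep (state D), which Linux counts in the load
average alongside runnable tasks.  The following Python sketch lists
such processes by reading /proc/<pid>/stat; it is only an
illustration, and the function names are made up.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    command = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, command, state


def blocked_processes(proc_root="/proc"):
    """Return (pid, command) for every process in uninterruptible sleep."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, command, state = parse_stat_line(f.read())
        except (IOError, OSError, ValueError):
            continue  # the process exited while we were looking
        if state == "D":
            blocked.append((pid, command))
    return blocked
```

Logging the output of blocked_processes() alongside the top snapshots
would show which processes are waiting on the disk at the moment the
load climbs.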


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Paul Heinlein
On Thu, 30 Jun 2011, Emmanuel Noobadmin wrote:

> On 6/30/11, Paul Heinlein  wrote:
>> I actually relied for a while on the last choice. I had a remotely 
>> accessible root shell that never logged out. When things got 
>> sluggish, I was able to /bin/kill to my heart's content. It wasn't 
>> a pretty solution, but it kept me running until I was able to solve 
>> the problem properly.
>
> Would this work without the OOB hardware? E.g. if I leave a detached 
> screen'd SSH session open from another server, then ionice + nice 
> that shell on the problem server?

I wouldn't rely on that setup.

In my case, the problem server was connected to a Digi console server 
via its serial port. On another (non-problematic server), I opened a 
screen session and connected to the console on the problem server. I 
had to adjust some timeouts here and there to ensure an eternal 
console. :-)

-- 
Paul Heinlein <> heinl...@madboa.com <> http://www.madboa.com/


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Les Mikesell
On 6/29/2011 4:12 PM, Emmanuel Noobadmin wrote:
> On 6/30/11, Robert Heller  wrote:
>> If the problem is excessive load because Sendmail / Mimedefang / spamd /
>> etc. is too busy handling tons of mail/spam being dumped on your server, you
>> might want to look at these sendmail options:
>
> Mail was my first suspect because I had similar issues with exim/spamd
> locking up bad on another server. But usually that includes a high cpu
> % as well. Although this suspicion did help me pinpoint one of the
> causes, a script that periodically went through the email
> accounts/Maildirs and that was fixed from learning about ionice on the
> list.
>
> For a while I thought problem solved, but these couple of days, it's
> acting up again and nothing's jumping out screaming "I'm the problem!"
> and not being able to SSH to see what's exactly going on is making it
> difficult.

What's the physical disk system?  I remember seeing something like that 
long ago where a raid controller had a large write cache that normally 
made it seem fast, but once in a while either filling it to a high-water 
mark or something else would trigger it to complete catch up before 
responding again - which could take several minutes with everything 
blocked.  And nothing else ever looked out of the ordinary.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread m . roth
Les Mikesell wrote:
> On 6/29/2011 4:04 PM, Emmanuel Noobadmin wrote:
>> On 6/30/11, Les Mikesell  wrote:
>>> The seriously on-the-cheap approach is to run a few virtual servers on
>>> hardware slightly better than one of the individual servers would need.
>>
>> Actually THAT is the fundamental problem ;)
>> The physical server is frankly much more powerful than the two guest
>> running on it. I have the same applications + public web/email running
>> on old dual core machines with less memory than the guests.

> OK, but without knowing the cause, you already know the cure.   Make the
> virtual servers not share physical disks - they will always want a
> single head to be in different places at the same time.  And there is
> also probably some ugly stuff about how using files for virtual disk
> images and perhaps LVM on both the real and virtual side makes your disk
> blocks misaligned. Fixing that might help too.

Here's another one, that I got from another admin talking to VMware: watch
out just how many virtual CPUs you assign to each VM. If you've assigned
4, it is actually going to sit there waiting until it gets 4 virtual CPUs.
As of '09, VMware was recommending assigning 2.

mark



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Rainer Duffner

On 29.06.2011 at 23:17, Lamar Owen wrote:

> On Wednesday, June 29, 2011 04:43:09 PM Rainer Duffner wrote:
>> Virtualization is an option, but the trouble is: if the server is
>> I/O-constrained anyway, virtualization won't help.
>> Everything will just be even slower.
>
> That depends.  More expensive servers that would be suitable for
> virtualization host use also tend to have better I/O subsystems and
> faster disks.  Relative to a 'cheap' system with much poorer base
> I/O bandwidth.


The OP clearly stated that he's probably not running a datacenter full  
of DL580g7 servers...



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Lamar Owen
On Wednesday, June 29, 2011 04:43:09 PM Rainer Duffner wrote:
> Virtualization is an option, but the trouble is: if the server is
> I/O-constrained anyway, virtualization won't help.
> Everything will just be even slower.

That depends.  More expensive servers that would be suitable for
virtualization host use also tend to have better I/O subsystems and
faster disks, relative to a 'cheap' system with much poorer base I/O
bandwidth.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Les Mikesell
On 6/29/2011 4:04 PM, Emmanuel Noobadmin wrote:
> On 6/30/11, Les Mikesell  wrote:
>> The seriously on-the-cheap approach is to run a few virtual servers on
>> hardware slightly better than one of the individual servers would need.
>
> Actually THAT is the fundamental problem ;)
> The physical server is frankly much more powerful than the two guest
> running on it. I have the same applications + public web/email running
> on old dual core machines with less memory than the guests.
>
> Nothing that's being done is out of ordinary except something ordinary
> coupled with two virtual guest doing it at the same thing on the same
> physical disks causes everything to go haywire. But because "it" is
> otherwise normal, I haven't figured out a way to pinpoint what is it
> after the previous issue was solved.

OK, but without knowing the cause, you already know the cure.   Make the 
virtual servers not share physical disks - they will always want a 
single head to be in different places at the same time.  And there is 
also probably some ugly stuff about how using files for virtual disk 
images and perhaps LVM on both the real and virtual side makes your disk 
blocks misaligned. Fixing that might help too.

-- 
   Les Mikesell
 lesmikes...@gmail.com



Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Robert Heller  wrote:
> If the problem is excessive load because Sendmail / Mimedefang / spamd /
> etc. is too busy handling tons of mail/spam being dumped on your server, you
> might want to look at these sendmail options:

Mail was my first suspect because I had similar issues with exim/spamd
locking up bad on another server. But usually that includes a high cpu
% as well. Although this suspicion did help me pinpoint one of the
causes, a script that periodically went through the email
accounts/Maildirs and that was fixed from learning about ionice on the
list.

For a while I thought problem solved, but these couple of days, it's
acting up again and nothing's jumping out screaming "I'm the problem!"
and not being able to SSH to see what's exactly going on is making it
difficult.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Giovanni Tirloni
On Wed, Jun 29, 2011 at 5:57 PM, Emmanuel Noobadmin
wrote:

> On 6/30/11, Giovanni Tirloni  wrote:
> > I would approach this issue from another perspective: who's locking up
> the
> > server (as in eating all resources) and how to stop/constrain it. You can
> > try to renice the sshd process and see what happens. I'm not entirely
> sure
> > what 'locked up' means in this context.
>
> Server's unresponsive to the external world. It isn't dead, on two
> occasions, when it happened at times like Sunday and 1am in the night,
> I could afford to wait it out and see that it eventually does recover
> from whatever it was.
>
> It's almost definitely related to disk i/o due to the VM guest
> fighting over the disks where their virtual disk-files are. However,
> the hard part is figuring out the exact factors, I know CPU isn't an
> issue having set up scripts to log top output when load goes above 5.


Linux includes I/O in how it calculates the load average so you're not
measuring CPU alone.

What does top show?
Any error messages in /var/log during the time the server is unresponsive?
Is network responsive? Latency normal too?

-- 
Giovanni Tirloni


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Paul Heinlein  wrote:
> I actually relied for a while on the last choice. I had a remotely
> accessible root shell that never logged out. When things got sluggish,
> I was able to /bin/kill to my heart's content. It wasn't a pretty
> solution, but it kept me running until I was able to solve the problem
> properly.

Would this work without the OOB hardware? E.g. if I leave a detached
screen'd SSH session open from another server, then ionice + nice that
shell on the problem server?
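
For reference, the "ionice + nice" part of that idea could be sketched
in Python as below.  This only builds (and optionally runs) the renice
and ionice commands for a given PID; whether the boosted shell stays
usable under heavy disk contention is a separate question, and ionice
classes only have an effect with an I/O scheduler that honours them
(CFQ).  The default values and function names are assumptions.

```python
import subprocess


def priority_commands(pid, niceness=-10, io_class=2, io_level=0):
    """Build the renice and ionice invocations for one process."""
    return [
        ["renice", str(niceness), "-p", str(pid)],
        ["ionice", "-c", str(io_class), "-n", str(io_level), "-p", str(pid)],
    ]


def apply_priority(pid, dry_run=True):
    """Raise CPU and I/O priority of pid; with dry_run, only return commands."""
    commands = priority_commands(pid)
    if not dry_run:
        for command in commands:
            subprocess.check_call(command)  # negative niceness needs root
    return commands
```

Class 2 with level 0 is the highest best-effort I/O priority; the
realtime class (1) is also possible but can starve everything else.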


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Les Mikesell  wrote:
> The seriously on-the-cheap approach is to run a few virtual servers on
> hardware slightly better than one of the individual servers would need.

Actually THAT is the fundamental problem ;)
The physical server is frankly much more powerful than the two guest
running on it. I have the same applications + public web/email running
on old dual core machines with less memory than the guests.

Nothing that's being done is out of the ordinary, except that something
ordinary coupled with two virtual guests doing the same thing at the
same time on the same physical disks causes everything to go haywire.
But because "it" is otherwise normal, I haven't figured out a way to
pinpoint what it is after the previous issue was solved.

>   You are much less likely to kill the host (especially something like
> ESXi) to the point where you can't connect,

Just my luck :D

> and more likely to be able
> to afford the out-of-band management where you need it since you have
> fewer boxes.

Unfortunately not the case. Most of these are basically applications +
email/web servers for small/medium customers so they are usually
scattered at the client's office or different datacenters.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Ljubomir Ljubojevic
Les Mikesell wrote:
>>> You can buy add-on PCI-cards for OOB-management, though.
>> Thanks for the information, although unless they are really cheap...
> 
> The seriously on-the-cheap approach is to run a few virtual servers on 
> hardware slightly better than one of the individual servers would need. 
>   You are much less likely to kill the host (especially something like 
> ESXi) to the point where you can't connect, and more likely to be able 
> to afford the out-of-band management where you need it since you have 
> fewer boxes.
> 
OP has stated that he manages servers for his clients, meaning that they
are at separate customer sites, probably even in different cities.
Virtualization is an option only if they all belong to the same company
or provide cloud computing.

Ljubomir


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Paul Heinlein
On Wed, 29 Jun 2011, Keith Keller wrote:

> In addition to the suggestions already made, one possibility is to 
> attach a serial console or IP KVM.  Logging in may still be awful, 
> but at least you won't have to go through sshd.  I've been able to 
> log in through a serial getty when sshd was not responding or taking 
> too long (this works maybe 50-75% of the time; the rest of the time 
> it's too late, and even getty is unresponsive).  You have the added 
> advantage of being able to log in directly as root if you have 
> PermitRootLogin no in your sshd_config.

Even with OOB console access, there's still the problem of /bin/login 
timing out on highly loaded servers. The login.c source in the 
util-linux package hardwires the login timeout to 60 seconds. If your 
server can't process the login request in under a minute (not unusual 
if the load average is high and/or the machine is using swap), you 
can't login via *any* console.

So if killing the machine doesn't appeal to you, you still need OOB 
console access plus

   * a patched version of /bin/login with a longer timeout, or
   * a process-watcher that aggressively kills known troublemakers, or
   * a remotely accessible console that never logs out.

I actually relied for a while on the last choice. I had a remotely 
accessible root shell that never logged out. When things got sluggish, 
I was able to /bin/kill to my heart's content. It wasn't a pretty 
solution, but it kept me running until I was able to solve the problem 
properly.
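
The process-watcher option could look roughly like the Python sketch
below: when the 1-minute load average passes a limit, kill every
process whose command name is on a list of known troublemakers.  The
list contents, the load limit and the function names are all
assumptions, and the default is a dry run that kills nothing.

```python
import os
import signal


def select_victims(processes, troublemakers):
    """Pick the PIDs whose command name is on the troublemaker list."""
    return [pid for pid, command in processes if command in troublemakers]


def list_processes(proc_root="/proc"):
    """Return (pid, command) pairs parsed from /proc/<pid>/stat."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
            # The command name is enclosed in the outermost parentheses.
            command = line[line.index("(") + 1:line.rindex(")")]
        except (IOError, OSError, ValueError):
            continue  # the process exited while we were looking
        found.append((int(entry), command))
    return found


def watch_once(troublemakers, load_limit=20.0, dry_run=True):
    """If the 1-minute load exceeds load_limit, SIGKILL the listed commands."""
    if os.getloadavg()[0] <= load_limit:
        return []
    victims = select_victims(list_processes(), troublemakers)
    if not dry_run:
        for pid in victims:
            try:
                os.kill(pid, signal.SIGKILL)
            except OSError:
                pass  # already gone, or not permitted
    return victims
```

Run from cron or a small loop, watch_once({"spamd", "clamd"}) would
report what it would kill; passing dry_run=False makes it act.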

-- 
Paul Heinlein <> heinl...@madboa.com <> http://www.madboa.com/


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Giovanni Tirloni  wrote:
> I would approach this issue from another perspective: who's locking up the
> server (as in eating all resources) and how to stop/constrain it. You can
> try to renice the sshd process and see what happens. I'm not entirely sure
> what 'locked up' means in this context.

Server's unresponsive to the external world. It isn't dead, on two
occasions, when it happened at times like Sunday and 1am in the night,
I could afford to wait it out and see that it eventually does recover
from whatever it was.

It's almost definitely related to disk i/o due to the VM guest
fighting over the disks where their virtual disk-files are. However,
the hard part is figuring out the exact factors, I know CPU isn't an
issue having set up scripts to log top output when load goes above 5.
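
The actual logging scripts are not shown in this thread.  One possible
shape for such a script, as a minimal Python sketch: check the
1-minute load average and, when it is above the threshold, append a
batch-mode top snapshot to a log file.  The function names and the log
path argument are assumptions.

```python
import os
import subprocess
import time


def should_capture(load1, threshold=5.0):
    """True when the 1-minute load average is above the threshold."""
    return load1 > threshold


def capture_top(log_path):
    """Append one batch-mode top snapshot to log_path."""
    output = subprocess.check_output(["top", "-b", "-n", "1"])
    with open(log_path, "ab") as log:
        log.write(("=== %s ===\n" % time.ctime()).encode("ascii"))
        log.write(output)


def check_and_capture(log_path, threshold=5.0):
    """Capture a snapshot if the load is high; return whether one was taken."""
    if should_capture(os.getloadavg()[0], threshold):
        capture_top(log_path)
        return True
    return False
```

Because top reports CPU percentages, a snapshot like this shows the
load figure but says little about which processes are waiting on disk.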


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Les Mikesell
On 6/29/2011 3:43 PM, Rainer Duffner wrote:
>
> Virtualization is an option, but the trouble is: if the server is I/O-
> constrained anyway, virtualization won't help.
> Everything will just be even slower.

That's sort-of true, but you don't have to manage the host through the 
same interface the guests use - and you need to solve the problem 
causing the issue anyway, not continue to work around it with restarts.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Keith Keller
On Thu, Jun 30, 2011 at 03:50:30AM +0800, Emmanuel Noobadmin wrote:
> I was having problems with the same server locking up to the point I
> can't even get in via SSH. I've already used HTB/TC to reserve
> bandwidth for my SSH port but the problem now isn't an attack on the
> bandwidth. So I'm trying to figure out if there's a way to ensure that
> SSH is given cpu and i/o priority.

As you've probably figured out, the short answer is no.  There are
sometimes workarounds, of course.

> Since I'm not the only person who face problems trying to remotely
> access a locked up server, surely somebody must had come up with a
> solution that didn't involve somebody/something hitting the power
> button?

In addition to the suggestions already made, one possibility is to
attach a serial console or IP KVM.  Logging in may still be awful, but
at least you won't have to go through sshd.  I've been able to log in
through a serial getty when sshd was not responding or taking too long
(this works maybe 50-75% of the time; the rest of the time it's too
late, and even getty is unresponsive).  You have the added advantage of
being able to log in directly as root if you have PermitRootLogin no in
your sshd_config.

If your I/O problem is due to running out of memory and thrashing swap,
you can try to be more aggressive with the OOM killer settings.
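
A related measure, different from tuning the killer's aggressiveness,
is to exempt sshd from the OOM killer so that it at least survives a
memory crunch.  The Python sketch below picks whichever per-process
tunable the kernel offers (oom_score_adj on newer kernels, oom_adj on
older ones) and writes the "never kill" value to it.  The function
names are illustrative, and finding sshd's PID is left out.

```python
import os


def oom_protect_target(proc_dir):
    """Choose the tunable file and value that exempts a process from the OOM killer."""
    newer = os.path.join(proc_dir, "oom_score_adj")
    if os.path.exists(newer):
        return newer, "-1000"  # lowest possible score adjustment
    return os.path.join(proc_dir, "oom_adj"), "-17"  # legacy "never kill" value


def protect_pid(pid):
    """Write the exemption for one PID (needs root)."""
    path, value = oom_protect_target("/proc/%d" % pid)
    with open(path, "w") as f:
        f.write(value)
```

Children forked afterwards inherit the setting, so it is worth
checking that login shells started by sshd do not end up exempt too.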

As someone else mentioned, it might help if you elaborated on "locked
up".  What are the common scenarios you see?

--keith

-- 
kkel...@wombat.san-francisco.ca.us





Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Rainer Duffner

On 29.06.2011 at 22:26, Emmanuel Noobadmin wrote:

> On 6/30/11, Rainer Duffner  wrote:
>> If it's a server that actually deserves that name, it should have IPMI
>> on board.
>
> Problem is some of us work for budget constraints customers and define
> server by purpose and not specifications. So very often they buy
> servers based on budget and that it's good enough to run most
> applications for X users. Unfortunately, very often I'm the one who
> ends up managing these simply because our applications run on them.
>


I'd go for a power-switch then.
Less logic.

http://computers.shop.ebay.com/Computers-Networking-/58058/i.html?_nkw=remote+power+switch&_catref=1&_fln=1&_trksid=p3286.c0.m282
http://www.amazon.com/NP-0801D-Switchable-manufactured-Temperature-Monitoring/dp/B002WLQ6ZI



>> You can buy add-on PCI-cards for OOB-management, though.
>
> Thanks for the information, although unless they are really cheap...



Define "cheap".
I live and work in 2011's 6th most expensive city of the world.

Virtualization is an option, but the trouble is: if the server is
I/O-constrained anyway, virtualization won't help.
Everything will just be even slower.






Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Robert Heller
At Wed, 29 Jun 2011 16:08:02 -0400 (EDT) CentOS mailing list wrote:

> 
> >
> > On 29.06.2011 at 21:50, Emmanuel Noobadmin wrote:
> >
> >>
> >> Since I'm not the only person who face problems trying to remotely
> >> access a locked up server, surely somebody must had come up with a
> >> solution that didn't involve somebody/something hitting the power
> >> button?
> >
> >
> >
> > Yes, it's called "out of band management".
> > Have dial-in access to IPMI/iLO interfaces or just an APC remote
> > controlled power-switch to power-off the server.
> 
> Perhaps this suggestion is applicable:
> setup a cron job where the sshd server is restarted (once or several times
> per day, or per week, etc).
> 
> At one time, I had a server on an ISP that, with time, became woefully
> underpowered (the anti-spam/anti-virus program ate CPU power and RAM) to
> the point that occasionally, and with more frequency (once a week?) sshd
> would become unresponsive. This required that someone be at console to
> restart sshd; or if the problem was not understandable, reboot the box.
> 
> Having sshd restarted in cron worked until we got a new, soopa-doopa box.

If the problem is excessive load because Sendmail / Mimedefang / spamd /
etc. is too busy handling tons of mail/spam being dumped on your server, you
might want to look at these sendmail options:

ConnectionRateThrottle  (34.8.12)
MaxDaemonChildren   (34.8.35)

also 

QueueLA (34.8.50)
RefuseLA(34.8.54)

setting these can keep Sendmail (and Mimedefang, spamd,  etc.) from
overwhelming the system.
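
How the two load-average options interact can be sketched roughly as
follows. This is a deliberately simplified model in Python, not
sendmail's own logic: the real QueueLA test also weighs message
priority, and the thresholds 8 and 12 are example numbers rather than
recommendations. The function names are made up for illustration.

```python
import os


def mail_action(load_avg, queue_la=8.0, refuse_la=12.0):
    """Simplified model of the QueueLA / RefuseLA thresholds.

    Once the load reaches refuse_la, new SMTP connections are turned
    away; once it reaches queue_la, accepted mail is queued instead of
    being delivered immediately.
    """
    if load_avg >= refuse_la:
        return "refuse"
    if load_avg >= queue_la:
        return "queue"
    return "deliver"


def current_action(queue_la=8.0, refuse_la=12.0):
    """Apply the model to this machine's one-minute load average."""
    one_minute_load = os.getloadavg()[0]
    return mail_action(one_minute_load, queue_la, refuse_la)
```

The point of the ordering is that the cheaper reaction (queueing) kicks
in first, and refusing connections is kept for a higher load.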

> 
> >
> > Rainer
> 
> Max

-- 
Robert Heller -- 978-544-6933 / hel...@deepsoft.com
Deepwoods Software-- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments



   


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Les Mikesell
On 6/29/2011 3:26 PM, Emmanuel Noobadmin wrote:
> On 6/30/11, Rainer Duffner  wrote:
>> If it's a server that actually deserves that name, it should have IPMI
>> on board.
>
> Problem is some of us work for budget constraints customers and define
> server by purpose and not specifications. So very often they buy
> servers based on budget and that it's good enough to run most
> applications for X users. Unfortunately, very often I'm the one who
> ends up managing these simply because our applications run on them.
>
>> You can buy add-on PCI-cards for OOB-management, though.
>
> Thanks for the information, although unless they are really cheap...

The seriously on-the-cheap approach is to run a few virtual servers on
hardware slightly better than one of the individual servers would need.
You are much less likely to kill the host (especially something like
ESXi) to the point where you can't connect, and more likely to be able
to afford the out-of-band management where you need it since you have
fewer boxes.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Brian Mathis
On Wed, Jun 29, 2011 at 4:15 PM, Emmanuel Noobadmin wrote:
> On 6/30/11, Rainer Duffner  wrote:
>> Yes, it's called "out of band management".
>> Have dial-in access to IPMI/iLO interfaces or just an APC remote
>> controlled power-switch to power-off the server.
>
> I don't want to reboot the server everytime something like that
> happens. I'll expect pretty nasty problems will develop after a few
> dozen unclean shutdowns like that.
>
> Would ILO work on a server that's unresponsive due to heavy load? The
> actual network access isn't a problem so dial up isn't necessary. The
> other problem is the server in question probably doesn't have ILO
> features on the mainboard.


Doing a hard power-off is extreme, but could be the last resort option.

ILO is just one product (by HP) that provides out-of-band management
for servers.  Dell has DRAC, and there are others.  They allow you
access to the server's console as if you are standing there, as well
as other functions like power on/off, virtual CD drive, etc.  These
are usually built into the server, so you can't really add them on later.

You can get similar functionality by using a remote IP-based KVM.
They only provide the remote console, not power on/off or virtual CD.
For a single server, a low cost option is the Lantronix Spider or
Spider Duo.  It provides a remote console for a single server for a
few hundred $$$s.

An alternative that is usable for Linux servers is a remote serial
console; it allows you to ssh into it and then connect to the serial
port of the server.  You will need to set up the BIOS, GRUB, and a
serial getty to be able to log in to a server this way.  wti.com makes
a good one that I currently use.

All of these solutions are "out of band" meaning they do not directly
interface with the operating system, so if there's a problem with the
server, they are not affected by it.

Your name suggests you are new to sysadmin.  One of the lessons here
is to always have at least 1 method of out of band management as part
of the non-negotiable requirements for a server, especially a remote
one.

-☙ Brian Mathis ❧-


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Rainer Duffner  wrote:
> If it's a server that actually deserves that name, it should have IPMI
> on board.

Problem is some of us work for budget-constrained customers and define
server by purpose and not specifications. So very often they buy
servers based on budget and that it's good enough to run most
applications for X users. Unfortunately, very often I'm the one who
ends up managing these simply because our applications run on them.

> You can buy add-on PCI-cards for OOB-management, though.

Thanks for the information, although unless they are really cheap...


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Giovanni Tirloni
On Wed, Jun 29, 2011 at 4:50 PM, Emmanuel Noobadmin wrote:

> I was having problems with the same server locking up to the point I
> can't even get in via SSH. I've already used HTB/TC to reserve
> bandwidth for my SSH port but the problem now isn't an attack on the
> bandwidth. So I'm trying to figure out if there's a way to ensure that
> SSH is given cpu and i/o priority.
>
> However, so far reading seems to imply that it's probably not going to
> help if the issue is i/o related and/or it would require escalating
> SSH to such levels (above paging/filesystem processes) that makes it a
> really bad idea.
>
> Since I'm not the only person who face problems trying to remotely
> access a locked up server, surely somebody must had come up with a
> solution that didn't involve somebody/something hitting the power
> button?
>
>
I would approach this issue from another perspective: who's locking up the
server (as in eating all resources) and how to stop/constrain it. You can
try to renice the sshd process and see what happens. I'm not entirely sure
what 'locked up' means in this context.
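
A rough sketch of the renice idea in Python, assuming Linux and Python
3.3 or later. The helper name, the PID and the value -10 in the comment
are illustrative only; lowering a nice value (raising priority) needs
root.

```python
import os


def renice(pid, niceness):
    """Set the CPU nice value of pid and return the value now in effect."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)

# Example (as root), with a hypothetical sshd PID of 1234:
#     renice(1234, -10)
```

Sessions that sshd forks after the change inherit the new value;
sessions that are already running keep the one they started with.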

-- 
Giovanni Tirloni


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Rainer Duffner

On 29.06.2011 at 22:15, Emmanuel Noobadmin wrote:

> On 6/30/11, Rainer Duffner  wrote:
>
>> Yes, it's called "out of band management".
>> Have dial-in access to IPMI/iLO interfaces or just an APC remote
>> controlled power-switch to power-off the server.
>
> I don't want to reboot the server everytime something like that
> happens. I'll expect pretty nasty problems will develop after a few
> dozen unclean shutdowns like that.
>
> Would ILO work on a server that's unresponsive due to heavy load?


ILO used to be a separate board with a separate NIC and a separate CPU,
etc.
Nowadays, it's just an additional chip on the board.

It works until the power-supply is fried.


> The
> actual network access isn't a problem so dial up isn't necessary. The
> other problem is the server in question probably doesn't have ILO
> features on the mainboard.


If it's a server that actually deserves that name, it should have IPMI
on board.
You can buy add-on PCI-cards for OOB-management, though.







Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Rainer Duffner

On 29.06.2011 at 22:08, Max Pyziur wrote:

>>
>> On 29.06.2011 at 21:50, Emmanuel Noobadmin wrote:
>>
>>>
>>> Since I'm not the only person who face problems trying to remotely
>>> access a locked up server, surely somebody must had come up with a
>>> solution that didn't involve somebody/something hitting the power
>>> button?
>>
>>
>>
>> Yes, it's called "out of band management".
>> Have dial-in access to IPMI/iLO interfaces or just an APC remote
>> controlled power-switch to power-off the server.
>
> Perhaps this suggestion is applicable:
> setup a cron job where the sshd server is restarted (once or several times
> per day, or per week, etc).



If the problem is lack of I/O, only power-on/off will work.
Or shutting down the offending process(es).

OOB-management is a necessity nevertheless.
You don't have to be a control freak to love it ;-)





Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
On 6/30/11, Rainer Duffner  wrote:

> Yes, it's called "out of band management".
> Have dial-in access to IPMI/iLO interfaces or just an APC remote
> controlled power-switch to power-off the server.

I don't want to reboot the server everytime something like that
happens. I'll expect pretty nasty problems will develop after a few
dozen unclean shutdowns like that.

Would ILO work on a server that's unresponsive due to heavy load? The
actual network access isn't a problem so dial up isn't necessary. The
other problem is the server in question probably doesn't have ILO
features on the mainboard.


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Max Pyziur
>
> On 29.06.2011 at 21:50, Emmanuel Noobadmin wrote:
>
>>
>> Since I'm not the only person who face problems trying to remotely
>> access a locked up server, surely somebody must had come up with a
>> solution that didn't involve somebody/something hitting the power
>> button?
>
>
>
> Yes, it's called "out of band management".
> Have dial-in access to IPMI/iLO interfaces or just an APC remote
> controlled power-switch to power-off the server.

Perhaps this suggestion is applicable:
setup a cron job where the sshd server is restarted (once or several times
per day, or per week, etc).

At one time, I had a server on an ISP that, with time, became woefully
underpowered (the anti-spam/anti-virus program ate CPU power and RAM) to
the point that occasionally, and with more frequency (once a week?) sshd
would become unresponsive. This required that someone be at console to
restart sshd; or if the problem was not understandable, reboot the box.

Having sshd restarted in cron worked until we got a new, soopa-doopa box.
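
A variation on the cron idea, sketched in Python: rather than restarting
sshd blindly, the job first checks whether the daemon still sends its
protocol banner and restarts it only when it does not. This is a sketch
under assumptions, not the job described above: the function names are
invented, and "service sshd restart" is assumed to be the right restart
command for the box.

```python
import socket
import subprocess


def looks_like_ssh_banner(data):
    """An SSH identification string starts with 'SSH-' (OpenSSH sends it first)."""
    return data.startswith(b"SSH-")


def sshd_answers(host="127.0.0.1", port=22, timeout=10.0):
    """Return True if host:port sends an SSH banner within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return looks_like_ssh_banner(conn.recv(64))
    except OSError:
        return False  # refused, unreachable, or timed out


def restart_if_unresponsive(restart_cmd=("service", "sshd", "restart")):
    """Restart sshd only when the banner check fails; return True if restarted."""
    if sshd_answers():
        return False
    subprocess.call(list(restart_cmd))
    return True
```

Saved as a script that calls restart_if_unresponsive(), this could be
run from root's crontab every few minutes. If the box is so starved
that cron itself cannot start the job, it will not help either.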

>
> Rainer

Max


Re: [CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Rainer Duffner

On 29.06.2011 at 21:50, Emmanuel Noobadmin wrote:

>
> Since I'm not the only person who face problems trying to remotely
> access a locked up server, surely somebody must had come up with a
> solution that didn't involve somebody/something hitting the power
> button?



Yes, it's called "out of band management".
Have dial-in access to IPMI/iLO interfaces or just an APC remote  
controlled power-switch to power-off the server.



Rainer


[CentOS] Anyway to ensure SSH availability?

2011-06-29 Thread Emmanuel Noobadmin
I was having problems with the same server locking up to the point I
can't even get in via SSH. I've already used HTB/TC to reserve
bandwidth for my SSH port but the problem now isn't an attack on the
bandwidth. So I'm trying to figure out if there's a way to ensure that
SSH is given cpu and i/o priority.

However, so far reading seems to imply that it's probably not going to
help if the issue is i/o related and/or it would require escalating
SSH to such levels (above paging/filesystem processes) that makes it a
really bad idea.

Since I'm not the only person who faces problems trying to remotely
access a locked up server, surely somebody must have come up with a
solution that didn't involve somebody/something hitting the power
button?