Re: [CentOS] kerberized-nfs - any experts out there?

2017-03-23 Thread Matt Garman
On Wed, Mar 22, 2017 at 6:11 PM, John Jasen <jja...@realityfailure.org> wrote:
> On 03/22/2017 03:26 PM, Matt Garman wrote:
>> Is anyone on the list using kerberized-nfs on any kind of scale?
>
> Not for a good many years.
>
> Are you using v3 or v4 NFS?

v4.  I think you can only do kerberized NFS with v4.


> Also, you can probably stuff the rpc.gss* and idmapd services into
> verbose mode, which may give you a better ideas as to whats going on.

I do that.  The logs are verbose, but generally too cryptic for me to
make sense of.  Web searches on the errors yield results at best 50%
of the time, and the hits almost never have a solution.

> And yes, the kernel does some kerberos caching. I think 10 to 15 minutes.

To me it looks like it's more on the order of an hour.  For example, a
simple test I've done is to do a "fresh" login on a server.  The
server has just been rebooted, and with the reboot, all the
/tmp/krb5cc* files were deleted.

I login via ssh, which implicitly establishes my Kerberos tickets.  I
deliberately do a "kdestroy".  Then I have a simple shell loop like
this:

while [ 1 ] ; do date ; ls ; sleep 30s ; done

Which is just doing an ls on my home directory, which is a kerberized
NFS mount.  Despite having done a kdestroy, this works, presumably
from cached credentials.  And it continues to work for *about* an
hour, and then I start getting permission denied.  I emphasized
"about" because it's not precisely one hour, but seems to range from
maybe 55 to 65 minutes.
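
Spelled out end to end, the test is roughly this (the hostname is just
a placeholder, and the exact krb5cc cache file name pattern varies):

ssh <compute-node>          # GSSAPI/password login; creates a /tmp/krb5cc_<uid>_XXXXXX cache
klist                       # confirm a ticket cache exists
kdestroy                    # explicitly throw the tickets away
while [ 1 ] ; do date ; ls ; sleep 30s ; done    # still works for roughly an hour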

But, that's a super-simple, controlled test.  What happens when you
add screen multiplexers (tmux, GNU Screen) into the mix?  What if you
log in "fresh" via password versus having your GSS (Kerberos)
credentials forwarded?  What if you're logged in multiple times on the
same machine via different methods?
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] kerberized-nfs - any experts out there?

2017-03-23 Thread Matt Garman
On Wed, Mar 22, 2017 at 3:19 PM,  <m.r...@5-cent.us> wrote:
> Matt Garman wrote:
>> (2) Permission denied issues.  I have user Kerberos tickets
>> configured for 70 days.  But there is clearly some kind of
>> undocumented kernel caching going on.  Looking at the Kerberos server
>> logs, it looks like it "could" be a performance issue, as I see 100s
>> of ticket requests within the same second when someone tries to launch
>> a lot of jobs.  Many of these will fail with "permission denied" but
>> if they immediately re-try, it works.  Related to this, I have been
>> unable to figure out what creates and deletes the
>> /tmp/krb5cc_uid_random files.
>
> Are they asking for *new* credentials each time? They should only be doing
> one kinit.

Well, that's what I don't understand.  In practice, I don't believe a
user should ever have to explicitly do kinit, as their
credentials/tickets are implicitly created (and forwarded) via ssh.
Despite that, I see the /tmp/krb5cc_uid files accumulating over time.
But I've tried testing this, and I haven't been able to determine
exactly what creates those files.  And I don't understand why new
krb5cc_uid files are created when there is an existing, valid file
already.  Clearly some programs ignore existing files, and some create
new ones.
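
One generic way to try to catch the creator in the act (just a sketch,
not something from this thread) is an audit watch on /tmp:

auditctl -w /tmp -p wa -k krb5cc     # audit anything that creates or writes files under /tmp
ausearch -k krb5cc -i | less         # look for the exe= field on events touching krb5cc_* paths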

> And there's nothing in the logs, correct? Have you tried attaching strace
> to one of those, and see if you can get a clue as to what's happening?

Actually, I get this in the log:

Mar 22 13:25:09 daemon.err lnxdev108 rpc.gssd[19329]: WARNING:
handle_gssd_upcall: failed to find uid in upcall string 'mech=krb5'

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


[CentOS] kerberized-nfs - any experts out there?

2017-03-22 Thread Matt Garman
Is anyone on the list using kerberized-nfs on any kind of scale?

I've been fighting with this for years.  In general, when we have
issues with this system, they are random and/or not repeatable.  I've
had very little luck with community support.  I hope I don't offend by
saying that!  Rather, my belief is that these problems are very
niche/esoteric, and so beyond the scope of typical community support.
But I'd be delighted to be proven wrong!

So this is more of a "meta" question: anyone out there have any
general recommendations for how to get support on what I presume are
niche problems specific to our environment?  How is paid upstream
support?

Just to give a little insight into our issues: we have an
in-house-developed compute job dispatching system.  Say a user has
100s of analysis jobs he wants to run, he submits them to a central
master process, which in turn dispatches them to a "farm" of >100
compute nodes.  All these nodes have two different krb5p NFS mounts,
to which the jobs will read and write.  So while the users can
technically log in directly to the compute nodes, in practice they
never do.  The logins are only "implicit" when the job dispatching
system does a behind-the-scenes ssh to kick off these processes.

Just to give some "flavor" to the kinds of issues we're facing, what
tends to crop up is one of three things:

(1) Random crashes.  These are full-on kernel trace dumps followed
by an automatic reboot.  This was really bad under CentOS 5.  A random
kernel upgrade magically fixed it.  It happens almost never under
CentOS 6.  But happens fairly frequently under CentOS 7.  (We're
completely off CentOS 5 now, BTW.)

(2) Permission denied issues.  I have user Kerberos tickets
configured for 70 days.  But there is clearly some kind of
undocumented kernel caching going on.  Looking at the Kerberos server
logs, it looks like it "could" be a performance issue, as I see 100s
of ticket requests within the same second when someone tries to launch
a lot of jobs.  Many of these will fail with "permission denied" but
if they immediately re-try, it works.  Related to this, I have been
unable to figure out what creates and deletes the
/tmp/krb5cc_uid_random files.

(3) Kerberized NFS shares getting "stuck" for one or more users.
We have another monitoring app (in-house developed) that, among other
things, makes periodic checks of these NFS mounts.  It does so by
forking and doing a simple "ls" command.  This is to ensure that these
mounts are alive and well.  Sometimes, the "ls" command gets stuck to
the point where it can't even be killed via "kill -9".  Only a reboot
fixes it.  But the mount is only stuck for the user running the
monitoring app.  Or sometimes the monitoring app is fine, but an
actual user's processes will get stuck in "D" state (as shown in top,
meaning waiting on I/O), but everyone else's jobs (and access to the
kerberized NFS shares) are OK.
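
When that happens, a quick way to enumerate the stuck processes (a
generic sketch, nothing specific to our monitoring app; substitute the
stuck PID in the second command) is:

ps -eo pid,user,stat,wchan:30,args | awk 'NR==1 || $3 ~ /^D/'   # list D-state (uninterruptible) processes
cat /proc/<pid>/stack                                           # as root: the kernel call chain it is blocked in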

This is actually blocking us from upgrading to CentOS 7.  But my
colleagues and I are at a loss how to solve this.  So this post is
really more of a semi-desperate plea for any kind of advice.  What
other resources might we consider?  Paid support is not out of the
question (within reason).  Are there any "super specialist"
consultants out there who deal in Kerberized NFS?

Thanks!
Matt
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Spotty internet connection

2017-02-03 Thread Matt Garman
On Fri, Feb 3, 2017 at 12:08 PM, John R Pierce  wrote:
> for Comcast/Xfinity, I'm using a Arris SB6183 that I got at Costco.   this
> is a simple modem/bridge, so /my/ router behind it gets the public IP.

Note that some residential ISPs may not offer "naked" Internet, and/or
won't allow you to bring your own device (BYOD).  At least in my area,
there are only two options for residential Internet: cable-based via
Comcast, and DSL-based via AT&T.  I used to routinely switch back and
forth between the two, to play them against each other for the best
rates.  However, I had to give up on AT&T because they stopped
offering a "naked" service.  That is, when I was using them, I had the
most basic DSL modem, that literally did nothing except provide me
with a public Internet IP and the service.  Last I talked to them, I
could only use their service with their fancy all-in-one devices, that
are both a DSL modem and gateway/router/wireless AP.  I already have
all that infrastructure in my house, and I trust my ability to manage
it more than I trust the black-box firmware that AT&T provides.

Going from memory, that all-in-one DSL service did give me a public
IP, but the device itself implemented NATing, so it looked like I was
getting a private IP.  There *may* have been a way to remove most of
the functionality of the all-in-one device ("DMZ mode" or something
like that); it's been discussed pretty heavily on the DSLReports
Forums.  (But, either way, even ignoring the technical grievances with
their service, AT&T's prices are higher and speed tiers lower than
Comcast's.)

TL;DR: (1) some ISPs may not allow BYOD; (2) if it looks like your ISP
is giving you a private IP, dig a little deeper, it could simply
appear that way due to the way the ISP configures the assigned device.
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Spotty internet connection

2017-02-02 Thread Matt Garman
On Thu, Feb 2, 2017 at 7:13 PM, TE Dukes  wrote:
> Lately I have been getting slow  and partial page loads, server not found,
> server timed out, etc.. Get knocked off ssh when accessing my home server
> from work, etc. Its not the work connection because I don't have problems
> accessing other sites, just here at home and my home server.
>
> Is there any kind of utility to check for failing hardware?

I have the exact same problems from time to time via Comcast.  Mine
comes and goes, and lately it hasn't been too bad.  But when it comes,
it's down for very small amounts of time, maybe 30-90 seconds, which
is just long enough to be annoying, and make the service unusable.

When it was really bad (intermittent dropouts as described above,
almost every night during prime time, usually for several hours at a
time) I wrote a program to do constant pings to several servers at
once.  If you're interested, I'll see if I can find that script.  But,
conceptually, it ran concurrent pings to several sites, and kept some
stats on drops longer than some threshold.  Some tips on a program
like this: use IP addresses, rather than hostnames, because ultimately
using a hostname implicitly does a DNS lookup, which likely requires
Internet service to work.  I also did several servers at once, so I
could prove it wasn't just the one site I was pinging.  Included in
the list of servers was also the nexthop device beyond my house
(presumably Comcast's own router).  Use traceroute to figure out
network paths.
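
I can't put my hands on the original script right now, but a minimal
sketch of the idea looks something like this (the target IPs are
placeholders: a couple of public resolvers plus your nexthop):

#!/bin/bash
# rough sketch: concurrent pings by IP, log any gap between replies > 5 seconds
for ip in 8.8.8.8 208.67.222.222 192.168.100.1 ; do
    ( ping -i 1 "$ip" | while read -r line ; do
          now=$(date +%s)
          if [ -n "$last" ] && [ $((now - last)) -gt 5 ] ; then
              echo "$(date): $ip gap of $((now - last))s"
          fi
          last=$now
      done ) &
done
wait    # runs until interrupted with Ctrl-C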

After running this for a while---before I called them with the
evidence---the problem magically cleared up, and since then it's been
infrequent enough that I haven't felt the need to fire up the script
again.  When it comes to residential Internet, I am quite cynical
towards monopoly ISPs like Comcast... so maybe they saw the constant
pings and knew I was building a solid case and fixed the problem.  Or
maybe enough people in my area complained of similar problems and they
actually felt uncharacteristically caring for a second.

I haven't been there in a while, but in the past, I've gotten a lot of
utility out of the DSLReports Forums[1].  There are private forums
that will put you in direct contact with technical people at your ISP.
It can sometimes be a good way to side-step the general customer
service hotline and get in touch with an actual engineer rather than a
script reader.  Maybe not, but worst-case you're only out some time.
Also, you might post this same question to one of the public forums
over there, as there seems to be lots of knowledgeable/helpful people
hanging out there.  (Despite the name, it's not only about DSL, but
consumer ISPs in general.)

[1] http://www.dslreports.com/forums/all

Good luck, let us know if you come up with any decent resolution!
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS help

2016-10-27 Thread Matt Garman
On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell  wrote:
> This site is locked down like no other I have ever seen. You cannot
> bring anything into the site - no computers, no media, no phone. You
> ...
> This is my client's client, and even if I could circumvent their
> policy I would not do that. They have a zero tolerance policy and if
> ...

OK, no internet for real. :) Sorry I kept pushing this.  I made an
unflattering assumption that maybe it just hadn't occurred to you how
to get files in or out.  Sometimes there are "soft" barriers to
bringing files in or out: they don't want it to be trivial, but want
it to be doable if necessary.  But then there are times when they
really mean it.  I thought maybe the former applied to you, but
clearly it's the latter.  Apologies.

> These are all good debugging techniques, and I have tried some of
> them, but I think the issue is load related. There are 50 external
> machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
> on the C6 client the script that processes them is running
> continuously. It will sometimes run for 7 hours then hang, but it has
> run for as long as 3 days before hanging. I have never been able to
> reproduce the errors/hanging situation manually.

If it truly is load related, I'd think you'd see something askew in
the sar logs.  But if the load tends to spike, rather than be
continuous, the sar sampling rate may be too coarse to pick it up.

> And again, this is only at this site. We have the same software
> deployed at 10 different sites all doing the same thing, and it all
> works fine at all of those.

Flaky hardware can also cause weird intermittent issues.  I know you
mentioned before your hardware is fairly new/decent spec; but that
doesn't make it immune to manufacturing defects.  For example, imagine
one voltage regulator that's ever-so-slightly out of spec.  It
happens.  Bad memory is not uncommon and certainly causes all kinds of
mysterious issues (though in my experience that tends to result in
spontaneous reboots or hard lockups, but truly anything could happen).

Ideally, you could take the system offline and run hardware
diagnostics, but I suspect that's impossible given your restrictions
on taking things in/out of the datacenter.

On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell  wrote:
> Well I spoke too soon. The importer (the one that was initially
> hanging that I came here to fix) hung up after running 20 hours. There
> were no NFS errors or messages on neither the client nor the server.
> When I restarted it, it hung after 1 minute, Restarted it again and it
> hung after 20 seconds. After that when I restarted it it hung
> immediately. Still no NFS errors or messages. I tried running the
> process on the server and it worked fine. So I have to believe this is
> related to nobarrier. Tomorrow I will try removing that setting, but I
> am no closer to solving this and I have to leave Japan Saturday :-(
>
> The bad disk still has not been replaced - that is supposed to happen
> tomorrow, but I won't have enough time after that to draw any
> conclusions.

I've seen behavior like that with disks that are on their way out...
basically the system wants to read a block of data, and the disk
doesn't read it successfully, so it keeps trying.  The kind of disk,
what kind of controller it's behind, raid level, and various other
settings can all impact this phenomenon, and also how much detail you
can see about it.  You already know you have one bad disk, so that's
kind of an open wound that may or may not be contributing to your
bigger, unsolved problem.

So that makes me think, you can also do some basic disk benchmarking.
iozone and bonnie++ are nice, but I'm guessing they're not installed
and you don't have a means to install them.  But you can use "dd" to
do some basic benchmarking, and that's all but guaranteed to be
installed.  Similar to network benchmarking, you can do something
like:
time dd if=/dev/zero of=/tmp/testfile.dat bs=1G count=256

That will generate a 256 GB file.  Adjust "bs" and "count" to whatever
makes sense.  General rule of thumb is you want the target file to be
at least 2x the amount of RAM in the system to avoid cache effects
from skewing your results.  Bigger is even better if you have the
space, as it increases the odds of hitting the "bad" part of the disk
(if indeed that's the source of your problem).

Do that on C6, C7, and if you can a similar machine as a "control"
box, it would be ideal.  Again, we're looking for outliers, hang-ups,
timeouts, etc.
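
If you want to exercise reads as well, something like this (adjust the
path and block size to taste) reads the same file back while bypassing
the page cache:

time dd if=/tmp/testfile.dat of=/dev/null bs=1M iflag=direct   # read it back with O_DIRECT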

+1 to Gordon's suggestion to sanity check MTU sizes.

Another random possibility... By somewhat funny coincidence, we have
some servers in Japan as well, and were recently banging our heads
against the wall with some weird networking issues.  The remote hands
we had helping us (none of our staff was on site) claimed one or more
fiber cables were dusty, enough that it was affecting 

Re: [CentOS] NFS help

2016-10-26 Thread Matt Garman
On Tue, Oct 25, 2016 at 7:22 PM, Larry Martell  wrote:
> Again, no machine on the internal network that my 2 CentOS hosts are
> on are connected to the internet. I have no way to download anything.,
> There is an onerous and protracted process to get files into the
> internal network and I will see if I can get netperf in.

Right, but do you have physical access to those machines?  Do you have
physical access to the machine on which you use PuTTY to connect
to those machines?  If yes to either question, then you can use
another system (that does have Internet access) to download the files
you want, put them on a USB drive (or burn to a CD, etc), and bring
the USB/CD to the C6/C7/PuTTY machines.

There's almost always a technical way to get files on to (or out of) a
system.  :)  Now, your company might have *policies* that forbid
skirting around the technical measures that are in place.

Here's another way you might be able to test network connectivity
between C6 and C7 without installing new tools: see if both machines
have "nc" (netcat) installed.  I've seen this tool referred to as "the
swiss army knife of network testing tools", and that is indeed an apt
description.  So if you have that installed, you can hit up the web
for various examples of its use.  It's designed to be easily scripted,
so you can write your own tests, and in theory implement something
similar to netperf.
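
As a concrete example of the netcat idea (flags vary a bit between nc
implementations, and the hostname and port here are placeholders):

# on the C6 box: listen on an arbitrary port and discard what arrives
nc -l 5001 > /dev/null              # some nc variants want "nc -l -p 5001"
# on the C7 box: push 1 GB through the network path and time it
time dd if=/dev/zero bs=1M count=1024 | nc <c6-host> 5001
# (depending on the variant you may need -q 0 or -N so the sender exits at EOF)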

OK, I just thought of another "poor man's" way to at least do some
sanity testing between C6 and C7: scp.  First generate a huge file.
General rule of thumb is at least 2x the amount of RAM in the C7 host.
You could create a tarball of /usr, for example (e.g. "tar czvf
/tmp/bigfile.tar.gz /usr" assuming your /tmp partition is big enough
to hold this).  Then, first do this: "time scp /tmp/bigfile.tar.gz
localhost:/tmp/bigfile_copy.tar.gz".  This will literally make a copy
of that big file, but will route through most of the network stack.
Make a note of how long it took.  And also be sure your /tmp partition
is big enough for two copies of that big file.

Now, repeat that, but instead of copying to localhost, copy to the C6
box.  Something like: "time scp /tmp/bigfile.tar.gz <c6-host>:/tmp/".
Does the time reported differ greatly from when you
copied to localhost?  I would expect them to be reasonably close.
(And this is another reason why you want a fairly large file, so the
transfer time is dominated by actual file transfer, rather than the
overhead.)

Lastly, do the reverse test: log in to the C6 box, and copy the file
back to C7, e.g. "time scp /tmp/bigfile.tar.gz
<c7-host>:/tmp/bigfile_copy2.tar.gz".  Again, the time should be
approximately the same for all three transfers.  If either or both of
the latter two copies take dramatically longer than the first, then
there's a good chance something is askew with the network config
between C6 and C7.

Oh... all this time I've been jumping to fancy tests.  Have you tried
the simplest form of testing, that is, doing by hand what your scripts
do automatically?  In other words, simply try copying files between C6
and C7 using the existing NFS config?  Can you manually trigger the
errors/timeouts you initially posted?  Is it when copying lots of
small files?  Or when you copy a single huge file?  Any kind of file
copying "profile" you can determine that consistently triggers the
error?  That could be another clue.

Good luck!
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS help

2016-10-25 Thread Matt Garman
On Mon, Oct 24, 2016 at 6:09 PM, Larry Martell  wrote:
> The machines are on a local network. I access them with putty from a
> windows machine, but I have to be at the site to do that.

So that means when you are offsite there is no way to access either
machine?  Does anyone have a means to access these machines from
offsite?

> Yes, the C6 instance is running on the C7 machine. What could be
> mis-configured? What would I check to find out?

OK, so these two machines are actually the same physical hardware, correct?

Do you know, is the networking between the two machines "soft", as in
done locally on the machine (typically through NAT or bridging)?  Or is
it "hard", in that you have a dedicated NIC for the host and a
separate dedicated NIC for the guest, and actual cables going out of
each interface and connected to a switch/hub/router?  I would expect
the former...

If it truly is a "soft" network between the machines, then that is
more evidence of a configuration error.  Now, unfortunately, with what
to look for: I have virtually no experience setting up C6 guests on a
C7 host; at least not enough to help you troubleshoot the issue.  But
in general, you should be able to hit up a web search and look for
howtos and other documents on setting up networking between a C7 host
and its guests.  That will allow you to (1) understand how it's
currently setup, (2) verify if there is any misconfig, and (3) correct
or change if needed.

> Yes, that is potential solution I had not thought of. The issue with
> this is that we have the same system installed at many, many sites,
> and they all work fine. It is only this site that is having an issue.
> We really do not want to have different SW running at just this one
> site. Running the script on the C7 host is a change, but at least it
> will be the same software as every place else.

IIRC, you said this is the only C7 instance?  That would mean it is
already not the same as every other site.  It may be conceptually the
same, but "under the hood", there are a tremendous number of changes
between C6 and C7.  Effectively every single package is different,
from the kernel all the way to trivial userspace tools.

> netperf is not installed.

Again, if you can use putty (which is ssh) to access these systems,
you implicitly have the ability to upload files (i.e. packages) to the
systems.  A simple tool like netperf should have few (if any)
dependencies, so you don't have to mess with mirroring the whole
centos repo.  Just grab the netperf rpm file from wherever, then use
scp (I believe it's called pscp when part of the Putty package) to
copy to your servers, yum install and start testing.
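
A hedged sketch of that workflow (the package file name, version, and
hosts are purely illustrative):

# from the Windows/PuTTY box:
pscp netperf-2.6.0-1.el6.x86_64.rpm root@<c6-host>:/tmp/
# then on the C6 box:
yum localinstall /tmp/netperf-*.rpm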
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS help

2016-10-24 Thread Matt Garman
Another alternative idea: you probably won't be comfortable with this,
but check out systemd-nspawn.  There are lots of examples online, and
even I wrote about how I use it:
http://raw-sewage.net/articles/fedora-under-centos/

This is unfortunately another "sysadmin" solution to your problem.
nspawn is the successor to chroot, if you are at all familiar with
that.  It's kinda-sorta like running a system-within-a-system, but
much more lightweight.  The "slave" systems share the running kernel
with the "master" system.  (I could say the "guest" and "host"
systems, but those are virtual machine terms, and this is not a
virtual machine.)  For your particular case, the main benefit is that
you can natively share filesystems, rather than use NFS to share
files.

So, it's clear you have network capability between the C6 and C7
systems.  And surely you must have ssh installed on both systems.
Therefore, you can transfer files between C6 and C7.  So here's a way
you can use systemd-nspawn to get around trying to install all the
extra libs you need on C7:

1. On the C7 machine, create a systemd-nspawn container.  This
container will "run" C6.
2. You can source everything you need from the running C6 system
directly.  Heck, if you have enough disk space on the C7 system, you
could just replicate the whole C6 tree to a sub-directory on C7.
3. When you configure the C6 nspawn container, make sure you pass
through the directory structure with these FTP'ed files.  Basically
you are substituting systemd-nspawn's bind/filesystem pass-through
mechanism in place of NFS.
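
A rough sketch of steps 2 and 3 (host names and paths are made up, and
this assumes you have the disk space and root on both boxes):

# copy the running C6 tree into a directory on the C7 box
mkdir -p /srv/c6-root
rsync -aAX --exclude=/proc/ --exclude=/sys/ --exclude=/dev/ --exclude=/run/ \
    root@<c6-host>:/ /srv/c6-root/
# enter it, bind-mounting the FTP landing directory where the NFS mount used to be
systemd-nspawn -D /srv/c6-root --bind=/data/ftp-incoming:/mnt/ftp /bin/bash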

With that setup, you can "probably" run all the C6 native stuff under
C7.  This isn't guaranteed to work, e.g. if your C6 programs require
hooks into the kernel, it could fail, because now you're running on a
different kernel... but if you only use userspace libraries, you'll
probably be OK.  But I was actually able to get HandBrake, compiled
for bleeding-edge Ubuntu, to work within a C7 nspawn container.

That probably trades one bit of complexity (NFS) for another
(systemd-nspawn).  But just throwing it out there if you're completely
stuck.
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS help

2016-10-24 Thread Matt Garman
On Mon, Oct 24, 2016 at 2:42 PM, Larry Martell  wrote:
>> At any rate, what I was looking at was seeing if there was any way to
>> simplify this process, and cut NFS out of the picture.  If you need
>> only to push these files around, what about rsync?
>
> It's not just moving files around. The files are read, and their
> contents are loaded into a MySQL database.

On what server does the MySQL database live?


> This site is not in any way connected to the internet, and you cannot
> bring in any computers, phones, or media of any kind. There is a
> process to get machines or files in, but it is onerous and time
> consuming. This system was set up and configured off site and then
> brought on site.

But clearly you have a means to log in to both the C6 and C7 servers,
right?  Otherwise, how would you be able to see these errors, check
top/sar/free/iostat/etc?

And if you are logging in to both of these boxes, I assume you are
doing so via ssh?

Or are you actually physically sitting in front of these machines?

If you have ssh access to these machines, then you can trivially copy
files to/from them.  If ssh is installed and working, then scp should
also be installed and working.  Even if you don't have scp, you can
use tar over ssh to the same effect.  It's ugly, but doable, and there
are examples online for how to do it.

Also: you made a couple comments about these machines, it looks like
the C7 box (FTP server + NFS server) is running bare metal (i.e. not a
virtual machine).  The C6 instance (NFS client) is virtualized.  What
is hosting the C6 instance?

Is the C6 instance running under the C7 instance?  I.e., are both
machines on the same physical hardware?  If that is true, then your
"network" (at least the one between C7 and C6) is basically virtual,
and to have issues like this on the same physical box is certainly
indicative of a mis-configuration.


> To run the script on the C7 NFS server instead of the C6 NFS client
> many python libs will have to installed. I do have someone off site
> working on setting up a local yum repo with what I need, and then we
> are going to see if we can zip and email the repo and get it on site.
> But none of us are sys admins and we don't really know what we're
> doing so we may not succeed and it may take longer then I will be here
> in Japan (I am scheduled to leave Saturday).

Right, but my point is you can write your own custom script(s) to copy
files from C7 to C6 (based on rsync or ssh), do the processing on C6
(DB loading, whatever other processing), then move back to C7 if
necessary.  You said yourself you are a programmer not a sysadmin, so
change the nature of the problem from a sysadmin problem to a
programming problem.
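
A bare-bones sketch of what I mean (host names, paths, and the importer
command are all placeholders):

# on the C6 box, replacing the NFS read path:
rsync -a --remove-source-files <c7-host>:/data/ftp-incoming/ /data/work/
/usr/local/bin/importer.py /data/work/                      # existing parse-and-load-into-MySQL step
rsync -a --remove-source-files /data/work/done/ <c7-host>:/data/processed/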

I'm certain I'm missing something, but the fundamental architecture
doesn't make sense to me given what I understand of the process flow.

Were you able to run some basic network testing tools between the C6
and C7 machines?  I'm interested specifically in netperf, which does
round trip packet testing, both TCP and UDP.  I would look for packet
drops with UDP, and/or major performance outliers with TCP, and/or any
kind of timeouts with either protocol.
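
For reference, the kind of netperf run I have in mind (the server name
is a placeholder):

netserver                                # on the C7 box: start the netperf listener (port 12865)
netperf -H <c7-host> -t TCP_RR -l 30     # from the C6 box: 30 seconds of TCP request/response
netperf -H <c7-host> -t UDP_RR -l 30     # same test over UDP; watch for drops or huge latencies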

How is name resolution working on both machines?  Do you address
machines by hostname (e.g., "my_c6_server"), or explicitly by IP
address?  Are you using DNS or are the IPs hard-coded in /etc/hosts?
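
(A quick way to see how a name actually resolves:)

getent hosts my_c6_server    # honors /etc/nsswitch.conf, so it shows what the system really uses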

To me it still "smells" like a networking issue...

-Matt
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS help

2016-10-24 Thread Matt Garman
On Sun, Oct 23, 2016 at 8:02 AM, Larry Martell  wrote:
>> To be clear: the python script is moving files on the same NFS file
>> system?  E.g., something like
>>
>> mv /mnt/nfs-server/dir1/file /mnt/nfs-server/dir2/file
>>
>> where /mnt/nfs-server is the mount point of the NFS server on the
>> client machine?
>
> Correct.
>
>> Or are you moving files from the CentOS 7 NFS server to the CentOS 6 NFS 
>> client?
>
> No the files are FTP-ed to the CentOS 7 NFS server and then processed
> and moved on the CentOS 6 NFS client.


I apologize if I'm being dense here, but I'm more confused on this
data flow now.  Your use of "correct" and "no" seems to be
inconsistent with your explanation.  Sorry!

At any rate, what I was looking at was seeing if there was any way to
simplify this process, and cut NFS out of the picture.  If you need
only to push these files around, what about rsync?

> The problem doing that is the files are processed and loaded to MySQL
> and then moved by a script that uses the Django ORM, and neither
> django, nor any of the other python packages needed are installed on
> the server. And since the server does not have an external internet
> connection (as I mentioned in my reply to Mark) getting it set up
> would require a large amount of effort.

...right, but I'm pretty sure rsync should be installed on the server;
I believe it's default in all except the "minimal" setup profiles.
Either way, it's trivial to install, as I don't think it has any
dependencies.  You can download the rsync rpm from mirror.centos.org,
then scp it to the server, then install via yum.  And Python is
definitely installed (requirement for yum) and Perl is probably
installed as well, so with rsync plus some basic Perl/Python scripting
you can create your own mover script.

Actually, rsync may not even be necessary, scp may be sufficient for
your purposes.  And scp should definitely be installed.


> Also, we have this exact same setup on over 10 other systems, and it
> is only this one that is having a problem. The one difference with
> this one is that the sever is CentOS7 - on all the other systems both
> the NFS server and client are CentOS6.

From what you've described so far, with what appears to be a
relatively simple config, C6 or C7 "shouldn't" matter.  However, under
the hood, C6 and C7 are quite different.

> The python script checks the modification time of the file, and only
> if it has not been modified in more then 2 minutes does it process it.
> Otherwise it skips it and waits for the next run to potentially
> process it. Also, the script can tell if the file is incomplete in a
> few different ways. So if it has not been modified in more then 2
> minutes, the script starts to process it, but if it finds that it's
> incomplete it aborts the processing and leaves it for next time.

This script runs on C7 or C6?

> The hardware is new, and is in a rack in a server room with adequate
> and monitored cooling and power. But I just found out from someone on
> site that there is a disk failure, which happened back on Sept 3. The
> system uses RAID, but I don't know what level. I was told it can
> tolerate 3 disk failures and still keep working, but personally, I
> think all bets are off until the disk has been replaced. That should
> happen in the next day or 2, so we shall see.

OK, depending on the RAID scheme and how it's implemented, there could
be disk timeouts causing things to hang.


> I've been watching and monitoring the machines for 2 days and neither
> one has had a large CPU load, not has been using much memory.

How about iostat?  Also, good old "dmesg" can suggest if the system
with the failed drive is causing timeouts to occur.
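
Something along these lines is what I'd look at (purely illustrative):

iostat -x 5 3                                              # extended per-device stats: utilization, await, queue size
dmesg | grep -iE 'ata|scsi|sd[a-z]|timeout|error' | tail -50   # recent disk-related kernel messages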


> None on the client. On the server it has 1 dropped Rx packet.
>
>> Do
>>> "ethtool " on both machines to make sure both are linked up
>>> at the correct speed and duplex.
>
> That reports only "Link detected: yes" for both client and server.

OK, but ethtool should also say something like:

...
Speed: 1000Mb/s
Duplex: Full
...

For a 1gbps network.  If Duplex is reported as "half", then that is
definitely a problem.  Using netperf is further confirmation of
whether or not your network is functioning as expected.


> sar seems to be running, but I can only get it to report on the
> current day. The man page shows start and end time options, but is
> there a way to specify the stand and end date?

If you want to report on a day in the past, you have to pass the file
argument, something like this:

sar -A -f /var/log/sa/sa23 -s 07:00:00 -e 08:00:00

That would show you yesterday's data between 7am and 8am.  The files
in /var/log/sa/saXX are the files that correspond to the day.  By
default, XX will be the day of the month.
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS help

2016-10-21 Thread Matt Garman
On Fri, Oct 21, 2016 at 4:14 AM, Larry Martell  wrote:
> We have 1 system ruining Centos7 that is the NFS server. There are 50
> external machines that FTP files to this server fairly continuously.
>
> We have another system running Centos6 that mounts the partition the files
> are FTP-ed to using NFS.
>
> There is a python script running on the NFS client machine that is reading
> these files and moving them to a new dir on the same file system (a mv not
> a cp).

To be clear: the python script is moving files on the same NFS file
system?  E.g., something like

mv /mnt/nfs-server/dir1/file /mnt/nfs-server/dir2/file

where /mnt/nfs-server is the mount point of the NFS server on the
client machine?


Or are you moving files from the CentOS 7 NFS server to the CentOS 6 NFS client?

If the former, i.e., you are moving files to and from the same system,
is it possible to completely eliminate the C6 client system, and just
set up a local script on the C7 server that does the file moves?  That
would cut out a lot of complexity, and also improve performance
dramatically.

Also, what is the size range of these files?  Are they fairly small
(e.g. 10s of MB or less), medium-ish (100s of MB) or large (>1GB)?


> Almost daily this script hangs while reading a file - sometimes it never
> comes back and cannot be killed, even with -9. Other times it hangs for 1/2
> hour then proceeds on.

Timeouts relating to NFS are the worst.


> Coinciding with the hanging I see this message on the NFS server host:
>
> nfsd: peername failed (error 107)
>
> And on the NFS client host I see this:
>
> nfs: V4 server returned a bad sequence-id
> nfs state manager - check lease failed on NFSv4 server with error 5

I've been wrangling with NFS for years, but unfortunately those
particular messages don't ring a bell.

The first thing that came to my mind is: how does the Python script
running on the C6 client know that the FTP upload to the C7 server is
complete?  In other words, if someone is uploading "fileA", and the
Python script starts to move "fileA" before the upload is complete,
then at best you're setting yourself up for all kinds of confusion,
and at worst file truncation and/or corruption.

Making a pure guess about those particular errors: is there any chance
there is a network issue between the C7 server and the C6 client?
What is the connection between those two servers?  Are they physically
adjacent to each other and on the same subnet?  Or are they on
opposite ends of the globe connected through the Internet?

Clearly two machines on the same subnet, separated only by one switch
is the simplest case (i.e. the kind of simple LAN one might have in
his home).  But once you start crossing subnets, then routing configs
come into play.  And maybe you're using hostnames rather than IP
addresses directly, so then name resolution comes into play (DNS or
/etc/hosts).  And each switch hop you add requires that not only your
server network config needs to be correct, but also your switch config
needs to be correct as well.  And if you're going over the Internet,
well... I'd probably try really hard to not use NFS in that case!  :)

Do you know if your NFS mount is using TCP or UDP?  On the client you
can do something like this:

grep nfs /proc/mounts | less -S

And then look at what the "proto=XXX" says.  I expect it will be
either "tcp" or "udp".  If it's UDP, modify your /etc/fstab so that
the options for that mountpoint include "proto=tcp".  I *think* the
default is now TCP, so this may be a non-starter.  But the point is,
based purely on the conjecture that you might have an unreliable
network, TCP would be a better fit.
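
For reference, forcing it in /etc/fstab looks something like this (the
server name, export path, and mount point are placeholders):

<c7-host>:/export/ftp   /mnt/ftp   nfs   defaults,proto=tcp   0 0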

I hate to simply say "RTFM", but NFS is complex, and I still go back
and re-read the NFS man page ("man nfs").  This document is long and
very dense, but it's worth at least being familiar with its content.


> The first client message is always at the same time as the hanging starts.
> The second client message comes 20 minutes later.
> The server message comes 4 minutes after that.
> Then 3 minutes later the script un-hangs (if it's going to).

In my experience, delays that happen on consistent time intervals that
are on the order of minutes tend to smell of some kind of timeout
scenario.  So the question is, what triggers the timeout state?

> Can anyone shed any light on to what could be happening here and/or what I
> could do to alleviate these issues and stop the script from hanging?
> Perhaps some NFS config settings? We do not have any, so we are using the
> defaults.

My general rule of thumb is "defaults are generally good enough; make
changes only if you understand their implications and you know you
need them (or temporarily as a diagnostic tool)".

But anyway, my hunch is that there might be a network issue.  So I'd
actually start with basic network troubleshooting.  Do an "ifconfig"
on both machines: do you see any drops or interface errors?  Do
"ethtool " on both 

[CentOS] Kerberized NFS client and slow user write performance

2016-10-07 Thread Matt Garman
We seem to be increasingly hit by this bug:

https://access.redhat.com/solutions/2040223
"On RHEL 6 NFS client usring kerberos (krb5), one user experiences
slow write performance, another does not"

You need a RH subscription to see that in its entirety.  But the
subject basically says it all: randomly, one or more users will be
subjected to *terrible* NFS write performance that persists until
reboot.

There is a root cause shown, but that is cryptic to non-kernel devs;
it doesn't explain from a user perspective what triggers this state.
(That's why it appears to be random to me.)

There is no solution or workaround given.  This appears to be on a
per-user + per-server basis, so a crude workaround is to migrate the
user to a different server.  And we do regular reboots, which somewhat
hides the problem.

My question to the list: has anyone else dealt with this?  The link
says "Solution in Progress", but that was last updated nearly a year
ago.  We don't have any support contracts with upstream, just the
website access subscription, so I doubt RH will offer any help.
Appreciate any suggestions!

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Suggestions for Config Management Tool

2016-05-12 Thread Matt Garman
As others have said, in the end, it's a matter of personal preference
(e.g. vim or emacs).  You could spend a week reading articles and
forum discussions comparing all the different tools; but until you've
really used them, it will mostly be an academic exercise.  Of course,
the particulars of your environment might naturally lend themselves to one
tool or the other, so it's certainly worth spending some time getting
an overview of the "idiom" of each tool.

That said, we are working on moving away from dozens of little
homegrown management scripts to Ansible.  It just feels "right" to me,
like how I would have designed such a system.  I like that it's built
on top of ssh.  Any sysadmin should be fairly intimate with ssh, so
why not build your CMS on top of a familiar tool?  (But, of course,
Ansible is flexible enough that you don't have to use ssh.)  I might
even go so far as to call it a "platform" rather than a tool.  Out of
the box, you can quickly get going having it do useful work by reading
the docs/tutorials on the website.  And just going through those
exercises, you'll start to see that there's a ton of flexibility
available, which is your option to exercise or not.
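
To give a flavor of the "built on ssh" point, the very first thing the
docs walk you through is an ad-hoc run like this (the inventory file
and group names are whatever you define; these are made up):

ansible all -i hosts -m ping             # ssh to every host in the "hosts" inventory and report back
ansible dbservers -i hosts -a "uptime"   # run an ad-hoc command on just one group

And of course there are a dozen other ways you could structure the same
thing, which is exactly the flexibility I mean.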

And that perhaps is one of the drawbacks.  We're actually somewhat in
"analysis paralysis" mode with Ansible right now.  Because there is so
much flexibility, we are constantly second-guessing ourselves the best
way to implement our fairly complex and diverse environments.  In
particular, how to group configuration "profiles".  E.g., this server
needs to be a DNS master, this server needs to be a DNS slave, this
server needs MySQL + DNS slave, this server needs these packages
installed, this server needs those packages but not these, etc etc.
But I always prefer a tool with too much flexibility over something
that forces you in to a specific way of doing things: that makes it
our problem, not the tool's.

The only other one I have any experience with is CFEngine.  I
tried---and I mean really tried---to get something going with
CFEngine3.  I just couldn't get my head around it.  The wacky DSL it
uses for expressing configs just wasn't intuitive to me; the whole
bootstrapping processes seemed to be overly-complex; I found the
documentation managed to be lengthy yet still lack real substance.  By
contrast: everything I've wanted to do in Ansible I was able to do
quickly (and usually in several ways); on the client side, the only
thing needed for an Ansible bootstrap is ssh; and the docs for Ansible
have met or exceeded all expectations.

My colleague and I were even able to quickly hack on some of the
Ansible Python code to add some functionality we wanted.  At least the
pieces we looked at appeared to be quite straightforward.  I have 15
years of C/C++ programming experience and wouldn't even consider
messing with the CFEngine code.  Maybe it's fine, but the complexity
of the rest of the system is enough to scare me away from looking at
the source.

To be fair, it was *many* years ago that I looked at CFE3; maybe many
of my issues have since been addressed.  But, at this point, Ansible
checks all my boxes, so that's where we're staying.

Again, that's just my taste/experience.  If you have the time, I'd
spin up some VMs and play with the different tools.  Try to implement
some of your key items, see how hard/easy they are.




On Thu, May 12, 2016 at 8:27 AM, Fabian Arrotin  wrote:
> On 12/05/16 10:21, James Hogarth wrote:
>> On 12 May 2016 at 08:22, Götz Reinicke - IT Koordinator <
>> goetz.reini...@filmakademie.de> wrote:
>>
>>> Hi,
>>>
>>> we see a growing need for a better Configuration management for our
>>> servers.
>>>
>>> Are there any known good resources for a comparison of e.g. Puppet,
>>> Chef, Ansible etc?
>>>
>>> What would you suggest and why? :)
>>>
>>>
>>>
>>
>> Puppet is great for central control with automatic runs making systems
>> right and keeping them in line, it's not an orchestration tool though -
>> however it's commonly supplemented with something like rundeck and/or
>> mcollective to assist here.
>>
>> Chef is great for a ruby house - you'll need to brush up on your ruby as
>> writing cookbooks is heavily tied to the language. Historically it was very
>> debian focused with issues like selinux problems. I believe these have been
>> generally resolved though.
>>
>> Ansible is a great orchestration tool and excellent for going from base to
>> a configured system. It is less of a tool to keep things inline with a base
>> however with no central automated runs (ignoring Tower which is not FOSS
>> yet).
>>
>> Ansible is also much simpler to get into given the tasks are just like
>> following through a script for defining how to make a system, as opposed to
>> learning an actual DSL like required for understanding puppet modules.
>>
>> There's a growing pattern of using ansible for orchestration alongside
>> puppet for definitions as well (there's a specific ansible module to carry
>> out a puppet 

[CentOS] tune2fs: Filesystem has unsupported feature(s) while trying to open

2016-04-19 Thread Matt Garman
I have an ext4 filesystem for which I'm trying to use "tune2fs -l".
Here is the listing of the filesystem from the "mount" command:

# mount | grep share
/dev/mapper/VolGroup_Share-LogVol_Share on /share type ext4
(rw,noatime,nodiratime,usrjquota=aquota.user,jqfmt=vfsv0,data=writeback,nobh,barrier=0)


When I try to run "tune2fs" on it, I get the following error:

# tune2fs -l /dev/mapper/VolGroup_Share-LogVol_Share
tune2fs 1.41.12 (17-May-2010)
tune2fs: Filesystem has unsupported feature(s) while trying to open
/dev/mapper/VolGroup_Share-LogVol_Share
Couldn't find valid filesystem superblock.


This filesystem was created on this system (i.e. not imported from
another system).  I have other ext4 filesystems on this server, and
they all work with "tune2fs -l".

Basic system info:

# rpm -qf `which tune2fs`
e2fsprogs-1.41.12-18.el6.x86_64

# cat /etc/redhat-release
CentOS release 6.5 (Final)

# uname -a
Linux lnxutil8 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


I did a little web searching on this, most of the hits were for much
older systems, where (for example) the e2fsprogs only supported up to
ext3, but the user had an ext4 filesystem.  Obviously that's not the
case here.  In other words, the filesystem was created with the
mkfs.ext4 binary from the same e2fsprogs package as the tune2fs binary
I'm trying to use.

Anyone ever seen anything like this?

Thanks!
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Just need to vent

2016-01-24 Thread Matt Garman
I haven't used gnome3, or any Linux desktop in earnest for a long time...
But I used to be semi-obsessed with tweaking and configuring various Linux
desktops. And back when I was doing that, there were dozens of desktop
programs available, from super lightweight bare bones window managers, to
full blown desktop environments that do everything under the sun (and of
course, everything in between).

So my question is: while gnome3 might not float your boat, why not try one
of the countless other desktops? It's all open source...

FWIW, I was never a fan of full blown desktop environments like kde/gnome
simply because I had a preference for lightweight, standalone window
managers. My favorites were fluxbox and icewm.

Besides those, off the top of my head, I know of: blackbox, openbox, Joe's
wm, window maker, and enlightenment 16 in the simple/lightweight window
manager category. Xfce has already been mentioned, and there's also LXDE and
the latest enlightenment in the full-on desktop environment category.

A little elbow grease may be required, but, I'm certain there's *a* Linux
gui out there for everyone.
On Jan 24, 2016 12:20, "Joacim Melin"  wrote:

>
> > On 24 Jan 2016, at 17:45, Peter Duffy  wrote:
> >
> > On Sat, 2016-01-23 at 20:27 -0600, Frank Cox wrote:
> >> On Sat, 23 Jan 2016 20:05:02 -0500
> >> Mark LaPierre wrote:
> >>
> >>> The main reason I'm still using, nearly obsolete, CentOS 6 is because I
> >>> don't want to have to deal with Gnome 3.
> >>
> >> Install Mate on Centos 7 and you never have to touch Gnome 3.  I did,
> >> and my desktops don't look or work any different today than they did
> >> under Centos 6.
> >>
> >
> > Trouble is that when you go from 6 to 7, you also have the delights of
> > systemd and grub 2 to contend with.
> >
> > I'm also still using CentOS 6, and currently have no desire to
> > "upgrade". I'm still in shock after trying to upgrade to Red Hat 7 at
> > work, and after the upgrade (apart from being faced with the gnome3
> > craziness) finding that many of the admin commands either didn't work,
> > or only worked partially via a wrapper. (And the added insult that when
> > I shut down the box, it gave a message something like: "shutdown status
> > asserted" and then hung, so that it had to be power-cycled. Then when it
> > came back up, it went through all the fs checks as though it had shut
> > down ungracefully.) I allowed some of the senior developers to try the
> > box themselves for a while, and based on their findings, it was decided
> > to switch to Ubuntu (which (at least then) didn't use systemd,) together
> > with Mate and XFCE.
> >
> > Similarly with others who have commented, I simply cannot understand why
> > the maintainers of crucial components in linux have this thing about
> > making vast changes which impact (usually adversely) on users and
> > admins, without (apparently) any general discussion or review of the
> > proposed changes. What happened to RFCs? Maybe it's a power thing - we
> > can do it, so we're gonna do it, and if ya don't like it, tough!
> >
> > It would be very interesting to know how many other users are still on
> > CentOS/Red Hat 6 as a result of reluctance to enjoy all the - erm -
> > improvements in 7. Maybe it's time to fork CentOS 6 and make it look and
> > behave like 7 without systemd (or even better, with some way of
> > selecting the init methodology at install-time and afterwards), and with
> > gnome2 (or a clear choice between 2 and 3). Call it DeCentOS.
> >
> >
>
> I'm still on 6.7 and have no plans to upgrade my 20+ servers running it.
> KVM runs fine, all my services runs fine.
> Everything is stable, fast enough and I can find my way around a CentOS
> 6.x system like the palm of my hand.
>
> I tried installing CentOS 7 when it was released without knowing about all
> the changes. I spent about an hour trying to understand what had happened
> and where things where located. And with "trying" I mean searching,
> googling and just feeling really frustrated.
>
> I then realised that it was simply not for me - lots of (IMHO unnecessary)
> changes had been made and I guess when the time comes to really upgrade my
> servers I will go with Ubuntu, FreeBSD or whatever seems to be the the best
> option.
>
> I'm sure there are technical reasons to upgrade to CentOS 7, I'm yet to be
> bothered to find out though since it's damn near impossible to actually get
> work done with it installed.
>
> A fork of CentOS 6 would be very, very, very interesting to run from my
> point of view.
>
> Joacim
>
>
>
>
> ___
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
>
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] HDD badblocks

2016-01-18 Thread Matt Garman
That's strange, I expected the SMART test to show some issues.
Personally, I'm still not confident in that drive.  Can you check
cabling?  Another possibility is that there is a cable that has
vibrated into a marginal state.  Probably a long shot, but if it's
easy to get physical access to the machine, and you can afford the
downtime to shut it down, open up the chassis and re-seat the drive
and cables.

Every now and then I have PCIe cards that work fine for years, then
suddenly disappear after a reboot.  I re-seat them and they go back to
being fine for years.  So I believe vibration does sometimes play a
role in mysterious problems that creep up from time to time.



On Mon, Jan 18, 2016 at 5:39 AM, Alessandro Baggi
 wrote:
> Il 18/01/2016 12:09, Chris Murphy ha scritto:
>>
>> What is the result for each drive?
>>
>> smartctl -l scterc <device>
>>
>>
>> Chris Murphy
>> ___
>> CentOS mailing list
>> CentOS@centos.org
>> https://lists.centos.org/mailman/listinfo/centos
>> .
>>
> SCT Error Recovery Control command not supported
>
> ___
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] HDD badblocks

2016-01-17 Thread Matt Garman
Have you run a "long" SMART test on the drive?  smartctl -t long <device>

I'm not sure what's going on with your drive. But if it were mine, I'd want
to replace it. If there are issues, that long smart check ought to turn up
something,  and in my experience, that's enough for a manufacturer to do a
warranty replacement.
On Jan 17, 2016 11:00, "Alessandro Baggi" 
wrote:

> Hi list,
> I've a notebook with C7 (1511). This notebook has 2 disk (640 GB) and I've
> configured them with MD at level 1. Some days ago I've noticed some
> critical slowdown while opening applications.
>
> First of all I've disabled acpi on disks.
>
>
> I've checked disk for badblocks 4 consecutive times for disk sda and sdb
> and I've noticed a strange behaviour.
>
> On sdb there are not problem but with sda:
>
> 1) First run badblocks reports 28 badblocks on disk
> 2) Second run badblocks reports 32 badblocks
> 3) Third reports 102 badblocks
> 4) Last run reports 92 badblocks.
>
>
> Running smartctl after the last badblocks check I've noticed that
> Current_Pending_Sector was 32 (not 92 as badblocks found).
>
> To force sector reallocation I've filled the disk up to 100%, runned again
> badblocks and 0 badblocks found.
> Running again smartctl, Current_Pending_Sector 0 but Reallocated_Event
> Count = 0.
>
> Why each consecutive run of badblocks reports different results?
> Why smartctl does not update Reallocated_Event_Count?
> Badblocks found on sda increase/decrease without a clean reason. This
> behaviuor can be related with raid (if a disk had badblocks this badblock
> can be replicated on second disk?)?
>
> What other test I can perform to verify disks problems?
>
> Thanks in advance.
> ___
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
>
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Intel SSD

2015-11-18 Thread Matt Garman
I always tell vendors I'm using RHEL, even though we're using CentOS.
If you say CentOS, some vendors immediately throw up their hands and
say "unsupported" and then won't even give you the time of day.

A couple tricks for fooling tools into thinking they are on an actual
RHEL system:
1. Modify /etc/redhat-release to say RedHat Enterprise Linux or
whatever the actual RHEL systems have
2. Similarly modify /etc/issue
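
Something like this (the release string is illustrative; copy whatever
your real RHEL boxes actually have in these files):

cp /etc/redhat-release /etc/redhat-release.orig
echo 'Red Hat Enterprise Linux Server release 7.1 (Maipo)' > /etc/redhat-release
echo 'Red Hat Enterprise Linux Server release 7.1 (Maipo)' > /etc/issue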

Another tip that has proven successful: run the vendor tool under
strace.  Sometimes you can get an idea of what it's trying to do and
why it's failing.  This is exactly what we did to determine why a
vendor tool wouldn't work on CentOS.  We had modified
/etc/redhat-release (as in (1) above), but forgot about /etc/issue.
Strace showed the program exiting immediately after an open() call to
/etc/issue.
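
If you go the strace route, something like this captures the relevant
syscalls (the tool name is a placeholder):

strace -f -e trace=open,stat,access -o /tmp/vendor-tool.trace ./vendor-tool
grep -E 'redhat-release|issue' /tmp/vendor-tool.trace    # see which files it checks and what it gets back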

Good luck!




On Wed, Nov 18, 2015 at 9:24 AM, Michael Hennebry
 wrote:
> On Wed, 18 Nov 2015, Birta Levente wrote:
>
>> I have a supermicro server, motherboard is with C612 chipset and beside
>> that with LSI3108 raid controller integrated.
>> Two Intel SSD DC S3710 200GB.
>> OS: Centos 7.1 up to date.
>>
>> My problem is that the Intel SSD Data Center Tool (ISDCT) does not
>> recognize the SSD drives when they connected to the standard S-ATA ports on
>> the motherboard, but through the LSI raid controller is working.
>>
>> Does somebody know what could be the problem?
>>
>> I talked to the Intel support and they said the problem is that Centos is
>> not supported OS ... only RHEL 7.
>> But if not supported should not work on the LSI controlled neither.
>
>
> Perhaps the tool looks for the string RHEL.
> My recollection is that when IBM PC's were fairly new,
> IBM used that trick with some of its software.
> To work around that, some open source developers used the string "not IBM".
> I think this was pre-internet, so google might not work.
>
> If it's worth the effort, you might make another "CentOS" distribution,
> but call it "not RHEL".
>
> --
> Michael   henne...@web.cs.ndsu.nodak.edu
> "Sorry but your password must contain an uppercase letter, a number,
> a haiku, a gang sign, a heiroglyph, and the blood of a virgin."
>  --  someeecards
> ___
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Running Fedora under CentOS via systemd-nspawn?

2015-11-18 Thread Matt Garman
I actually built HandBrake 0.10.2 (the latest) under C7 (using a
CentOS 7 nspawn container so as not to pollute the main system with
the dozens of deps I installed).  Full details here if you're
interested:

http://raw-sewage.net/articles/fedora-under-centos/
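
The short version looks roughly like this (paths and the initial package
set are illustrative; the write-up above has the full details):

# populate a minimal CentOS 7 tree to build in
yum -y --releasever=7 --installroot=/var/lib/machines/c7build \
    install centos-release yum rpm-build
# drop into it and build there, keeping the dozens of deps off the host
systemd-nspawn -D /var/lib/machines/c7build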

The problem with the newer version of HandBrake is that it requires (a
very recent version of) gtk3, which in turn has several other deps
that need to be upgraded on C7.  But I worked through all that, and
can provide all the spec files if anyone wants.

Anyway, the HandBrake problem is solved for me (in possibly multiple ways).

But I'm just fascinated by the possibilities of nspawn, and wondering
how far one can take it before instabilities are introduced.

Consider how many people out there have similar problems as me: want
to run CentOS for stability/reliability/vendor support, but also want
some bleeding-edge software that's only available on Fedora (or Ubuntu
or Arch).  If it's "safe" to run these foreign distributions under
CentOS via nspawn, then I think that's a simple solution.  Virtual
Machines are of course a possible solution, but they seem overkill for
this class of problem.  And, not to mention, possibly
inefficient---something like HandBrake should benefit from running on
bare metal, rather than under a virtualized CPU.







On Wed, Nov 18, 2015 at 1:11 PM, Lamar Owen <lo...@pari.edu> wrote:
> On 11/17/2015 12:39 PM, Matt Garman wrote:
>>
>> Now I have a need for a particular piece of software: HandBrake.  I
>> found this site[1] that packages it for both Fedora and CentOS.  But
>> the CentOS version is a little older, as the latest HandBrake requires
>> gtk3.  The latest version is available for Fedora however.
>>
> Hmm, Nux Dextop (li.nux.ro) has HandBrake 0.9.9 for C7, but not yet 0.10.2.
> Nux! is around this list and might be able to shed light on what is needed
> for 0.10.2.
>
> ___
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


[CentOS] Running Fedora under CentOS via systemd-nspawn?

2015-11-17 Thread Matt Garman
tl;dr - Is anybody "running" a Fedora system via systemd-nspawn under CentOS?

Long version:

Before CentOS 7, I used chroot to create "lightweight containers"
where I could cleanly add extra repos and/or software without the risk
of "polluting" my main system (and potentially ending up in dependency
hell).  The primary driver for this was MythTV, which has dozens of
deps that span multiple repos.  Without "containing" the MythTV
installation within a chroot environment, I would inevitably run into
conflicts when doing a yum update.

When I upgraded to CentOS 7, I found out that systemd-nspawn is
"chroot on steroids".  After figuring it all out, I replicated my
MythTV "container", and things were great.

Now I have a need for a particular piece of software: HandBrake.  I
found this site[1] that packages it for both Fedora and CentOS.  But
the CentOS version is a little older, as the latest HandBrake requires
gtk3.  The latest version is available for Fedora however.

So I thought, what if I could "run" Fedora under systemd-nspawn.
Well, I definitely *can* do it.  I copied the base Fedora filesystem
layout off the Live CD, then booted into it via systemd-nspawn.  I was
able to add repos (including the one for HandBrake), and actually
install then run the HandBrake GUI.
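
Roughly what I did (paths are illustrative):

# copy the Fedora root filesystem out of the (mounted) Live CD image...
mkdir -p /var/lib/machines/fedora
rsync -a /mnt/fedora-live-root/ /var/lib/machines/fedora/
# ...then boot it as a container under the CentOS 7 kernel
systemd-nspawn -bD /var/lib/machines/fedora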

So while this does work, I'm wondering if it's safe?  I'm thinking
that at least some of the Fedora tools assume that they are running
under a proper Fedora kernel, whereas in my scheme, they are running
under a CentOS kernel.  I'm sure there have been changes to the kernel
API between the CentOS kernel and the Fedora kernel.  Am I risking
system stability by doing this?

Anyone have any thoughts or experience doing something like this, i.e.
running "foreign" Linux distros under CentOS via systemd-nspawn?  What
if I tried to do this with Debian or Arch or Gentoo?


[1] http://negativo17.org/handbrake/
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Screen

2015-10-30 Thread Matt Garman
If you're just getting starting with a screen multiplexer, I'd suggest
starting with tmux.  My understanding is that GNU screen has
effectively been abandoned.

I used GNU screen for at least 10 years, and recently switched to
tmux.  As someone else said, in GNU screen, if you want to send ctrl-a
to your application (e.g. shell or emacs), you can do ctrl-a followed
by a "naked" a.  I found this becomes so second nature, for the rare
time I'm not in screen/tmux, I habitually do the Ctrl-a a sequence!

tmux's default "action" sequence is Ctrl-b.  Even without my history
of Ctrl-a muscle memory, I think I'd find Ctrl-b awkward.  I briefly
tried to get used to it so I could live without a custom tmux config
file, but just couldn't do it.  So, here's my small ~/.tmux.conf file:


# remap Ctrl-b to Ctrl-a (to emulate behavior of GNU screen)
unbind C-b
set -g prefix C-a
bind C-a send-prefix

# use vi-like keybindings
set-window-option -g mode-keys vi

# emulate GNU screen's Ctrl-a a sequence to jump to beginning of
# line
bind a send-prefix





On Fri, Oct 30, 2015 at 6:39 AM, xaos  wrote:
> Andrew,
>
> Don't do it man. Don't remap screen key sequences.
>
> I had the same issue. This is how I ultimately solved it.
> I mentally trained myself to think of screen
> as a room that I need to do a Ctrl-A in order to get in there.
>
> So, for bash, It is NOT a big deal anyway. Train your fingers to do a
> Ctrl-A then a
>
> It is just one extra keystroke.
>
> I got used to it within a week.
>
> -George
> On 10/30/15 7:13 AM, Scott Robbins wrote:
>>
>> On Fri, Oct 30, 2015 at 10:53:29AM +0100, Andrew Holway wrote:
>>>
>>> Hey
>>>
>>> I like to use Ctrl+A and Ctrl+E a lot to navigate my insane big bash one
>>> liners but this is incompatible with Screen which has a binding to
>>> Ctrl-A.
>>> Is it possible to move the screen binding so I can have the best of both
>>> worlds?
>>
>> If you only make simple use of screen, then there's always tmux.  It uses
>> ctl+b by default, and one of the reasons is the issue you mention.
>>
>> (If you have a lot of complex uses of screen, then it becomes a bigger
>> deal
>> to learn the new keyboard shortcuts, but many people just use its attach
>> and detach feature, and relearning those in tmux takes a few minutes.)
>>
>> If you are interested in trying it, I have my own very simple page with
>> links to a better page at http://srobb.net/screentmux.html
>>
>
> ___
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] CentOS 6 gcc is a bit old

2015-06-29 Thread Matt Garman
Take a look at Devtoolset, I think this will give you what you want:
https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/
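
If it helps, usage is roughly like this (package names per the devtoolset-3
collection; adjust if the repo you end up with names things differently):

# enable the SCL repo and pull in the newer toolchain
yum install centos-release-scl
yum install devtoolset-3-gcc devtoolset-3-gcc-c++
# start a shell where the new gcc is first in PATH; your stock /usr/bin/gcc is untouched
scl enable devtoolset-3 bash
gcc --version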



On Mon, Jun 29, 2015 at 1:56 PM, Michael Hennebry
henne...@web.cs.ndsu.nodak.edu wrote:
 gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11) is a bit old.
 There have been major changes since then.
 I'd like a newer version.

 If I have to, I expect that I can install from source.
 I'd rather not.

 Is there a CentOS 6-compatible repository
 from which I can get a newer version?
 Does a standard CentOS 7 repository have a newer version?
 Does a CentOS 7-compatible repository have a newer version?

 It's my understanding that to compile from source,
 I will need to keep the gcc I have.
 Otherwise I would have nothing to compile the source.
 I expect that providing the right options will let old and new co-exist.
 Is ensuring that I get the right gcc when I type gcc
 just a matter of having the right search path for gcc?
 Will I need to do anything interesting to ensure that
 the resulting executables run using the right libraries?

 I've installed from source before,
 but never to replace an existing compiler.
 My concern is that if I louse things up,
 the mess could be very hard to fix.

 --
 Michael   henne...@web.cs.ndsu.nodak.edu
 SCSI is NOT magic. There are *fundamental technical
 reasons* why it is necessary to sacrifice a young
 goat to your SCSI chain now and then.   --   John Woods
 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] managing logins for different classes of servers

2015-06-04 Thread Matt Garman
Our environment has several classes of servers, such as
development, production, qa, utility, etc.  Then we have all
our users.  There's no obvious mapping between users and server class.
Some users may have access to only one class, some may span multiple
classes, etc.  And for maximum complexity, some classes of machines
use local (i.e. /etc/passwd, /etc/shadow) authentication, others use
Kerberos.

With enough users and enough classes, it gets to be more than one can
easily manage with a simple spreadsheet or other crude mechanism.
Plus the ever-growing risk of giving a user access to a class he
shouldn't have.

Is there a simple centralized solution that can simplify the
management of this?  One caveat though is that our production class
machines should not have any external dependencies.  These are
business-critical, so we try to minimize any single point of failure
(e.g. a central server).  Plus the production class machines are
distributed in multiple remote locations.

Any thoughts?
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

2015-05-04 Thread Matt Garman
On Thu, Apr 30, 2015 at 7:31 AM, Peter van Hooft
ho...@natlab.research.philips.com wrote:
 You may want to try reducing sunrpc.tcp_max_slot_table_entries .
 In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16
 In CentOS 6, this number is dynamic with a maximum of
 sunrpc.tcp_max_slot_table_entries which by default has a value of 65536.

 We put that in /etc/sysconfig/modprobe.d/sunrpc.conf: options sunrpc
 tcp_max_slot_table_entries=128

 Make that /etc/modprobe.d/sunrpc.conf, of course.


This appears to be the smoking gun we were looking for, or at least
a significant piece of the puzzle.

We actually tried this early on in our investigation, but were
changing it via sysctl, which apparently has no effect.  Your email
convinced me to try again, but this time configuring the parameters
via modprobe.

In our case, 128 was still too high.  So we dropped it all the way
down to 16.  Our understanding is that 16 is the CentOS 5 value.  What
we're seeing is now our apps are starved for data, so looks like we
might have to nudge it up.  In other words, there's either something
else at play which we're not aware of, or the meaning of that
parameter is different between CentOS 5 and CentOS 6.
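
For the record, the whole change is just this (with the value still subject
to tuning, per the above):

# /etc/modprobe.d/sunrpc.conf
options sunrpc tcp_max_slot_table_entries=16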

Anyway, thank you very much for the suggestion.  You turned on the
light at the end of the tunnel!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

2015-04-29 Thread Matt Garman
On Wed, Apr 29, 2015 at 10:51 AM,  m.r...@5-cent.us wrote:
 The server in this case isn't a Linux box with an ext4 file system - so
 that won't help ...

 What kind of filesystem is it? I note that xfs also has barrier as a mount
 option.

The server is a NetApp FAS6280.  It's using NetApp's filesystem.  I'm
almost certain it's none of the common Linux ones.  (I think they call
it WAFL IIRC.)

Either way, we do the NFS mount read-only, so write barriers don't
even come into play.  E.g., with your original example, if we unzipped
something, we'd have to write to the local disk.

Furthermore, in low load situations, the NetApp read latency stays
low, and the 5/6 performance is fairly similar.  It's only when the
workload gets high, and it turn this aggressive demand is placed on
the NetApp, that we in turn see overall decreased performance.

Thanks for the thoughts!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

2015-04-29 Thread Matt Garman
We have a compute cluster of about 100 machines that do a read-only
NFS mount to a big NAS filer (a NetApp FAS6280).  The jobs running on
these boxes are analysis/simulation jobs that constantly read data off
the NAS.

We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5.
We did a piecemeal upgrade, usually upgrading five or so machines at
a time, every few days.  We noticed improved performance on the CentOS
6 boxes.  But as the number of CentOS 6 boxes increased, we actually
saw performance on the CentOS 5 boxes decrease.  By the time we had
only a few CentOS 5 boxes left, they were performing so badly as to be
effectively worthless.

What we observed in parallel to this upgrade process was that the read
latency on our NetApp device skyrocketed.  This in turn caused all
compute jobs to actually run slower, as it seemed to move the
bottleneck from the client servers' OS to the NetApp.  This is
somewhat counter-intuitive: CentOS 6 performs faster, but actually
results in net performance loss because it creates a bottleneck on our
centralized storage.

All indications are that CentOS 6 seems to be much more aggressive
in how it does NFS reads.  And likewise, CentOS 5 was very polite,
to the point that it basically got starved out by the introduction of
the 6.5 boxes.

What I'm looking for is a deep dive list of changes to the NFS
implementation between CentOS 5 and CentOS 6.  Or maybe this is due to
a change in the TCP stack?  Or maybe the scheduler?  We've tried a lot
of sysctl tcp tunings, various nfs mount options, anything that's
obviously different between 5 and 6... But so far we've been unable to
find the smoking gun that causes the obvious behavior change between
the two OS versions.

Just hoping that maybe someone else out there has seen something like
this, or can point me to some detailed documentation that might clue
me in on what to look for next.

Thanks!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

2015-04-29 Thread Matt Garman
On Wed, Apr 29, 2015 at 10:36 AM, Devin Reade g...@gno.org wrote:
 Have you looked at the client-side NFS cache?  Perhaps the C6 cache
 is either disabled, has fewer resources, or is invalidating faster?
 (I don't think that would explain the C5 starvation, though, unless
 it's a secondary effect from retransmits, etc.)

Do you know where the NFS cache settings are specified?  I've looked
at the various nfs mount options.  Anything cache-related appears to
be the same between the two OSes, assuming I didn't miss anything.  We
did experiment with the noac mount option, though that had no effect
in our tests.

FWIW, we've done a tcpdump on both OSes, performing the same tasks,
and it appears that 5 actually has more chatter.  Just looking at
packet counts, 5 has about 17% more packets than 6, for the same
workload.  I haven't dug too deep into the tcpdump files, since we
need a pretty big workload to trigger the measurable performance
discrepancy.  So the resulting pcap files are on the order of 5 GB.

 Regarding the cache, do you have multiple mount points on a client
 that resolve to the same server filesystem?  If so, do they have
 different mount options?  If so, that can result in multiple caches
 instead of a single disk cache.  The client cache can also be bypassed
 if your application is doing direct I/O on the files.  Perhaps there
 is a difference in the application between C5 and C6, including
 whether or not it was just recompiled?  (If so, can you try a C5 version
 on the C6 machines?)

No multiple mount points to the same server.

No application differences.  We're still compiling on 5, regardless of
target platform.

 If you determine that C6 is doing aggressive caching, does this match
 the needs of your application?  That is, do you have the situation
 where the client NFS layer does an aggressive read-ahead that is never
 used by the application?

That was one of our early theories.  On 6, you can adjust this via
/sys/class/bdi/X:Y/read_ahead_kb (use stat on the mountpoint to
determine X and Y).  This file doesn't exist on 5.  But we tried
increasing and decreasing it from the default (960), and didn't see
any changes.
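
In case anyone wants to poke at the same knob, this is roughly how we did it
(the mount point and bdi id below are just examples):

# the bdi id (major:minor) is the 3rd field of the mount's line in mountinfo
awk '$5 == "/mnt/nas" {print $3}' /proc/self/mountinfo
# then read or change the readahead for that mount (960 KB was the default we saw)
cat /sys/class/bdi/0:23/read_ahead_kb
echo 1920 > /sys/class/bdi/0:23/read_ahead_kb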

 Are C5 and C6 using the same NFS protocol version?  How about TCP vs
 UDP?  If UDP is in play, have a look at fragmentation stats under load.

Yup, both are using tcp, protocol version 3.

 Are both using the same authentication method (ie: maybe just
 UID-based)?

Yup, sec=sys.

 And, like always, is DNS sane for all your clients and servers?  Everything
 (including clients) has proper PTR records, consistent with A records,
 et al?  DNS is so fundamental to everything that if it is out of whack
 you can get far-reaching symptoms that don't seem to have anything to do
 with DNS.

I believe so.  I wouldn't bet my life on it.  But there were certainly
no changes to our DNS before, during or since the OS upgrade.

 You may want to look at NFSometer and see if it can help.

Haven't seen that, will definitely give it a try!

Thanks for your thoughts and suggestions!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] CentOS 7 NFS client problems

2015-04-24 Thread Matt Garman
What does your /etc/idmapd.conf look like on the server side?

I fought with this quite a bit a while ago, but my use case was a bit
different, and I was working with CentOS 5 and 6.

Still, the kicker for me was updating the [Translation] section of
/etc/idmapd.conf.  Mine looks like this:

[Translation]
Method = nsswitch
GSS-Methods = nsswitch,static

You said you're not using Kerberos or LDAP, so I'm guessing you can
leave out the GSS-Methods line entirely, and make your Method line
nsswitch,static.

Furthermore, in my /etc/idmapd.conf file, I have a [Static] section
which, according to my comments, maps GSS-authenticated names to local
user names.  So mine looks kind of like this:

[Static]
someuser@REALM = localuser

Again, since you're not using GSS, I'm not sure if you can get away
with something like

[Static]
joe = joe

But it's probably worth trying/experimenting.

I hope that can be of some help!





On Thu, Apr 23, 2015 at 3:11 PM, Devin Reade g...@gno.org wrote:
 #define TL;DR

 Despite idmapd running, usernames/IDs don't get mapped properly.
 Looking for a workaround.

 #undef TL;DR

 I'm trying to get a new CentOS 7.1 workstation running, and having
 some problems with NFS filesystems.  The server is a fully patched
 CentOS 6 server.

 On the NFS filesystem, there are two subdirectories owned by a
 regular user (joe). (There are actually more and by multiple users, but
 I'll just show the two.) That user exists on both the NFS server and this
 CentOS 7 NFS client.  However, the user on the client machine is unable
 to perform various operations.  (The operations work when logged into
 the server.)

 $ whoami
 joe
 $ cd /nfs
 $ ls -l
 drwx------. 6 joe joe 4096 Apr 23 11:20 one
 drwxr-xr-x. 4 joe joe 4096 Dec 14  2011 two
 $ cd one
 one: Permission denied.
 $ cd two
 $ ls
 subdir1 subdir2
 $ touch testfile
 touch: cannot touch testfile: Permission denied

 mount(1) shows that the filesystem is mounted rw.  The server has it
 exported rw to the entire subnet.  Other machines (CentOS 5) mount
 the same filesystems without a problem.

 Looks a lot like an idmapd issue, right?

 On the server:
 # id joe
 uid=501(joe) gid=501(joe) groups=501(joe)

 Back on the client:

 $ ps auxww | grep idmap | grep -v grep
 $ id joe
 uid=1000(joe) gid=1000(joe) groups=1000(joe)
 $ cd /nfs
 $ ls -n
 drwx------. 6 1000 1000  4096 Apr 23 11:20 one
 drwxr-xr-x. 4 1000 1000  4096 Dec 14  2011 two

 So it looks like the name/UID mapping is correct even though
 the idmapd daemon isn't running on the client.  (It looks like CentOS7
 only starts idmapd when it's running an NFS *server*.)

 # systemctl list-units | grep nfs
 nfs.mount                     loaded active mounted   /nfs
 proc-fs-nfsd.mount            loaded active mounted   NFSD configuration filesystem
 var-lib-nfs-rpc_pipefs.mount  loaded active mounted   RPC Pipe File System
 nfs-config.service            loaded active exited    Preprocess NFS configuration
 nfs-client.target             loaded active active    NFS client services

 The behavior was tested again with SELinux in permissive mode; no change.

 Splunking a bit more shows some similar behavior for other distros:
  https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/966734
  https://bugzilla.linux-nfs.org/show_bug.cgi?id=226

 Yep, this is a situation where LDAP and Kerberos aren't in play. And
 the CentOS 5, CentOS 6, and other UNIXen boxes are using consistent
 UID/GID mappings.  However, CentOS7 (well, RHEL7) changed the minimum
 UID/GID for regular accounts, so when the account was created on the
 latter, the UID is out of sync.  So much for idmapd (without the
 fixes involved in the above URLs).

 Has anyone else run into this and have a solution other than forcing
 UIDs to match?

 Devin

 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] centos kernel changelog?

2015-04-09 Thread Matt Garman
I'm probably overlooking something simple, but I can't seem to find a
concise changelog for the rhel/centos kernel.  I'm on an oldish 6.5
kernel (2.6.32-431), and I want to look at the changes and fixes for
every kernel that has been released since, all the way up to the
current 6.6 kernel.

Anyone have a link to this?

Thanks!
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] centos kernel changelog?

2015-04-09 Thread Matt Garman
On Thu, Apr 9, 2015 at 8:49 AM, Johnny Hughes joh...@centos.org wrote:
 rpm -qp --changelog rpm-name | less

 NOTE:  This works for any kernel RPM in any version of CentOS ... you
 can download the latest 6 RPM from here:

 http://mirror.centos.org/centos/6/updates/x86_64/Packages/

 (currently kernel-2.6.32-504.12.2.el6.x86_64.rpm)


Thank you Johnny, that was exactly what I needed, and immensely helpful!

One more quick question: what does the number in brackets at the end
of most lines represent?  For example:

- [fs] nfs: Close another NFSv4 recovery race (Steve Dickson) [1093922]

What does the 1093922 mean?

Thanks again!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS-virt] virsh list hangs / guests not starting automatically

2014-10-14 Thread Matt Garman
I followed the wiki[1] to create a KVM virtual machine using bridged
network on CentOS 6.5.  It seemed to work fine on initial setup.
(FWIW I'm trying to run a MythBuntu guest.) However, after a reboot,
it doesn't auto-start the VMs.

Shortly after boot, if I go into virsh, then do a list, it just
hangs.  Likewise, if I go into virt manager, it just hangs with the
message connecting.

Kernel version is: 2.6.32-431.29.2.el6.x86_64

Relevant package versions:

libvirt.x86_64   0.10.2-29.el6_5.12
libvirt-client.x86_640.10.2-29.el6_5.12
libvirt-python.x86_640.10.2-29.el6_5.12
python-virtinst.noarch   0.600.0-18.el6
virt-manager.x86_64  0.9.0-19.el6
virt-top.x86_64  1.0.4-3.15.el6
virt-viewer.x86_64   0.5.6-8.el6_5.3
qemu-img.x86_64  2:0.12.1.2-2.415.el6_5.14
qemu-kvm.x86_64  2:0.12.1.2-2.415.el6_5.14

CPU is a Xeon E3-1230v3.  I have the virtualization setting enabled in the BIOS.

I googled on this, and saw a bunch of talk about two years ago
regarding issues with the libvirt packages having a deadlock bug.  But
I think the versions of the relevant packages that I have installed
are new enough to have fixes for that.

I also happened across an earlier post to this list[2], where it
seemed someone was having a similar problem.  I was previously
attempting to use balance-rr and 802.3ad bonding modes on my host.
However, I just changed to using active-backup and the problem
remains.

I have in /etc/libvirt/libvirtd.conf the following three lines (the
rest is stock, i.e. all comments):
log_level = 2
log_filters=
log_outputs=1:file:/var/log/libvirt/libvirt.log

Below I posted the contents of the libvirt log file after doing a
service libvirt start.

Anyone ever fought this before?

Thanks!

[1] http://wiki.centos.org/HowTos/KVM
[2] http://lists.centos.org/pipermail/centos-virt/2014-March/003722.html

/var/log/libvirt/libvirt.log output:
2014-10-14 16:47:11.150+: 4657: info : libvirt version: 0.10.2,
package: 29.el6_5.12 (CentOS BuildSystem http://bugs.centos.org,
2014-09-01-13:44:02, c6b8.bsys.dev.centos.org)
2014-10-14 16:47:11.150+: 4657: info :
virNetlinkEventServiceStart:517 : starting netlink event service with
protocol 0
2014-10-14 16:47:11.151+: 4657: info :
virNetlinkEventServiceStart:517 : starting netlink event service with
protocol 15
2014-10-14 16:47:11.154+: 4668: info :
dnsmasqCapsSetFromBuffer:667 : dnsmasq version is 2.48, --bind-dynamic
is NOT present, SO_BINDTODEVICE is NOT in use
2014-10-14 16:47:11.157+: 4668: info :
networkReloadIptablesRules:1925 : Reloading iptables rules
2014-10-14 16:47:11.157+: 4668: info : networkRefreshDaemons:1287
: Refreshing network daemons
2014-10-14 16:47:11.278+: 4668: info : networkStartNetwork:2422 :
Starting up network 'default'
2014-10-14 16:47:11.290+: 4668: info :
virStorageBackendVolOpenCheckMode:1085 : Skipping special dir '.'
2014-10-14 16:47:11.290+: 4668: info :
virStorageBackendVolOpenCheckMode:1085 : Skipping special dir '..'
2014-10-14 16:47:11.352+: 4668: info : qemudStartup:754 : Unable
to create cgroup for driver: No such device or address
2014-10-14 16:47:11.353+: 4668: info : qemudLoadDriverConfig:411 :
Configured cgroup controller 'cpu'
2014-10-14 16:47:11.353+: 4668: info : qemudLoadDriverConfig:411 :
Configured cgroup controller 'cpuacct'
2014-10-14 16:47:11.353+: 4668: info : qemudLoadDriverConfig:411 :
Configured cgroup controller 'cpuset'
2014-10-14 16:47:11.353+: 4668: info : qemudLoadDriverConfig:411 :
Configured cgroup controller 'memory'
2014-10-14 16:47:11.353+: 4668: info : qemudLoadDriverConfig:411 :
Configured cgroup controller 'devices'
2014-10-14 16:47:11.353+: 4668: info : qemudLoadDriverConfig:411 :
Configured cgroup controller 'blkio'
2014-10-14 16:47:11.509+: 4668: info :
virDomainLoadAllConfigs:14696 : Scanning for configs in
/var/run/libvirt/qemu
2014-10-14 16:47:11.527+: 4668: info :
virDomainLoadAllConfigs:14696 : Scanning for configs in
/etc/libvirt/qemu
2014-10-14 16:47:11.527+: 4668: info :
virDomainLoadAllConfigs:14718 : Loading config file 'mythbuntu.xml'
2014-10-14 16:47:11.529+: 4668: info : qemuDomainSnapshotLoad:484
: Scanning for snapshots for domain mythbuntu in
/var/lib/libvirt/qemu/snapshot/mythbuntu
___
CentOS-virt mailing list
CentOS-virt@centos.org
http://lists.centos.org/mailman/listinfo/centos-virt


Re: [CentOS-virt] virsh list hangs / guests not starting automatically

2014-10-14 Thread Matt Garman
I just wanted to follow-up to add that eventually, the virtual machine
did start, and now virsh list works as expected.  But it took nearly
30 minutes.  The updated libvirt.log is shown below.  Notice the huge
jump in time, from 16:47 to 17:14.  (Side question: it appears the
timestamps are UTC, rather than my local time, any way to address
that?)


2014-10-14 16:47:11.527+: 4668: info :
virDomainLoadAllConfigs:14718 : Loading config file 'mythbuntu.xml'
2014-10-14 16:47:11.529+: 4668: info : qemuDomainSnapshotLoad:484
: Scanning for snapshots for domain mythbuntu in
/var/lib/libvirt/qemu/snapshot/mythbuntu
2014-10-14 17:14:41.751+: 4668: info : virNetDevProbeVnetHdr:94 :
Enabling IFF_VNET_HDR
2014-10-14 17:14:41.805+: 4668: info :
virSecurityDACSetOwnership:296 : Setting DAC user and group on
'/home/kvm/mythbuntu.img' to '107:107'
2014-10-14 17:14:41.806+: 4668: info :
virSecurityDACSetOwnership:296 : Setting DAC user and group on
'/mnt/mythtv1/mythbackend_recordings' to '107:107'
2014-10-14 17:14:42.084+: 4668: info : lxcSecurityInit:1380 :
lxcSecurityInit (null)
2014-10-14 17:14:42.084+: 4668: info :
virDomainLoadAllConfigs:14696 : Scanning for configs in
/var/run/libvirt/lxc
2014-10-14 17:14:42.084+: 4668: info :
virDomainLoadAllConfigs:14696 : Scanning for configs in
/etc/libvirt/lxc
2014-10-14 17:14:42.089+: 4659: error : virFileReadAll:462 :
Failed to open file '/proc/4836/stat': No such file or directory
2014-10-14 17:14:42.090+: 4660: error : virFileReadAll:462 :
Failed to open file '/proc/8017/stat': No such file or directory
2014-10-14 17:26:34.679+: 4661: info : remoteDispatchAuthList:2398
: Bypass polkit auth for privileged client pid:11343,uid:0




On Tue, Oct 14, 2014 at 12:08 PM, Matt Garman matthew.gar...@gmail.com wrote:
 I followed the wiki[1] to create a KVM virtual machine using bridged
 network on CentOS 6.5.  It seemed to work fine on initial setup.
 (FWIW I'm trying to run a MythBuntu guest.) However, after a reboot,
 it doesn't auto-start the VMs.

 Shortly after boot, if I go into virsh, then do a list, it just
 hangs.  Likewise, if I go into virt manager, it just hangs with the
 message connecting.

 Kernel version is: 2.6.32-431.29.2.el6.x86_64

 Relevant package versions:

 libvirt.x86_64   0.10.2-29.el6_5.12
 libvirt-client.x86_640.10.2-29.el6_5.12
 libvirt-python.x86_640.10.2-29.el6_5.12
 python-virtinst.noarch   0.600.0-18.el6
 virt-manager.x86_64  0.9.0-19.el6
 virt-top.x86_64  1.0.4-3.15.el6
 virt-viewer.x86_64   0.5.6-8.el6_5.3
 qemu-img.x86_64  2:0.12.1.2-2.415.el6_5.14
 qemu-kvm.x86_64  2:0.12.1.2-2.415.el6_5.14

 CPU is a Xeon E3-1230v3.  I have the virtualization setting enabled in the 
 BIOS.

 I googled on this, and saw a bunch of talk about two years ago
 regarding issues with the libvirt packages having a deadlock bug.  But
 I think the versions of the relevant packages that I have installed
 are new enough to have fixes for that.

 I also happened across an earlier post to this list[2], where it
 seemed someone was having a similar problem.  I was previously
 attempting to use balance-rr and 802.3ad bonding modes on my host.
 However, I just changed to using active-backup and the problem
 remains.

 I have in /etc/libvirt/libvirtd.conf the following three lines (the
 rest is stock, i.e. all comments):
 log_level = 2
 log_filters=
 log_outputs=1:file:/var/log/libvirt/libvirt.log

 Below I posted the contents of the libvirt log file after doing a
 service libvirt start.

 Anyone ever fought this before?

 Thanks!

 [1] http://wiki.centos.org/HowTos/KVM
 [2] http://lists.centos.org/pipermail/centos-virt/2014-March/003722.html

 /var/log/libvirt/libvirt.log output:
 2014-10-14 16:47:11.150+: 4657: info : libvirt version: 0.10.2,
 package: 29.el6_5.12 (CentOS BuildSystem http://bugs.centos.org,
 2014-09-01-13:44:02, c6b8.bsys.dev.centos.org)
 2014-10-14 16:47:11.150+: 4657: info :
 virNetlinkEventServiceStart:517 : starting netlink event service with
 protocol 0
 2014-10-14 16:47:11.151+: 4657: info :
 virNetlinkEventServiceStart:517 : starting netlink event service with
 protocol 15
 2014-10-14 16:47:11.154+: 4668: info :
 dnsmasqCapsSetFromBuffer:667 : dnsmasq version is 2.48, --bind-dynamic
 is NOT present, SO_BINDTODEVICE is NOT in use
 2014-10-14 16:47:11.157+: 4668: info :
 networkReloadIptablesRules:1925 : Reloading iptables rules
 2014-10-14 16:47:11.157+: 4668: info : networkRefreshDaemons:1287
 : Refreshing network daemons
 2014-10-14 16:47:11.278+: 4668: info : networkStartNetwork:2422 :
 Starting up network 'default'
 2014-10-14 16:47:11.290+: 4668: info :
 virStorageBackendVolOpenCheckMode:1085 : Skipping special dir '.'
 2014-10-14 16:47:11.290+: 4668: info :
 virStorageBackendVolOpenCheckMode:1085 : Skipping special dir

Re: [CentOS] centos 6.5 input lag

2014-10-14 Thread Matt Garman
Update on this problem:

From another system, I initiated a constant ping to my laggy server.
I noticed that every 10--20 seconds, one or more ICMP packets would
drop.  These drops were consistent with the input lag I was
experiencing.

I did a web search for "linux periodically hangs" and found this
Serverfault post that had a lot in common with my symptoms:


http://serverfault.com/questions/371666/linux-bonded-interfaces-hanging-periodically

I in fact have bonded interfaces on the laggy server.  When I checked
the bonding config, I realized a while ago I had changed from
balance-rr / mode 0, to 802.3ad / mode 4.  (I did this because I kept
getting "bond0: received packet with own address as source address"
when using balance-rr with a bridge interface.  The bridge interface
was for using KVM.)
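
For reference, the bonding mode itself is just a one-line setting in the
bond's ifcfg file (options shown are illustrative, not a tuned recommendation):

# /etc/sysconfig/network-scripts/ifcfg-bond0 (relevant line only)
# mode=4 is 802.3ad; mode=1 is active-backup; mode=0 is balance-rr
BONDING_OPTS="mode=802.3ad miimon=100"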

For now, I simply disabled one of the slave interfaces, and the lag /
dropped ICMP packets problem has gone away.

Like the ServerFault poster, I have an HP ProCurve 1800-24G switch.
The switch is supposed to support 802.3ad link aggregation.  It's not
a managed switch, so I (perhaps incorrectly) assumed that 802.3ad
would magically just work.  Either there is more required to make it
work, or its implementation is broken.  Curiously, however, running
my bond0 in 802.3ad mode did work without any issue for over a month.

Anyway, hopefully this might help someone else struggling with a
similar problem.




On Fri, Oct 10, 2014 at 4:17 PM, Matt Garman matthew.gar...@gmail.com wrote:
 On Fri, Oct 10, 2014 at 4:11 PM, Joseph L. Brunner
 j...@affirmedsystems.com wrote:
 If this is a server - is it possible your raid card battery died?

 It is a server, but a home file server.  The raid card has no battery
 backup, and in fact has been flashed to pure HBA mode.  Actual
 RAID'ing is done at the software level.

 The only other thing on the hardware side that comes to mind is actual bad 
 sectors if this is not a raided virtual drive.

 The system has eight total drives: two SSDs in raid-1 for the OS, five
 3.5 spinning drives in RAID-6, and a single 3.5 drive normally used
 for mythtv recordings (though mythtv has been stopped for a long time
 now to try to debug the issue).

 From the OS side can you keep the box up long enough to do a yum update?

 Yes, I updated everything except packages beginning with l (el /
 lowercase 'L') due to that generating a number of conflicts that I
 haven't have time to resolve.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] centos 6.5 input lag

2014-10-10 Thread Matt Garman
On Thu, Oct 9, 2014 at 11:20 PM, Joseph L. Brunner
j...@affirmedsystems.com wrote:
 Is it under some type of ddos attack?

 What's running on this machine? In front of it?

A DDOS attack seems unlikely, though I suppose it's possible.  Sitting
between the lagging machine and the Internet is a pfSense box.  All
the other machines in the house have no issues, and they all route
through the pfSense system.

Right now, the only stuff running on it:

- CrashPlan (java backup application)
- Munin
- Apache (only for Munin, no external access [i.e. no port forwarding
from pfSense])
- mpd (music player daemon)

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] centos 6.5 input lag

2014-10-10 Thread Matt Garman
On Fri, Oct 10, 2014 at 4:11 PM, Joseph L. Brunner
j...@affirmedsystems.com wrote:
 If this is a server - is it possible your raid card battery died?

It is a server, but a home file server.  The raid card has no battery
backup, and in fact has been flashed to pure HBA mode.  Actual
RAID'ing is done at the software level.

 The only other thing on the hardware side that comes to mind is actual bad 
 sectors if this is not a raided virtual drive.

The system has eight total drives: two SSDs in raid-1 for the OS, five
3.5 spinning drives in RAID-6, and a single 3.5 drive normally used
for mythtv recordings (though mythtv has been stopped for a long time
now to try to debug the issue).

 From the OS side can you keep the box up long enough to do a yum update?

Yes, I updated everything except packages beginning with l (el /
lowercase 'L') due to that generating a number of conflicts that I
haven't have time to resolve.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] centos 6.5 input lag

2014-10-09 Thread Matt Garman
I have a CentOS 6.5 x86_64 system that's been running problem-free for
quite a while.

Recently, it's locked-up hard several times.  It's a headless server,
but I do have IP KVM.  However, when it's locked up, all I can see are
a few lines of kernel stack trace.  No hints to the problem in the
system logs.  I even enabled remote logging of syslog, hoping to catch
the errors that way.  No luck.

I ran memtest86+ for about 36 hours, no problems.

I've tried to strip away just about all running services.  It's just a
home file server.  I haven't had a crash in a while, but I also
haven't had it running very long.

But even while it's up, I have severe input lag in the shell.  I'll
type a few characters, and two to 10 or so seconds pass before
anything echoes to the screen.

I've checked top, practically zero CPU load.

It's not swapping - 16 GB of RAM, 0 swap used.  Most memory heavy
process is java (for CrashPlan backups).

iostat shows 0% disk utilization.

Anyone seen anything like this?  Where else can I check to try to
determine the source of this lag (which I suspect might be related to
the recent crashes)?

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] virsh list hangs

2014-10-08 Thread Matt Garman
I followed the wiki[1] to create a KVM virtual machine using bridged
network on CentOS 6.5.  It seemed to work fine on initial setup.
However, after a boot, it doesn't auto-start the VMs, or at least,
something has to time out (a *very* long time, on the order of 15--30
minutes) before they can be started.

Shortly after boot, if I go into virsh, then do a list, it just
hangs.  Likewise, if I go into virt-manager, it just hangs
connecting.

Kernel version is: 2.6.32-431.29.2.el6.x86_64

Relevant package versions:

libvirt.x86_64   0.10.2-29.el6_5.12
libvirt-client.x86_640.10.2-29.el6_5.12
libvirt-python.x86_640.10.2-29.el6_5.12
python-virtinst.noarch   0.600.0-18.el6
virt-manager.x86_64  0.9.0-19.el6
virt-top.x86_64  1.0.4-3.15.el6
virt-viewer.x86_64   0.5.6-8.el6_5.3
qemu-img.x86_64  2:0.12.1.2-2.415.el6_5.14
qemu-kvm.x86_64  2:0.12.1.2-2.415.el6_5.14

CPU is a Xeon E3-1230v3.  I have the virtualization setting enabled in the BIOS.

I googled on this, and saw a bunch of talk about two years ago
regarding issues with the libvirt packages having a deadlock bug.  But
I think the versions of the relevant packages that I have installed
are new enough to have fixes for that.

Anyone ever fought this before?

Thanks!

[1] http://wiki.centos.org/HowTos/KVM
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] lost packets - Bond

2014-09-18 Thread Matt Garman
On Wed, Sep 17, 2014 at 11:28 AM, Eduardo Augusto Pinto
edua...@eapx.com.br wrote:
 I'm using my bond interfaces as active-backup; in theory, an interface should
 take over (or work) only when the other interface is down.

 But I'm losing packets on the interface that is not being used, and that is
 generating packet loss on the bond.

My suspicion is that the bonding may be irrelevant here.  You can drop
packets with or without bonding.

There are many reasons why packets can be dropped, but one common one
is a too-slow consumer of those packets.  For example, say you are
trying to watch a streaming ultra-high-definition video on a system
with low memory and a slow CPU: the kernel can only buffer so many
packets before it has to start dropping them.

It's hard to suggest a solution without knowing the exact cause.  But
one thing to try (as much for debugging as an actual solution) is to
increase your buffer sizes.
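
As a starting point, something along these lines (numbers are illustrative,
not tuned recommendations):

# check the current socket receive buffer limits
sysctl net.core.rmem_max net.core.rmem_default
# bump them temporarily and see if the drops go away
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=262144
# also worth watching the drop counters while you test
ip -s link show bond0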
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] grubby fatal error: unable to find a suitable template

2014-06-25 Thread Matt Garman
I did a bulk yum update -y of several servers.  As a sanity check
after the upgrade, I ran a grep of /etc/grub.conf across all updated
servers looking to ensure the kernel I expected was installed.

Two servers came up saying /etc/grub.conf did not exist!

I logged into the servers, and /etc/grub.conf was a broken link.  (It
points to /boot/grub/grub.conf).  My systems are all setup with a
dedicated /boot partition.  Sure enough, /boot was not mounted.
Furthermore, I saw no /boot entry in /etc/fstab (which all my other
servers contain).

So I mounted /boot, and the the grub.conf file was not consistent: it
did not have a stanza for the kernel I wanted installed.  So I did a
yum remove kernel ; yum install -y kernel.  Both the remove and the
install resulted in this message getting printed:

grubby fatal error: unable to find a suitable template

Just for kicks, I renamed both the /etc/grub.conf symlink as well as
the actual /boot/grub/grub.conf file, and repeated the kernel
remove/install.  This did NOT produce the above error; however, no
symlink or actual grub.conf file was created.

I did a little web searching on the above error, and one common cause
is that there is no valid title... stanza in the grub.conf file for
grubby to use as a template.  But my file does in fact contain a valid
stanza.
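
For comparison, a typical CentOS 5 stanza that grubby would use as its
template looks something like this (version string and device names are
illustrative):

default=0
timeout=5
title CentOS (2.6.18-274.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-274.el5.img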

I even copied a valid grub.conf file from another server, and re-ran
the kernel remove/install: same error.

Clearly, something is broken, but I'm not sure what.  Anyone seen
anything like this?

By the way, these machines were all 5.something, being upgraded to 5.7.

Thanks!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] grubby fatal error: unable to find a suitable template

2014-06-25 Thread Matt Garman
On Wed, Jun 25, 2014 at 4:47 PM,  m.r...@5-cent.us wrote:
 ? Why not to 5.10, the current release of CentOS 5.x?

Off topic for the question, but, briefly, changing *anything* in our
environment involves extensive testing and validation due to very
precise performance requirements (HFT, microsecond changes make or
break us).  For our particular application, we've seen significant
performance changes with minor kernel revisions.  We've been putting
this testing and validation effort into CentOS 6.5, and will hopefully
be moving to off 5.x completely before too long.  But in the
short-term, 5.7 it is for us.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] copying user accounts...

2014-06-10 Thread Matt Garman
I've used "usermod -p <encrypted password> <username>" successfully many times.

Just be careful with escaping of the '$' field separators that appear
in the encrypted password string from /etc/shadow.
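
For example (the username and hash below are placeholders; paste the real
second field from /etc/shadow, and keep it in single quotes so the shell
leaves the $'s alone):

# on the old box: grab the user's hash
grep '^joe:' /etc/shadow | cut -d: -f2
# on the new box: set it verbatim
usermod -p '$6$examplesalt$examplehashxxxxxxxxxxxxxxxxxxxxxx' joe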



On Tue, Jun 10, 2014 at 4:28 PM, John R Pierce pie...@hogranch.com wrote:
 I want to copy a few user accounts to a new system...   is there a more
 elegant way to copy /etc/shadow passwords other than editing the file?
 for instance, is there some way I can give the password hash to
 /usr/bin/passwd ?




 --
 john r pierce  37N 122W
 somewhere on the middle of the left coast

 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Mother board recommendation

2014-05-16 Thread Matt Garman
On Fri, May 16, 2014 at 7:21 AM, Joseph Hesse joehe...@gmail.com wrote:
 I want to build a lightweight server and install centos.  Does anyone
 have a recommendation for a suitable motherboard?

What will the role of the server be?  How lightweight?  How many
users, what kinds of services, what (if any) performance requirements,
etc?  Room for future growth/expansion?

Budget?
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] High load average, low CPU utilization

2014-03-28 Thread Matt Garman
On Fri, Mar 28, 2014 at 9:01 AM, Mr Queue li...@mrqueue.com wrote:

 On Thu, 27 Mar 2014 17:20:22 -0500
 Matt Garman matthew.gar...@gmail.com wrote:

  Anyone seen anything like this?  Any thoughts or ideas?

 Post some data.. This public facing? Are you getting sprayed down by
 packets? Array? Soft/hard? Someone have screens
 laying around? Write a trap to catch a process list when the loads spike?
 Look at crontab(s)? User accounts? Malicious
 shells? Any guest containers around? Possibilities are sort of endless
 here.



Not public facing (no Internet access at all).  Linux software RAID-1. No
screen or tmux data.  No guest access of any kind.  In fact, only three
logged in users.

I've reviewed crontabs (there are only a couple), and I don't see anything
out of the ordinary.  Malicious shells or programs: possibly, but I think
that is highly unlikely... if someone were going to do something malicious,
*this* particular server is not the one to target.

What kind of data would help?  I have sar running at a five second
interval.  I also did a 24-hour run of dstat at a one second interval
collecting all information it could.  I have tons of data, but not sure how
to distill it down to a mailing-list friendly format.  But a colleague
and I reviewed the data, and don't see any correlation with other system
data before, during, or after these load spike events.

I did a little research on the loadavg number, and my understanding is that
it's simply a function of the number of tasks on the system.  (There's some
fancy stuff thrown in for exponential decay and curve smoothing and all
that, but it's still based on the number of system tasks.)

I did a simple run of "top -b > top_output.txt" for a 24-hour period, which
captured another one of these events.  I haven't had a chance to study it
in detail, but I expected the number of tasks to shoot up dramatically
around the time of these load spikes.  The number of tasks remained fairly
constant: about 200 +/- 5.

How can the loadavg shoot up (from ~1 to ~20) without a corresponding
uptick in number of tasks?
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] High load average, low CPU utilization

2014-03-28 Thread Matt Garman
On Fri, Mar 28, 2014 at 10:30 AM, John Doe jd...@yahoo.com wrote:
 Any USB device?
 Each time I access USB disks, load goes through the roof.

Nope, it's a rack server in a secure remote location, with no
peripherals at all attached.  Only attached cables are power and
network.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] High load average, low CPU utilization

2014-03-28 Thread Matt Garman
On Fri, Mar 28, 2014 at 9:37 AM, John R. Dennison j...@gerdesas.com wrote:

 On Fri, Mar 28, 2014 at 09:30:17AM -0500, Matt Garman wrote:
 
  How can the loadavg shoot up (from ~1 to ~20) without a corresponding
  uptick in number of tasks?

 loadavg is based on number of processes vying for cpu time on the runq; the
 number of over-all processes on the system is not really relevant unless
 they are all competing for cpu.


Is there a way to see this number of processes in the runq?  From the
shell or programmatically?


 What's the i/o wait on the box when you see load spikes?  If the box is
 i/o bound (indicated by high i/o) the load average will spike due to
 processes blocked on i/o cycles.

I ran top -b directed to a file and captured one of these spikes.
Here's a sample from the approximate start, peak, and end of the load
spike (respectively):

top - 18:40:29 up 14 days,  1:34, 4 users,  load average: 0.80, 0.48, 0.29
Tasks: 205 total,   1 running, 204 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  4.9%sy,  0.0%ni, 92.1%id,  0.0%wa,  0.1%hi,  1.7%si,  0.0%st

top - 19:16:00 up 14 days,  2:09, 4 users,  load average: 19.67, 19.02, 15.75
Tasks: 203 total,   1 running, 202 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.1%us,  4.6%sy,  0.0%ni, 92.3%id,  0.0%wa,  0.2%hi,  1.9%si,  0.0%st

 top - 20:20:27 up 14 days,  3:14, 4 users,  load average: 0.93, 3.58, 8.69
Tasks: 212 total,   1 running, 211 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  4.8%sy,  0.0%ni, 91.7%id,  0.6%wa,  0.1%hi,  1.6%si,  0.0%st

Looks like I collected 17277 total top samples.  The max %wa over
this time was 61.1%, and less than 40 of those samples had %wa over
10.0.  In other words, over many hours, the system had IOwait over 10%
for less than a minute.  And note that my load spike lasts for almost
two hours.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] High load average, low CPU utilization

2014-03-27 Thread Matt Garman
I have a dual Xeon 5130 (four total CPUs) server running CentOS 5.7.
 Approximately every 17 hours, the load on this server slowly creeps up
until it hits 20, then slowly goes back down.

The most recent example started around 2:00am this morning.  Outside of
these weird times, the load never exceeds 2.0 (and in fact spends the
overwhelming majority of its time at 1.0).  So this morning, a few data
points:

- 2:06 to 2:07 load increased from 1.0 to 2.0
- At 2:09 it hit 4.0
- At 2:10 it hit 5.34
- At 2:16 it hit 10.02
- At 2:17 it hit 11.0
- At 2:24 it hit 17.0
- At 2:27 it hit 19.0 and stayed there +/- 1.0 until
- At 2:48 it was 18.96 and looks like it started to go down (very
slowly)
- At 2:57 it was 17.84
- At 3:05 it was 16.76
- At 3:16 it was 15.03
- At 3:27 it was 9.3
- At 3:39 it was 4.08
- At 3:44 it was 1.92, and stayed under 2.0 from there on

This is the 1m load average by the way (i.e. first number in /proc/loadavg,
given by top, uptime, etc).

Running top while this occurs shows very little CPU usage.  It seems the
standard cause of this is processes in a "D" state, which means waiting on
I/O.  But we're not seeing this.

In fact, the system runs sar, and I've collected copious amounts of data.
 But I don't see anything that jumps out that correlates with these events.
 I.e., no surges in disk IO, disk read/write bytes, network traffic, etc.
 The system *never* uses any swap.

I also used dstat to collect all data that it can for 24 hours (so it
captured one of these events).  I used 1 second samples, loaded the info up
into a huge spreadsheet, but again, didn't see any obvious trigger or
interesting stuff going on while the load spiked.

All the programs running on the system seem to work fine while this is
happening... but it triggers all kinds of monitoring alerts which is
annoying.  We've been collecting data too, and as I said above, seems to
happen every 17 hours.

I checked all our cron jobs, and nothing jumped out as an obvious culprit.

Anyone seen anything like this?  Any thoughts or ideas?

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] zoneminder

2014-02-06 Thread Matt Garman
On Thu, Feb 6, 2014 at 9:33 AM,  m.r...@5-cent.us wrote:
 One more thing about zoneminder: after installing it on an FC19 system, I
 don't see anything that I could immediately identify as a driver. *HOW*
 does it get the video? In motion, the very lightweight package, it's using
 V4L2, and the drivers, gspca*, are part of the kernel these days. If
 ZoneMindar is using the same drivers, then I'd expect that it would
 occasionally, after an update, wind up with the same problems motion does.

That's why I suggested IP cameras earlier, as there is no driver.  Or,
figuratively speaking, the IP stack is the driver.  Anything can break
after an update, but basic networking functionality is one of those
things I don't expect to break.

Also, why are you doing updates anyway?  If you had an appliance, as
you wanted, would you be doing updates on that?  Probably not, if it's
working.  So why worry about updates?  Put ZM on a dedicated server or
VM, get it working the way you want, then leave it alone.  Weld the
case shut and disable remote logins and now you literally have an
appliance.

 Btw, I'm now also looking at lower-end video capture cards, like the
 Hauppage Impactvdb, model 188 (four bnc inputs). For that, what I haven't
 found out yet, is whether it provides the cameras one at a time, to be
 switched among, or if all four can stream at the same time, which is what
 we *must* have.

My personal experience with lower-end hardware is that it's the stuff
most likely to break during updates.  It's cheap so the release
process is sloppy and documentation
lagging/poor/inaccurate/non-existent, so you end up with situations
where the drivers are chasing infinite subtle revisions, and/or
reverse engineered, and/or some other kind of kludgery.

If you pay a premium, you can buy stuff that has official Linux
support from the manufacturer.  I was looking at Sensoray products for
my home, but they are out of my price range.  Probably beyond your
budget as well (based on what you've suggested), but it appears that
Linux is an explicit target for their products, not an afterthought or
the dreaded unsupported/use-at-your-own-risk.

But again, IP cameras remove all this complexity.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Motion Detecting Camera

2014-02-03 Thread Matt Garman
On Mon, Feb 3, 2014 at 2:15 PM,  m.r...@5-cent.us wrote:
2) My manager says he wants to be out of the business of this, and
 wants me to look into
  surveillance appliance packages - that is, a DVR w/ say, four
 cameras. They're all in

Does this mean ZoneMinder is out of the question, since it's not an
appliance?  I mean, just for the sake of argument, what happens if
you buy IP cameras and use ZoneMinder?  Isn't that the beauty of an IP
camera, you don't need fancy drivers or have to worry about upgrade
breakage?  (Unless of course your IP stack breaks, but then you
probably have much bigger problems.)  IP cameras allow you to (1)
decouple the camera problem from the DVR problem, and (2) avoid wacky
USB/analog capture driver issues.

I don't know if there's anyone selling OTS ZoneMinder appliances, but
it's conceivably possible.  And if so, it would be like the Untangle
filtering package, where the line between OTS appliance and DIY is
blurred.  (E.g., with Untangle, you can buy a filtering appliance from
them, or you can run their software on your own server.)

I guess I fail to see how the previous poster's suggestion (which is
basically the same as what I initially posted last week) fails to meet
your requirements:

1. Replace cheapo USB cameras with respectable IP cameras.
2. Assign IPs to all cameras.
3. Set up ACLs and/or partition your network to meet security requirements.
4. Designate a single server (physical or VM) to act as your DVR
appliance.  In this case, it's a Linux server running ZoneMinder.
5. Configure ZoneMinder to do full-time/always-on recording, and set up
whatever maintenance and management scripts you need to shuffle
around/delete/archive the video.

Once this is in place, I don't see how the end result is any different
than buying a surveillance appliance.  Even an OTS package will
require some amount of initial setup.  But *either way*, once the
system is in place and working, it should just work and not require
any further hand-holding.

Treat the ZoneMinder box as an appliance - that is, if it's working,
don't touch it.  Don't upgrade ZM or the underlying OS.  Just leave it
alone and let it work.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] OT hardware question

2014-01-31 Thread Matt Garman
On Fri, Jan 31, 2014 at 9:52 AM,  m.r...@5-cent.us wrote:
 With the continuing annoyance from motion, my manager's asked me to go
 looking again for a video surveillance appliance: basically, a
 motion-detecting DVR and cameras. The big thing, of course, is a) price
 (this is a US federal gov't agency, and being civilian, money is *tight*,
 don't give me the libertarian/GOP line about how freely we spend,
 thankyouverymuch), b) it has to be on the network, and c) we need to be
 able to d/l to a server, and rm after we do that... and we want to script
 or cron job that.

 Right now, I'm looking into Zmodo, R-Tech and CIB security. Anyone have a)
 opinions on the quality of the hardware from any of those manufacturers
 (yeah, I know, they're just branded hardware), and/or whether we can do
 the ssh or telnet in to do what we need?

 *Extremely* frustrating, since they're all running embedded Linux, that so
 many say IE and Active X


I don't have specific recommendations for you, but here is some
general info that you might find useful, as I've been looking into
this myself.  Obviously, there exist IP cameras, but, as you've
noticed, you have to be careful that it supports an open standard, and
not IE/ActiveX exclusively.  Another approach is to just get an analog
camera, along with a capture device.  The capture devices come in USB
or PCI(e) flavors (possibly more), and range in price from super cheap
(10 USD) to crazy expensive.  Just from reading about this stuff, it
appears there's a tradeoff, the cheap hardware may require some
wrestling to work reliably, and then may randomly die at some point.
With a little research you can probably find a good balance.  I've
done a little searching on eBay, and it looks like there is no
shortage of capture devices to be had there for cheap as well, if
buying used is an option.

As always, it depends on your application, but with an analog camera,
you move the smarts to your PC or server.  Consider if you have many
cameras, do you want to have that many more servers to manage, or
would you rather have one server with many purpose-built devices
attached?

Take a look at the ZoneMinder software package.  It's the free/open
source way to build a surveillance appliance.  Again, I haven't used
it.  I currently have a Speco D4RS device (came with the house I just
moved into), which is an off-the-shelf surveillance appliance.  The
viewer is IE-only, and the standalone apps are Windows-only (though
they do have Android apps, so some quasi-Linux support)... it's
half-way decent, although I've only just started playing with it in
earnest.  But I'm looking to ZoneMinder as a possible replacement,
partially to get onto an open platform, but also to hopefully
consolidate a standalone device into my existing home server.

As for the cameras themselves, I don't know what model I have, and
wasn't supplied documentation.  My dad's been interested in getting
some camera surveillance going at his house.  But we both get
discouraged when looking for cameras because there seems to be a
million makes and models, but most are probably just re-branded OEM
versions.  The specs always seems to be unclear or inconsistent, and
except for the crazy-expensive ones, they always seem to have lousy
user reviews.  So we always get discouraged trying to wade through the
mess and give up.

-Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] OT hardware question

2014-01-31 Thread Matt Garman
On Fri, Jan 31, 2014 at 11:00 AM,  m.r...@5-cent.us wrote:
 I think you misunderstood me. I'm not looking for IP cameras - we'll be
 getting cameras that plug into a surveillance DVR appliance. It's the
 -DVR's- firmware software will do the recording and picture taking. What
 we need is to be able to d/l *from* the DVR to a server, where we can
 store it. We don't even need fancy cameras - we're currently using 10 or
 so yr old, USB 1 or 1.1 webcams, and the standard package named motion,
 which works fine... when it works. When the drivers have bugs creep in,
 that's what's pushed my manager to ask me to look this up.

Your original email said, "my manager's asked me to go looking again
for a video surveillance appliance: basically, a motion-detecting DVR
and cameras."  I interpreted that as you needing a full-on DVR suite,
from the camera(s) to the DVR, to the management interface.
Re-reading your original email, I don't think my interpretation was
wrong.

Now it sounds like the camera part of your solution is already in
place: you're using USB webcams, right?  And that part of the solution
will remain unchanged?

That said, I believe my previous email still has some useful
information for you.  As I said, I have a Speco D4RS: this is an
off-the-shelf DVR appliance (that actually advertises the fact that it
runs Linux under-the-hood).  In the general sense, it supports the
features you need: download videos to your server (or cloud or
whatever), and manage the videos on the DVR itself.  But it falls
short for you in that it expects analog camera inputs, and is mostly
Windows-centric.  But given that ZoneMinder is attempting to compete
with these types of DVR appliances, I would be surprised if it didn't
support all the same features as my Speco, but on an open platform,
and with many more input options (USB, IP, DVB/V4L capture card, etc).
 In other words, assuming ZoneMinder supports the features you need,
another option for you is to roll your own DVR appliance with
commodity PC hardware.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] OT hardware question

2014-01-31 Thread Matt Garman
On Fri, Jan 31, 2014 at 12:45 PM,  m.r...@5-cent.us wrote:
 Mostly likely it *will*. I think we expect to get a surveillance appliance
 - a DVR with firmware, and cameras as part of the package. The ancient USB
 cheapie webcams will go.

I see, so you're looking at a complete package that includes
everything.  I was coming at this from an a la carte perspective,
where you would buy the components individually, and piece your system
together from there.  That's what my original email was based on
anyway.  If you do indeed go a la carte, then the number of options
goes up, e.g. analog vs IP cameras, OTS DVR vs open source, etc etc.
But yes, an all-in-one package makes many of those decisions for you.

 Most of them do. On the other hand, where do you see it using that? I've
 got to Speco's unpleasant website, and managed to find a spec sheet, which
 does *not* mention what the firmware is, while Zmodo and R-Tech and others
 do.

Check out this link[1]: "Reliable Linux Operating System" is listed as
the first Product Highlight.

 I do like the gigabit NIC on it. I don't see it as a package, with four
 videocams, and the cheapest I see it is $322+; newegg wants over $500 for
 it, and that's pushing the envelope, esp. when we need several, and
 there's no cameras in the package. I may just call them to get details - I
 need to do that for several other OEMs.

How many cameras do you need in total?  And what is the budget?

 Btw, also, motion is not, AFAIK, zoneminder.

Right, I don't know anything about Motion, except that I've seen it
mentioned in context with ZoneMinder.  My limited knowledge is that
they are competing Linux/open source DVR packages, although Motion
is the older one with fewer features.

Why not use your existing USB webcams and just give ZoneMinder a try?
Especially if you have an old unused PC or laptop that you can test
with.  Nothing to lose but time.  For the sake of argument, say
ZoneMinder supports all the features you need.  It should run on low
end PC hardware.  If you don't already have some old PC/server
hardware you can re-purpose, you can assemble one for peanuts.  Or,
throw another hard drive on an existing server and run ZoneMinder on
that.  Resource utilization should be negligible (unless you're
capturing multiple high-def streams, which you clearly don't have the
budget for).

My point is, assuming ZoneMinder meets all your DVR
feature/software/management requirements, you can dedicate your entire
budget to the cameras (and possibly capture device if you don't buy IP
cameras).

In other words, why pay for a DVR appliance, when you can have all
that functionality for free with ZoneMinder?  If your budget is that
limited, and you need to spread it out across the surveillance package
(i.e. cameras + DVR), you're going to end up with mediocre hardware.
Cut out the cost of the DVR part and use the budget to get better
cameras.

[1] 
http://www.bhphotovideo.com/c/product/875452-REG/Speco_Technologies_d4rs500_D4RS_4_Channel_DVR_500.html
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] OT hardware question

2014-01-31 Thread Matt Garman
On Fri, Jan 31, 2014 at 12:50 PM,  m.r...@5-cent.us wrote:
 ...but my manage says he'd like to
 get out of the business of making video surveillance work, when there's
 off-the-shelf stuff out there.

Sounds like a classic problem where you have three requirements...

1. Just works / off-the-shelf, no management required
2. Quality & reliability
3. Low cost

...but can only choose two.  :)

Just from reading user reviews of the low cost stuff, it sounds like
you sacrifice quality & reliability for ease-of-use & convenience.
But even those features are suspect, if the reviews are to be
believed.

Just my $0.02  :)
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NIS or not?

2014-01-28 Thread Matt Garman
On Tue, Jan 28, 2014 at 3:02 AM, Sorin Srbu sorin.s...@orgfarm.uu.se wrote:
 The only thing I'm trying to accomplish is a system which will allow me to
 keep user accounts and passwords in one place, with one place only to
 administrate. NIS seems to be able to do that.

 Comments and insights are much appreciated!

A related question: is NIS or LDAP (or something else entirely) better
if the machines are not uniform in their login configuration?

That is, we have an ever-growing list of special cases.  UserA can
login to servers 1, 2 and 3.  UserB can log in to servers 3, 4, and 5.
 Nobody except UserC can login to server 6.  UserD can login to
machines 2--6.  And so on and so forth.

I currently have a custom script with a substantial configuration file
for checking that the actual machines are configured as per our
intent.  It would be nice if there was a single tool where the
configuration and management/auditing could be rolled into one.

Thanks!
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NIS or not?

2014-01-28 Thread Matt Garman
On Tue, Jan 28, 2014 at 9:18 AM,  m.r...@5-cent.us wrote:
 At this late date, I'd be really, *REALLY* leery of using NIS. You say
 that *most* of your traffic is local, suggesting that some of it is *not*.
 And, for that matter, how good are the firewalls keeping other traffic
 out?

 I'd say no to NIS. Yes, other answers may be more difficult to set up, but
 consider the alternatives.

 That is, we have an ever-growing list of special cases.  UserA can
 login to servers 1, 2 and 3.  UserB can log in to servers 3, 4, and 5.
  Nobody except UserC can login to server 6.  UserD can login to
 machines 2--6.  And so on and so forth.

 Here you may not realize you're distinguishing between authentication and
 authorization.

Yeah, I forgot to mention that we already have Kerberos in place for
authentication.  It's authorization that is currently done by hand and
checked with a manual script.  (I needed that for the secure mount
options NFSv4 provides.)

 I sincerely hope it's easier to set up and administer and upgrade than
 native LDAP. In '06, after a discussion with the other admin and manager I
 was working with at that job, I volunteered to set up openLDAP. Let's just
 say that the tools were NOT vaguely ready for prime time, though I did
 find that running webmin helped a *lot* to get it working.

I know you can find a horror story for any piece of software on the
Internet, but my impression is that LDAP has an unusually high number
of scary-sounding anecdotes.  I know random Internet blogs and forum posts
aren't really authoritative, but they do give me a little trepidation
regarding LDAP.

 We have an in-house written set of scripts that administer relevant
 configuration files, including /etc/passwd. It copies the correct version
 of that file (among many others) to each host, and shell of /bin/noLogin
 works just fine.

Why set the shell to /bin/noLogin, rather than simply not create that
user's /etc/passwd entry?

I don't have /bin/noLogin on any of my systems - I assume you
deliberately specified a non-existent program for the shell?  What's
the difference between setting the user's shell to a bogus program
versus something like /bin/false?
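
(For reference, a minimal sketch of the usual ways to make an account
non-interactive on CentOS; the username here is made up:

usermod -s /sbin/nologin someuser   # prints a short "account not available" message on login attempts
usermod -s /bin/false someuser      # simply exits non-zero, no message

Both block interactive logins; /sbin/nologin is a bit friendlier because
it tells the user why.)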
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Do I need a dedicated firewall?

2013-12-12 Thread Matt Garman
On Wed, Dec 11, 2013 at 11:00 PM, Jason T. Slack-Moehrle 
slackmoeh...@gmail.com wrote:

 So my electricity bill is through the roof and I need to pare down some
 equipment.



If you are in the USA, get yourself a Kill-a-Watt power meter.  I'm sure
other parts of the world have similar products.  It's a device that goes
between your electrical product (e.g. server) and the wall AC outlet, and
tells you what the power draw is.  It also keeps a cumulative total for
number of Watts and Volt-Amps used in the time period it's plugged in.  (If
your power supply's power factor is 1.0 (i.e. perfect PFC), Watts will always equal
Volt-Amps.  I believe PFC is mandated in Europe.  But a power factor below 1.0 will
cause Volt-Amps to be higher than Watts.  In the USA you are typically
billed by Watts, but if you have a UPS, the Volt-Amp number matters.)

The question is, are you sure it's all your computers causing the spike in
your power bill?  For example, if you have an old refrigerator, those are
typically very inefficient and use more power than necessary.  The
Kill-a-Watt will tell you which devices are most power greedy.



 I have a CentOS 6.5 Server (a few TB, 32gb RAM) running some simple web
 stuff and Zimbra. I have 5 static IP's from Comcast. I am considering
 giving this server a public IP and plugging it directly into my cable
 modem. This box can handle everything with room for me to do more.

 Doing this would allow me to power down my pfSense box and additional
 servers by consolidating onto this single box.



What kind of hardware is your pfSense box?  I too have a pfSense server,
but it's on a fairly low-power Atom board.  Pulls less than 20 watts at any
given time.  The average cost of electricity in the USA is about $0.11/kwh.
 Using that number, a constant X watt draw conveniently works out to
costing $X/year.  So my pfSense box costs less than $20/year in electricity.
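
(Quick sanity check on that rule of thumb with bc:

echo "1 * 24 * 365 / 1000 * 0.11" | bc -l    # 1 W running continuously = 8.76 kWh/year = ~$0.96

So at $0.11/kwh, a constant X watt draw really does cost roughly $X per year.)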

Obviously, if your electricity is much more expensive, it changes the
equation.

Just food for thought.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] making a script into a service

2013-12-09 Thread Matt Garman
Turn it into a daemon as described, then take a look at the existing
scripts in /etc/init.d/. There might even be a template in there iirc.
Your script will likely be a simple wrapper around your daemonized python
program.

After that, just do a "chkconfig --add myscript", where "myscript" is the
name of your script in /etc/init.d.
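
A minimal sketch of what that wrapper might look like, assuming the
Python program daemonizes itself and accepts start/stop/restart (as in
the link Fabrizio posted below); the name "myscript" and its install
path are made up:

#!/bin/bash
# /etc/init.d/myscript - wrap a self-daemonizing Python program
# chkconfig: 345 90 10
# description: hypothetical Python daemon wrapper
. /etc/init.d/functions   # provides the success/failure helpers

PROG=/usr/local/bin/myscript.py   # hypothetical install location

case "$1" in
  start)
    echo -n "Starting myscript: "
    $PROG start && success || failure
    echo
    ;;
  stop)
    echo -n "Stopping myscript: "
    $PROG stop && success || failure
    echo
    ;;
  restart)
    $PROG restart
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 2
    ;;
esac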
On Dec 9, 2013 7:05 AM, Larry Martell larry.mart...@gmail.com wrote:

 On Mon, Dec 9, 2013 at 8:00 AM, Fabrizio Di Carlo
 dicarlo.fabri...@gmail.com wrote:
  Try to use this
 
 http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/it
  allows you start/stop/restart the script using the following commands.
 
  python myscript.py start
  python myscript.py stop
  python myscript.py restart
 
 
  Source:
 
 http://stackoverflow.com/questions/16420092/how-to-make-python-script-run-as-service


 Yes, I've seen that question and site. I want to be able to control it
 with the service command. The technique on this site makes the script
 a daemon, but that does not make it controllable with service.


 
  On Mon, Dec 9, 2013 at 1:54 PM, Larry Martell larry.mart...@gmail.com
 wrote:
 
  We have a python script that is currently run from cron. We want to
  make it into a service so it can be controlled with service
  start/stop/restart. Can anyone point me at site that has instructions
  on how to do this? I've googled but haven't found anything.
  ___
  CentOS mailing list
  CentOS@centos.org
  http://lists.centos.org/mailman/listinfo/centos
 
 
 
 
  --
  The intuitive mind is a sacred gift and the rational mind is a faithful
  servant. We have created a society that honors the servant and has
  forgotten the gift. (A. Einstein)
 
  La mente intuitiva è un dono sacro e la mente razionale è un fedele
 servo.
  Noi abbiamo creato una società che onora il servo e ha dimenticato il
  dono.  (A. Einstein)
 
  Fabrizio Di Carlo
  ___
  CentOS mailing list
  CentOS@centos.org
  http://lists.centos.org/mailman/listinfo/cento
 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] quota doesn't appear to work - repquota only updates when quotacheck is run

2013-11-21 Thread Matt Garman
I have set up user quotas on an ext4 filesystem.  It does not appear that
the quota system is being updated, except when I manually run quotacheck.

More detail: I run warnquota -s from a script in /etc/cron.daily.  I
noticed that no one had received an over quota message in a long time.
 Using repquota -as, it indeed looked as though everyone was under their
quotas.  But du showed many people were over quota.  So I did a quotaoff
-a ; quotacheck -vam ; quotaon -a.  That night, several
warnquota-generated messages went out.  My users diligently cleaned up
their homes.  Fast forward 24 hours, and the users received the same
warnquota emails.  repquota showed them as being over, but du told a
different story.

System is CentOS 6.3, kernel 2.6.32-279.2.1.el6.x86_64.

# dmesg | grep -i quota
VFS: Disk quotas dquot_6.5.2


The partition is type ext4 mounted at /share:

# cat /proc/mounts | grep share
/dev/mapper/VolGroup_Share-LogVol_Share /share ext4
rw,noatime,nodiratime,barrier=0,nobh,data=writeback,jqfmt=vfsv0,usrjquota=aquota.user
0 0

The ext4 volume sits on top of an lvm logical partion.  That logical volume
ultimately sits on top of an encrypted disk using cryptsetup luksFormat:

# lvscan
  ACTIVE            '/dev/VolGroup_Share/LogVol_Share' [4.48 TiB] inherit

# pvscan
  PV /dev/mapper/luks-7f865362-ee9f-40de-bc07-73701b4662f3   VG VolGroup_Share   lvm2 [4.48 TiB / 0 free]

Is there something in my ext4 mount options that is incompatible with
quota?  Or maybe the encrypted layer is causing problems?  Am I missing
something else?
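
For reference, here is the sequence (from above) that temporarily gets
things back in sync, plus how I compare the quota system's view against
what is actually on disk (the du path is illustrative):

quotaoff -a
quotacheck -vam
quotaon -a
repquota -as
du -sh /share/*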

Thanks!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] clock sync/drift

2013-01-22 Thread Matt Garman
Hi,

We have a little over 100 servers, almost all running CentOS 5.7.
Virtually all are Dell servers, generally a mix of 1950s, R610s, and
R410s.

We use NTP and/or PTP to sync their clocks.  One phenomenon we've
noticed is that (1) on reboot, the clocks are all greatly out of sync,
and (2) if the PTP or NTP process is stopped, the clocks start
drifting very quickly.

If it was isolated to one or two servers, I'd dismiss the issue.  I
also had this problem under CentOS 4.

I suspect something is mis-configured, because I can't imagine the
hardware clock on ALL these servers is *that* bad.

Anyone else dealt with anything similar?

Thanks!
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Off-Topic: Low Power Hardware

2013-01-14 Thread Matt Garman
On Fri, Jan 11, 2013 at 8:55 AM, SilverTip257 silvertip...@gmail.com wrote:
 I'm in search of some hardware that consumes a low amount of power for use
 as a test-bed for Linux, various coding projects, and LAN services.

 1) Low power consumption (10-15W ... maybe 30W at most)
 2) Must run Linux without too much fuss (CentOS or otherwise)
 3) Must have two NICs (fast ethernet or better)
 4) Memory - 1GB or better
 5) Can be configurable either via serial or VGA.
 6) Accepts a normal hard drive, not CF -- drive capacity is my concern.
 7) spare PCI slot is a _plus_ (extra NICs or whatever else)
 8) I'd like to keep the physical footprint to a minimum (size of a 1U
 switch or so?)

The lowest-power x86 device I've used is an Alix 2d2 from PCEngines.
Power consumption was about five watts, regardless of load.  This has
three 100 mbps NICs, a 32-bit x86 AMD Geode CPU, and 256 MB RAM
soldered to the board.  Has a built-in Compact Flash slot to use as a
hard drive.  I ran OpenBSD on mine for years as a
firewall/gateway/router for a home LAN (don't see why it wouldn't run
CentOS).  (I'm actually selling mine, email off list if interested.)

I upgraded my firewall device to an Atom-based D2500CCE.  IIRC, I
installed 2x2GB of RAM, booting from a cheap SSD, powered by a
PicoPSU, and running PFSense.  I think this configuration pulls
roughly 16 watts at idle, maybe a couple more watts when fully loaded.
 This board has dual Intel gigabit ethernet ports.

For my home theater PC, I'm running an ASRock H67M-ITX and Core
i3-2100 CPU, with 2x4GB of RAM and SSD.  I have it inside a Habey
EMC-800B case, using the included power supply.  Idle power
consumption is about 22 watts.  It's been a while since I measured
power consumption at load, but I'd guess 50--60 watts (it's idle 99%
of the time though).  Note that even when idle, MythTV seems to use
a little CPU, so if I kill mythfrontend, my idle power consumption
drops another watt or two.

Only one NIC on the Asrock board, but it has a PCIe expansion slot so
you could easily add another.  I'd expect an add-on NIC to add around
one to five watts of power consumption.

My personal workstation uses an Intel DH67GD micro-ATX motherboard,
i5-2500k CPU, 4x4GB RAM, SSD, and traditional ATX power supply
(Seasonic SS-300ET).  It pulls about 30 watts when idle.  Only one NIC
on that motherboard.

For all the above, I'm talking AC (i.e. at the wall) power
consumption, in the USA (so 115 Volts), measured with a Kill-A-Watt
(not high-precision, but should be reasonable within a watt or two).
What follows is stuff with which I have no personal experience, but
have read about:

The Intel S1200KP mini-itx motherboard.  It has built-in dual gigabit
NICs, socket 1155, so you can use anything from a Celeron up to a
Xeon, depending on how much you want to spend and what your
upper-bound computational needs are.  I was considering that for my
firewall/router replacement.  With a PicoPSU I would suspect that one
could get 20 watts or lower idle power consumption.

With an Intel DQ77KB motherboard, and Pentium G2120, SilentPCReview
built a system that pulls 16.5 Watts[1].  (The article is a case
review, but power consumption information is included.)  That DQ77KB
board also has dual gigabit NICs.

You might also be interested in Intel's NUC - Next Unit of
Computing[2].  About 10 watts power consumption for dramatically
under-clocked i3 CPU.

In general, with modern Sandy/Ivy Bridge CPUs, it's almost trivial to
build a high-performing system that has 30 watt or less idle power
consumption.  If you cherry-pick components, it's not terribly hard to
get a system with 20 watt idle power draw.  The modern Intel CPUs all
have roughly the same idle power usage (at least the consumer line,
not sure about Xeons).  That goes for the more expensive low-power
variants as well.  The difference with the low-power variants is that their
upper-bound power consumption is lower than their peers'.  But you can
often fake that by deliberately limiting the max frequency in the
BIOS.  Of course, with these real CPUs (compared to e.g. Atom),
power consumption will be much higher when loaded.  But from what I've
read, the real CPUs are actually better in the long run, because
their computation efficiency is so much higher.  With something like
Atom, you get more deterministic power draw, but a severely
compromised upper-bound on computational power.  In your requirements,
you mentioned various coding projects.  If you are working in a
compiled language (e.g. C, C++, Java), for substantially large
projects, your compile times will be painful on Atom, but pleasantly
fast on a Sandy/Ivy Bridge CPU.

[1] http://www.silentpcreview.com/Akasa_Euler_Fanless_Thin_ITX_Case

[2] http://www.silentpcreview.com/Intel_NUC_DC3217BY
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] home directory server performance issues

2012-12-12 Thread Matt Garman
On Tue, Dec 11, 2012 at 1:58 AM, Nicolas KOWALSKI
nicolas.kowal...@gmail.com wrote:
 On Mon, Dec 10, 2012 at 11:37:50AM -0600, Matt Garman wrote:
 OS is CentOS 5.6, home directory partition is ext3, with options
 “rw,data=journal,usrquota”.

 Is the data=journal option really wanted here? Did you try with the
 other journalling modes available? I also think you are missing the
 noatime option here.

Short answer: I don't know.  Intuitively, it seems like it's not the
right thing.  However, there are a number of articles out there[1]
that say data=journal may improve performance dramatically in
cases where there is both a lot of reading and writing.  That's what
a home directory server is to me: a lot of reading and writing.
However, I haven't seen any tool or mechanism for precisely
quantifying when data=journal will improve performance; everyone just
says change it and test.  Unfortunately, in my situation, I didn't
have the luxury of testing, because things were unusable now.

[1] for example:
http://www.ibm.com/developerworks/linux/library/l-fs8/index.html
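
(In case anyone else wants to experiment: a rough sketch of switching
modes on a quiet system; a remount is not enough, it takes a full
umount/mount cycle, and the device/mountpoint names are examples only:

umount /home
tune2fs -o journal_data /dev/sdb1   # or journal_data_ordered / journal_data_writeback
mount /home

Setting it via tune2fs makes it a default mount option, so fstab doesn't
need an explicit data= entry.)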
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] home directory server performance issues

2012-12-12 Thread Matt Garman
On Tue, Dec 11, 2012 at 2:24 PM, Dan Young danielmyo...@gmail.com wrote:
 Just going to throw this out there. What is RPCNFSDCOUNT in
 /etc/sysconfig/nfs?

It was 64 (upped from the default of... 8 I think).
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] home directory server performance issues

2012-12-12 Thread Matt Garman
On Tue, Dec 11, 2012 at 4:01 PM, Steve Thompson s...@vgersoft.com wrote:
 This is in fact a very interesting question. The default value of
 RPCNFSDCOUNT (8) is in my opinion way too low for many kinds of NFS
 servers. My own setup has 7 NFS servers ranging from small ones (7 TB disk
 served) to larger ones (25 TB served), and there are about 1000 client
 cores making use of this. After spending some time looking at NFS
 performance problems, I discovered that the number of nfsd's had to be
 much higher to prevent stalls. On the largest servers I now use 256-320
 nfsd's, and 64 nfsd's on the very smallest ones. Along with suitable
 adjustment of vm.dirty_ratio and vm.dirty_background_ratio, this makes a
 huge difference.

Could you perhaps elaborate a bit on your scenario?  In particular,
how much memory and how many CPU cores do the servers have with the really high
NFSD counts?  Is there a rule of thumb for nfsd counts relative to the
system specs?  Or, like so many IO tuning situations, just a matter of
test and see?
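
(For reference, the knobs being discussed live here; the values below
are purely illustrative, not recommendations:

# /etc/sysconfig/nfs
RPCNFSDCOUNT=256

# /etc/sysctl.conf
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

Then "sysctl -p" and "service nfs restart" to apply.)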
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] home directory server performance issues

2012-12-12 Thread Matt Garman
On Wed, Dec 12, 2012 at 12:29 AM, Gordon Messmer yiny...@eburg.com wrote:
 That may be difficult at this point, because you really want to start by
 measuring the number of IOPS.  That's difficult to do if your
 applications demand more than your hardware currently provices.

Since my original posting, we temporarily moved the data from the
centos 5 server to the centos 6 server.  We rebuilt the original
(slow) server with centos 6, then migrated the data back.  So far
(fingers crossed) so good.  I'm running a constant iostat -kx 30,
and logging it to a file.  Disk utilization is virtually always under
50%.  Random spikes in the 90% range, but they are few and far
between.

Now it appears the hardware + software configuration can handle
the load.  But I still have the same question: how can I accurately
*quantify* the kind of IO load these servers have?  I.e., how do I
measure IOPS?
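
(One rough way to get that number from the iostat output I'm already
collecting: sum r/s and w/s per device, e.g.

iostat -dkx 30 | awk '/^sdb /{ print "IOPS:", $4 + $5 }'

i.e. read requests per second plus write requests per second, sampled
every 30 seconds.  This assumes the column layout of the sysstat version
on these boxes, so double-check the field numbers.)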

 This might not be the result of your NFS server performance.  You might
 actually be seeing bad performance in your directory service.  What are
 you using for that service?  LDAP?  NIS?  Are you running nscd or sssd
 on the clients?

Not using a directory service (manually sync'ed passwd files, and
kerberos for authentication).  Not running nscd or sssd.

 RAID 6 is good for $/GB, but bad for performance.  If you find that your
 performance is bad, RAID10 will offer you a lot more IOPS.

 Mixing 15k drives with RAID-6 is probably unusual.  Typically 15k drives
 are used when the system needs maximum IOPS, and RAID-6 is used when
 storage capacity is more important than performance.

 It's also unusual to see a RAID-6 array with a hot spare.  You already
 have two disks of parity.  At this point, your available storage
 capacity is only 600GB greater than a RAID-10 configuration, but your
 performance is MUCH worse.

I agree with all that.  Problem is, there is a higher risk of storage
failure with RAID-10 compared to RAID-6.  We do have good, reliable
*data* backups, but no real hardware backup.  Our current service
contract on the hardware is next business day.  That's too much down
time to tolerate with this particular system.

As I typed that, I realized we technically do have a hardware
backup---the other server I mentioned.  But even the time to restore
from backup would make a lot of people extremely unhappy.

How do most people handle this kind of scenario, i.e. can't afford to
have a hardware failure for any significant length of time?  Have a
whole redundant system in place?  I would have to sell the idea to
management, and for that, I'd need to precisely quantify our situation
(i.e. my initial question).

 OS is CentOS 5.6, home
 directory partition is ext3, with options “rw,data=journal,usrquota”.

 data=journal actually offers better performance than the default in some
 workloads, but not all.  You should try the default and see which is
 better.  With a hardware RAID controller that has battery backed write
 cache, data=journal should not perform any better than the default, but
 probably not any worse.

Right, that was mentioned in another response.  Unfortunately, I don't
have the ability to test this.  My only system is the real production
system.  I can't afford the interruption to the users while I fully
unmount and mount the partition (can't change data= type with
remount).

In general, it seems like a lot of IO tuning is change parameter,
then test.  But (1) what test?  It's hard to simulate a very
random/unpredictable workload like user home directories, and (2) what
to test on when one only has the single production system?  I wish
there were more analytic tools where you could simply measure a
number of attributes, and from there, derive the ideal settings and
configuration parameters.

 If your drives are really 4k sectors, rather than the reported 512B,
 then they're not optimal and writes will suffer.  The best policy is to
 start your first partition at 1M offset.  parted should be aligning
 things well if it's updated, but if your partition sizes (in sectors)
 are divisible by 8, you should be in good shape.

It appears that centos 6 does the 1M offset by default.  Centos 5
definitely doesn't do that.
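
(A quick check using the start sectors from the parted output in my
original post: a partition is 4k-aligned when its start sector, in 512B
sectors, is divisible by 8.

echo $(( 63 % 8 )) $(( 465885 % 8 )) $(( 34 % 8 ))   # prints "7 5 2" - none are aligned

So all three of the old partitions start off-boundary, consistent with
my suspicion.)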

Anyway... as I suggested above, the problem appears to be resolved...
But the fix was kind of a shotgun approach, i.e. I changed too many
things at once to know exactly what specific item fixed the problem.
I'm sure this will inevitably come up again at some point, so I'd
still like to learn/understand more to better handle the situation
next time.

Thanks!
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] home directory server performance issues

2012-12-10 Thread Matt Garman
I’m looking for advice and considerations on how to optimally setup
and deploy an NFS-based home directory server.  In particular: (1) how
to determine hardware requirements, and (2) how to best setup and
configure the server.  We actually have a system in place, but the
performance is pretty bad---the users often experience a fair amount
of lag (1--5 seconds) when doing anything on their home directories,
including an “ls” or writing a small text file.

So now I’m trying to back up and determine: is it simply a
configuration issue, or is the hardware inadequate?

Our scenario: we have about 25 users, mostly software developers and
analysts.  The users login to one or more of about 40 development
servers.  All users’ home directories live on a single server (no
login except root); that server does an NFSv4 export which is mounted
by all dev servers.  The home directory server hardware is a Dell R510
with dual E5620 CPUs and 8 GB RAM.  There are eight 15k 2.5” 600 GB
drives (Seagate ST3600057SS) configured in hardware RAID-6 with a
single hot spare.  RAID controller is a Dell PERC H700 w/512MB cache
(Linux sees this as a LSI MegaSAS 9260).  OS is CentOS 5.6, home
directory partition is ext3, with options “rw,data=journal,usrquota”.

I have the HW RAID configured to present two virtual disks to the OS:
/dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for
the home directories.  I’m fairly certain I did not align the
partitions optimally:

[root@lnxutil1 ~]# parted -s /dev/sda unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size         Type     File system  Flags
 1      63s      465884s     465822s      primary  ext2         boot
 2      465885s  134207009s  133741125s   primary               lvm

[root@lnxutil1 ~]# parted -s /dev/sdb unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s                      lvm


Can anyone confirm that the partitions are not aligned correctly, as I
suspect?  If this is true, is there any way to *quantify* the effects
of partition mis-alignment on performance?  In other words, what kind
of improvement could I expect if I rebuilt this server with the
partitions aligned optimally?

In general, what is the best way to determine the source of our
performance issues?  Right now, I’m running “iostat -dkxt 30”
re-directed to a file.  I intend to let this run for a day or so, and
write a script to produce some statistics.

Here is one iteration from the iostat process:

Time: 09:37:28 AM
Device:  rrqm/s   wrqm/s    r/s      w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    44.09   0.03   107.76      0.13    607.40     11.27      0.89    8.27   7.27  78.35
sda1       0.00     0.00   0.00     0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
sda2       0.00    44.09   0.03   107.76      0.13    607.40     11.27      0.89    8.27   7.27  78.35
sdb        0.00  2616.53   0.67   157.88      2.80  11098.83    140.04      8.57   54.08   4.21  66.68
sdb1       0.00  2616.53   0.67   157.88      2.80  11098.83    140.04      8.57   54.08   4.21  66.68
dm-0       0.00     0.00   0.03   151.82      0.13    607.26      8.00      1.25    8.23   5.16  78.35
dm-1       0.00     0.00   0.00     0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
dm-2       0.00     0.00   0.67  2774.84      2.80  11099.37      8.00    474.30  170.89   0.24  66.84
dm-3       0.00     0.00   0.67  2774.84      2.80  11099.37      8.00    474.30  170.89   0.24  66.84


What I observe is that whenever sdb (the home directory partition)
becomes loaded, sda (OS) often does as well.  Why is this?  I would
expect sda to generally be idle, or have minimal utilization.
According to both “free” and “vmstat”, this server is not swapping at
all.

At one point, our problems were due to a random user writing a huge
file to their home directory.  We built a second server specifically
for people to use for writing large temporary files.  Furthermore, for
all the dev servers, I used the following tc commands to rate limit
how quickly any one server can write to the home directory server (8
Mbps or 1 MB/s):

ETH_IFACE=$( route -n | grep ^0.0.0.0 | awk '{ print $8 }' )
IFACE_RATE=1000mbit
LIMIT_RATE=8mbit
TARGET_IP=1.2.3.4 # home directory server IP
tc qdisc add dev $ETH_IFACE root handle 1: htb default 1
tc class add dev $ETH_IFACE parent 1: classid 1:1 htb rate $IFACE_RATE ceil $IFACE_RATE
tc class add dev $ETH_IFACE parent 1: classid 1:2 htb rate $LIMIT_RATE ceil $LIMIT_RATE
tc filter add dev $ETH_IFACE parent 1: protocol ip prio 16 u32 match ip dst $TARGET_IP flowid 1:2

The other interesting thing is that the second server I mentioned—the
one specifically designed for users to 

Re: [CentOS] Static routes with a metric?

2011-12-15 Thread Matt Garman
Adding additional info for posterity, and in case anyone else runs
across this...

On Wed, Dec 7, 2011 at 12:28 PM, Benjamin Franz jfr...@freerun.com wrote:
 On 12/7/2011 10:03 AM, Matt Garman wrote:
 Hi,

 [...]

 What I basically need to be able to do is this:
 route add -host h1 gw g1 metric 0
 route add -host h1 gw g2 metric 10

 Notice that everything is the same except the gateway and metric. I could
 put this in /etc/rc.local, but was wondering if there's a cleaner way to do
 it in e.g. the network-scripts directory.


 If you create files in the /etc/sysconfig/network-scripts directory
 named according to the scheme

 route-eth0
 route-eth1
 route-eth2

 it will execute each line in the files as

 /sbin/ip route add line

 when each interface is brought up.

 Look in the /etc/sysconfig/network-scripts/ifup-routes script for all
 the gory details and features.

I actually did just that---looked at the ifup-routes script.  The
thing that threw me off is the comments about older format versus
new format.  I probably read into the comments too much, but I
thought to myself, I should probably use the new format, as they
might some day deprecate the old format.

But anyway, the older format is what I need.  With the older format,
it's exactly what you said above: each line corresponds to running "ip
route add" with that line appended.  So what I added were lines in this format:

addr/mask via gateway dev device metric N

A contrived example might be:

10.25.77.0/24 via 192.168.1.1 dev eth0 metric 5

The new format is where each group of three lines corresponds to a
route.  You have the ADDRESSxx=, NETMASKxx=, GATEWAYxx= lines.
Clearly this is less flexible, particularly if you need to set a
metric like me.  :)
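
For completeness, a sketch of what the new format looks like; the
addresses are made up, and note there is no field for a metric:

# /etc/sysconfig/network-scripts/route-eth0 (new format)
ADDRESS0=10.25.77.0
NETMASK0=255.255.255.0
GATEWAY0=192.168.1.1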

Anyway, hopefully that's useful for anyone in a similar situation!

-Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] Static routes with a metric?

2011-12-07 Thread Matt Garman
Hi,

How can I define static routes to be created at boot time with a specific
metric? I have two NICs that ultimately end up at the same peer, but
literally go through two completely different networks. IOW, each NIC
connects to a different layer 3 device.

Also, note that the machine actually has three total NICs: the third is the
owner of the default route. The two mentioned above are for a specialized
sub net.

What I basically need to be able to do is this:
route add -host h1 gw g1 metric 0
route add -host h1 gw g2 metric 10

Notice that everything is the same except the gateway and metric. I could
put this in /etc/rc.local, but was wondering if there's a cleaner way to do
it in e.g. the network-scripts directory.

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] ibm m1015 w/sandy bridge boot failure

2011-10-31 Thread Matt Garman
On Thu, Oct 27, 2011 at 09:18:56AM -0500, Matt Garman wrote:
 I have a server running CentOS 6.0.  Last night I replaced the CPU and
 motherboard.  Old hardware: Supermicro x8sil-f + x3440.  New hardware:
 Supermicro x9scl+-f + E3-1230.  This is a new Sandy Bridge Xeon.
 
 Everything else remained the same, including an IBM m1015 SAS HBA.
 This is just an IBM re-branded LSI 92xx-8i (9220-8i specifically I
 believe), which uses the LSI SAS2008 chipset and the megaraid_sas
 driver.  It worked without any issue on the old motherboard/cpu.
 However, with the new motherboard/cpu, the system makes it through the
 BIOS POST without any issue.  But about half-way through the kernel
 initialization, it basically locks up.  It will sit there for several
 minutes, doing nothing, then start printing out error messages (I was
 unable to get a screenshot or take note of the errors).
 
 But if I take the m1015 card out, the system boots quickly and without
 issue, as it always has.
 
 I saw this post[1] on the forums, which suggested that Sandy Bridge
 really needs 6.1, which, for those of us using CentOS, we can only get
 sorta close to by using the continuous release.  So I did a yum
 install centos-release-cr ; yum update.  I let everything install
 (there were no install errors or problems), and rebooted with the
 m1015 in back in the system, but the problem remains.
 
 So now I'm at a loss.  Anyone have any thoughts?
 
 [1] http://www.centos.org/modules/newbb/viewtopic.php?topic_id=33878forum=56


SuperMicro actually has a FAQ[1] on a very similar issue: same
motherboard, but an LSI 9240 RAID card (very similar to the IBM
M1015, same actual chipset I believe), and CentOS 5.5.  But the
described problem is the same as mine.

Simple fix: upgrade BIOS to v1.1a or later.  Works for me!

[1] http://www.supermicro.com/support/faqs/faq.cfm?faq=12830

Hope this helps anyone with the same problem.
-Matt

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] ibm m1015 w/sandy bridge boot failure

2011-10-27 Thread Matt Garman
I have a server running CentOS 6.0.  Last night I replaced the CPU and
motherboard.  Old hardware: Supermicro x8sil-f + x3440.  New hardware:
Supermicro x9scl+-f + E3-1230.  This is a new Sandy Bridge Xeon.

Everything else remained the same, including an IBM m1015 SAS HBA.
This is just an IBM re-branded LSI 92xx-8i (9220-8i specifically I
believe), which uses the LSI SAS2008 chipset and the megaraid_sas
driver.  It worked without any issue on the old motherboard/cpu.
However, with the new motherboard/cpu, the system makes it through the
BIOS POST without any issue.  But about half-way through the kernel
initialization, it basically locks up.  It will sit there for several
minutes, doing nothing, then start printing out error messages (I was
unable to get a screenshot or take note of the errors).

But if I take the m1015 card out, the system boots quickly and without
issue, as it always has.

I saw this post[1] on the forums, which suggested that Sandy Bridge
really needs 6.1, which, for those of us using CentOS, we can only get
sorta close to by using the continuous release.  So I did a yum
install centos-release-cr ; yum update.  I let everything install
(there were no install errors or problems), and rebooted with the
m1015 in back in the system, but the problem remains.

So now I'm at a loss.  Anyone have any thoughts?

[1] http://www.centos.org/modules/newbb/viewtopic.php?topic_id=33878forum=56

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] how to stop an in-progress fsck that runs at boot?

2011-09-13 Thread Matt Garman
I can't seem to find the answer to this question via web search... I
changed some hardware on a server, and upon powering it back on, got
the /dev/xxx has gone 40 days without being check, check forced
message.  Now it's running fsck on a huge (2 TB) ext3 filesystem (5400
RPM drives no less).  How can I stop this in-progress check?  Ctrl-C
doesn't seem to have any effect.  Is the only answer to wait it out?

Also, as a side question: I always do this---let my servers run for a
very long time, power down to change/upgrade hardware, then forget
about the forced fsck, then pull my hair out waiting for it to finish
(because I can't figure out how to stop it once it starts).  I know
about tune2fs -c and -i, and also the last (or is it second to last?)
column in /etc/fstab.  My question is more along the lines of best
practices---what are most people doing with regards to regular fsck's
of ext2/3/4 filesystems?  Do you just take the defaults, and let it
delay the boot process by however long it takes?  Disable it
completely?  Or do something like taking the filesystem offline on a
running system?  Something else?
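
(For the record, the sketch I'm considering, with an illustrative
device name:

tune2fs -l /dev/sdb1 | grep -i -e 'mount count' -e 'check'   # current counters and interval
tune2fs -c 0 -i 0 /dev/sdb1                                  # disable count- and time-based forced checks
shutdown -r -f now                                           # sysvinit: -f skips fsck on the next boot

...but I'd still like to hear what others actually do.)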

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NetApp DataFabric Manager/Sybase/SQLAnywhere on CentOS?

2011-07-21 Thread Matt Garman
On Wed, Jul 20, 2011 at 12:39 PM, James Hogarth james.hoga...@gmail.com wrote:
 I am trying to install DFM 4.0.2, and have tried on both CentOS 4.8
 i386 and CentOS 5.5 x86_64.  I have edited my /etc/redhat-release file
 to be equal to RHEL's, as the DFM installer immediately aborts if that
 isn't right.  However, I still have errors during the install:


 ...

 Communication error
 error: %post(NTAPdfm-3.8-6640.i386) scriptlet failed, exit status 1



 ...

  Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Running Linux 2.6.9-89.0.25.ELsmp #1 SMP Thu May 6 12:28:03 EDT 2010 on X86 (X86_64)
  Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Server built for X86 processor architecture
  Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Asynchronous IO disabled due to lack of proper OS support


 You said you tried 4.8 32bit and 5.5 64bit. how about 5.6 32bit?

 The era would be about right on C5 and it's obviously looking for
 32bit libraries and architecture

 What do the docs and NetApp say about RHEL support?

Turns out the problem was with our /etc/security/limits.conf setting.
We (by design) limit virtual memory to 2 GB.  Apparently the installer
needs more than that.  So we did a "ulimit -v unlimited" and the
install worked!
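
Specifically, something like this before launching the installer (the
installer filename is just an example):

ulimit -v            # was capped at 2097152 KB (2 GB) by our limits.conf policy
ulimit -v unlimited
./dfm-linux-installer.sh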

Hope this is helpful for anyone else with the same problem.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] NetApp DataFabric Manager/Sybase/SQLAnywhere on CentOS?

2011-07-20 Thread Matt Garman
Hello,

Has anyone out there successfully installed NetApp's DataFabric
Manager (DFM) on CentOS?  If so, what version of DFM, CentOS, and
what architecture?

I am trying to install DFM 4.0.2, and have tried on both CentOS 4.8
i386 and CentOS 5.5 x86_64.  I have edited my /etc/redhat-release file
to be equal to RHEL's, as the DFM installer immediately aborts if that
isn't right.  However, I still have errors during the install:

Enter your NetApp DataFabric Manager license key [?,q]: entered my key
Beginning the installation ...
Preparing...### [100%]
   1:NTAPdfm### [100%]
Installing scripts in /etc/init.d directory.
Configuring DataFabric Manager server services.
Setting up sql ...
Starting SQL ...
Communication error
error: %post(NTAPdfm-3.8-6640.i386) scriptlet failed, exit status 1


/var/log/messages has this:

[ ... ]
Jul 19 12:58:10 lnxsvr41 SQLAnywhere(monitordb): 16 logical
processor(s) on 2 physical processor(s) detected.
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Per-processor
licensing model. The server is licensed to use 2 physical
processor(s).
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): This server is licensed to:
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): DFM User
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): NetApp
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Running Linux 2.6.9-89.0.25.ELsmp #1 SMP Thu May 6 12:28:03 EDT 2010 on X86 (X86_64)
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Server built for X86 processor architecture
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Asynchronous IO disabled due to lack of proper OS support
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Maximum cache size adjusted to 1670312K
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Authenticated Server licensed for use with Authenticated Applications only
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): 8192K of memory used for caching
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Minimum cache size: 8192K, maximum cache size: 1670312K
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Using a maximum page size of 8192 bytes
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): TCP/IP functions not found
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Database server shutdown due to startup error


Looks like the problem is with SQLAnywhere, which appears to be a
Sybase product.

Anyone been down this road?  So far NetApp is stonewalling me on CentOS support.

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] scheduling differences between CentOS 4 and CentOS 5?

2011-05-30 Thread Matt Garman
On Tue, May 24, 2011 at 02:22:12PM -0400, R P Herrold wrote:
 On Mon, 23 May 2011, Mag Gam wrote:
 
  I would like to confirm Matt's claim. I too experienced larger
  latencies with Centos 5.x compared to 4.x. My application is very
  network sensitive and its easy to prove using lat_tcp.
 
  Russ,
  I am curious about identifying the problem. What tools do you
  recommend to find where the latency is coming from in the application?
 
 I went through the obvious candidates:
   system calls
   (loss of control of when if ever the
   scheduler decides to let your process run again)

This is almost certainly what it is for us.  But in this situation,
these calls are limited to mutex operations and condition variable
signaling.

   polling v select
   polling is almost always a wrong approach when
   latency reduction is in play
   (reading and understanding: man 2 select_tut
is time very well spent)

We are using select().  However, that is only for the networking
part (basically using select() to wait on data from a socket).
Here, my concern isn't with network latency---it's with intra
process latency.

   choice of implementation language -- the issue here
   being if one uses a scripting language, one cannot
   'see' the time leaks

C/C++ here.

 Doing metrics permits both 'hot spot' analysis, and moves the 
 coding from 'guesstimation' to software engineering.  We use 
 graphviz, and gnuplot on the plain text 'CSV-style' timings 
 files to 'see' outliers and hotspots

We're basically doing that.  We pre-allocate a huge 2D array for
keeping stopwatch points throughout the program.  Each column
represents a different stopwatch point, and each row represents a
different iteration through these measured points.  After a lot of
iterations (usually at least 100k), the numbers are dumped to a file
for analysis.

Basically, the standard deviation from one iteration to the next is
fairly low.  It's not like there are a few outliers driving the
average intra-process latency up; it's just that, in general, going
from point A to point B takes longer with the newer kernels.

For what it's worth, I tried a 2.6.39 mainline kernel (from elrepo),
and the intra-process latencies get still worse.  It appears that
whatever changes are being made to the kernel, it's bad for our kind
of program.  I'm trying to figure out, from a conceptual level, what
those changes are.  I'm looking for an easier way to understand than
reading the kernel source and change history.  :)


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] scheduling differences between CentOS 4 and CentOS 5?

2011-05-20 Thread Matt Garman
We have several latency-sensitive pipeline-style programs that have
a measurable performance degredation when run on CentOS 5.x versus
CentOS 4.x.

By pipeline program, I mean one that has multiple threads.  The
multiple threads work on shared data.  Between each thread, there is a
queue.  So thread A gets data, pushes into Qab, thread B pulls from
Qab, does some processing, then pushes into Qbc, thread C pulls from
Qbc, etc.  The initial data is from the network (generated by a 3rd
party).

We basically measure the time from when the data is received to when
the last thread performs its task.  In our application, we see an
increase of anywhere from 20 to 50 microseconds when moving from
CentOS 4 to CentOS 5.

I have used a few methods of profiling our application, and determined
that the added latency on CentOS 5 comes from queue operations (in
particular, popping).

However, I can improve performance on CentOS 5 (to be the same as
CentOS 4) by using taskset to bind the program to a subset of the
available cores.
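
For example (core numbers and the program name are illustrative only):

taskset -c 2-5 ./pipeline_app    # launch bound to cores 2-5
taskset -cp 2-5 <pid>            # or re-pin an already-running process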

So it appears to me, between CentOS 4 and 5, there was some change
(presumably to the kernel) that caused threads to be scheduled
differently (and this difference is suboptimal for our application).

While I can solve this problem with taskset, my preference is to not
have to do this.  I'm hoping there's some kind of kernel tunable (or
maybe collection of tunables) whose default was changed between
versions.

Anyone have any experience with this?  Perhaps some more areas to investigate?

Thanks,
Matt
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos