Re: [CentOS] kerberized-nfs - any experts out there?
On Wed, Mar 22, 2017 at 6:11 PM, John Jasen <jja...@realityfailure.org> wrote:
> On 03/22/2017 03:26 PM, Matt Garman wrote:
>> Is anyone on the list using kerberized-nfs on any kind of scale?
>
> Not for a good many years.
>
> Are you using v3 or v4 NFS?

v4. I think you can only do kerberized NFS with v4.

> Also, you can probably stuff the rpc.gss* and idmapd services into
> verbose mode, which may give you a better idea as to what's going on.

I do that. The logs are verbose, but generally too cryptic for me to make sense of. Web searches on the errors yield results at best 50% of the time, and the hits almost never include a solution.

> And yes, the kernel does some kerberos caching. I think 10 to 15 minutes.

To me it looks like it's more on the order of an hour. For example, a simple test I've done is a "fresh" login on a server. The server has just been rebooted, and with the reboot, all the /tmp/krb5cc* files were deleted. I log in via ssh, which implicitly establishes my Kerberos tickets. I deliberately do a "kdestroy". Then I run a simple shell loop like this:

    while [ 1 ] ; do date ; ls ; sleep 30s ; done

which just does an ls on my home directory, a kerberized NFS mount. Despite the kdestroy, this works, presumably from cached credentials. And it continues to work for *about* an hour, and then I start getting permission denied. I emphasize "about" because it's not precisely one hour, but seems to range from maybe 55 to 65 minutes.

But that's a super-simple, controlled test. What happens when you add terminal multiplexers (tmux, GNU screen) into the mix? What if you log in "fresh" via password versus having your GSS (Kerberos) credentials forwarded? What if you're logged in multiple times on the same machine via different methods?

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
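The rough 55-to-65-minute window above could be pinned down more precisely by timestamping each check. A minimal sketch of such a probe; the assumption that $HOME sits on the kerberized mount, and the 30-second interval, are taken from the test described above:

```shell
# Sketch of a more precise version of the loop above: after kdestroy, log
# each access attempt with an epoch timestamp, so the exact expiry point
# of the kernel's credential cache can be read off the log afterwards.
# (That $HOME is on the kerberized NFS mount is an assumption.)

# one probe: does an ls of the (kerberized) home dir still succeed?
probe() {
    if ls "$HOME" >/dev/null 2>&1; then
        echo "$(date +%s) OK"
    else
        echo "$(date +%s) DENIED"
    fi
}

# minutes elapsed between two epoch timestamps
# (e.g. the first OK line vs. the first DENIED line in the log)
elapsed_min() {
    echo $(( ($2 - $1) / 60 ))
}

# usage:  kdestroy; while true; do probe; sleep 30; done | tee expiry.log
```

Comparing the first OK timestamp against the first DENIED timestamp with elapsed_min would show whether the cache lifetime is really fixed or varies run to run.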
Re: [CentOS] kerberized-nfs - any experts out there?
On Wed, Mar 22, 2017 at 3:19 PM, <m.r...@5-cent.us> wrote:
> Matt Garman wrote:
>> (2) Permission denied issues. I have user Kerberos tickets
>> configured for 70 days. But there is clearly some kind of
>> undocumented kernel caching going on. Looking at the Kerberos server
>> logs, it looks like it "could" be a performance issue, as I see 100s
>> of ticket requests within the same second when someone tries to launch
>> a lot of jobs. Many of these will fail with "permission denied" but
>> if they immediately re-try, it works. Related to this, I have been
>> unable to figure out what creates and deletes the
>> /tmp/krb5cc_uid_random files.
>
> Are they asking for *new* credentials each time? They should only be doing
> one kinit.

Well, that's what I don't understand. In practice, I don't believe a user should ever have to explicitly do kinit, as their credentials/tickets are implicitly created (and forwarded) via ssh. Despite that, I see the /tmp/krb5cc_uid files accumulating over time. I've tried testing this, but I haven't been able to determine exactly what creates those files. And I don't understand why new krb5cc_uid files are created when there is an existing, valid file already. Clearly some programs ignore existing files, and some create new ones.

> And there's nothing in the logs, correct? Have you tried attaching strace
> to one of those, and see if you can get a clue as to what's happening?

Actually, I get this in the log:

    Mar 22 13:25:09 daemon.err lnxdev108 rpc.gssd[19329]: WARNING: handle_gssd_upcall: failed to find uid in upcall string 'mech=krb5'

Thanks, Matt
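One way to narrow down what creates those krb5cc files is to snapshot the /tmp listing before and after a suspect action (an ssh login, a dispatched job) and diff the two. A hedged sketch; the uid and the strace target are placeholders, not details from the thread:

```shell
# Snapshot/diff sketch for finding what creates /tmp/krb5cc_* files.
# The uid argument and the strace example below are placeholders.

# list the credential-cache files for one uid, sorted for stable diffs
snap() {
    ls /tmp/krb5cc_"$1"* 2>/dev/null | sort
}

# print entries present in the second listing but not in the first
new_entries() {
    comm -13 <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}

# usage:
#   before=$(snap 1000)
#   ...log in via ssh, or kick off a job...
#   new_entries "$before" "$(snap 1000)"
#
# to catch a creator in the act, attach strace to a suspect process, e.g.:
#   strace -f -e trace=open,openat,unlink -p <pid-of-suspect-process>
```

Correlating the creation times of the new files against the auth log should point at whichever daemon (sshd, cron, the dispatcher) is minting extra caches.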
[CentOS] kerberized-nfs - any experts out there?
Is anyone on the list using kerberized NFS on any kind of scale? I've been fighting with this for years. In general, when we have issues with this system, they are random and/or not repeatable. I've had very little luck with community support. I hope I don't offend by saying that! Rather, my belief is that these problems are very niche/esoteric, and so beyond the scope of typical community support. But I'd be delighted to be proven wrong!

So this is more of a "meta" question: does anyone out there have any general recommendations for how to get support on what I presume are niche problems specific to our environment? How is paid upstream support?

To give a little insight into our issues: we have an in-house-developed compute job dispatching system. Say a user has 100s of analysis jobs he wants to run; he submits them to a central master process, which in turn dispatches them to a "farm" of >100 compute nodes. All these nodes have two different krb5p NFS mounts, to which the jobs read and write. So while the users can technically log in directly to the compute nodes, in practice they never do. The logins are only "implicit", when the job dispatching system does a behind-the-scenes ssh to kick off these processes.

To give some "flavor" to the kinds of issues we're facing, what tends to crop up is one of three things:

(1) Random crashes. These are full-on kernel trace dumps followed by an automatic reboot. This was really bad under CentOS 5; a random kernel upgrade magically fixed it. It happens almost never under CentOS 6, but fairly frequently under CentOS 7. (We're completely off CentOS 5 now, BTW.)

(2) Permission denied issues. I have user Kerberos tickets configured for 70 days. But there is clearly some kind of undocumented kernel caching going on. Looking at the Kerberos server logs, it looks like it "could" be a performance issue, as I see 100s of ticket requests within the same second when someone tries to launch a lot of jobs. Many of these will fail with "permission denied", but if they immediately re-try, it works. Related to this, I have been unable to figure out what creates and deletes the /tmp/krb5cc_uid_random files.

(3) Kerberized NFS shares getting "stuck" for one or more users. We have another monitoring app (in-house developed) that, among other things, makes periodic checks of these NFS mounts. It does so by forking and running a simple "ls" command, to ensure that the mounts are alive and well. Sometimes the "ls" command gets stuck to the point where it can't even be killed via "kill -9"; only a reboot fixes it. But the mount is only stuck for the user running the monitoring app. Or sometimes the monitoring app is fine, but an actual user's processes get stuck in "D" state (in top, meaning waiting on IO), while everyone else's jobs (and access to the kerberized NFS shares) are OK.

This is actually blocking us from upgrading to CentOS 7, and my colleagues and I are at a loss how to solve it. So this post is really more of a semi-desperate plea for any kind of advice. What other resources might we consider? Paid support is not out of the question (within reason). Are there any "super specialist" consultants out there who deal in kerberized NFS?

Thanks! Matt
Re: [CentOS] Spotty internet connection
On Fri, Feb 3, 2017 at 12:08 PM, John R Pierce wrote:
> for Comcast/Xfinity, I'm using a Arris SB6183 that I got at Costco. this
> is a simple modem/bridge, so /my/ router behind it gets the public IP.

Note that some residential ISPs may not offer "naked" Internet, and/or won't allow you to bring your own device (BYOD). At least in my area, there are only two options for residential Internet: cable-based via Comcast, and DSL-based via AT&T. I used to routinely switch back and forth between the two, to play them against each other for the best rates. However, I had to give up on AT&T because they stopped offering a "naked" service. That is, when I was using them, I had the most basic DSL modem, which literally did nothing except provide me with a public Internet IP and the service. Last I talked to them, I could only use their service with their fancy all-in-one devices, which are both a DSL modem and a gateway/router/wireless AP. I already have all that infrastructure in my house, and I trust my ability to manage it more than I trust the black-box firmware that AT&T provides.

Going from memory, that all-in-one DSL service did give me a public IP, but the device itself implemented NAT, so it looked like I was getting a private IP. There *may* have been a way to disable most of the functionality of the all-in-one device ("DMZ mode" or something like that); it's been discussed pretty heavily on the DSLReports forums. (But either way, even ignoring the technical grievances with their service, AT&T's prices are higher and speed tiers lower than Comcast's.)

TL;DR: (1) some ISPs may not allow BYOD; (2) if it looks like your ISP is giving you a private IP, dig a little deeper; it could simply appear that way due to the way the ISP configures the assigned device.
Re: [CentOS] Spotty internet connection
On Thu, Feb 2, 2017 at 7:13 PM, TE Dukes wrote:
> Lately I have been getting slow and partial page loads, server not found,
> server timed out, etc. I get knocked off ssh when accessing my home server
> from work, etc. It's not the work connection, because I don't have problems
> accessing other sites, just here at home and my home server.
>
> Is there any kind of utility to check for failing hardware?

I have the exact same problems from time to time via Comcast. Mine comes and goes, and lately it hasn't been too bad. But when it comes, the connection drops for very short stretches, maybe 30-90 seconds, which is just long enough to be annoying and make the service unusable.

When it was really bad (intermittent dropouts as described above, almost every night during prime time, usually for several hours at a time), I wrote a program to do constant pings to several servers at once. If you're interested, I'll see if I can find that script. But conceptually, it ran concurrent pings to several sites, and kept some stats on drops longer than some threshold.

Some tips on a program like this: use IP addresses rather than hostnames, because using a hostname implicitly does a DNS lookup, which itself likely requires working Internet service. I also pinged several servers at once, so I could prove it wasn't just the one site I was pinging. Included in the list of servers was the next-hop device beyond my house (presumably Comcast's own router); use traceroute to figure out the network paths.

After running this for a while---before I called them with the evidence---the problem magically cleared up, and since then it's been infrequent enough that I haven't felt the need to fire up the script again. When it comes to residential Internet, I am quite cynical towards monopoly ISPs like Comcast... so maybe they saw the constant pings, knew I was building a solid case, and fixed the problem. Or maybe enough people in my area complained of similar problems and they actually felt uncharacteristically caring for a second.

I haven't been there in a while, but in the past I've gotten a lot of utility out of the DSLReports forums[1]. There are private forums that will put you in direct contact with technical people at your ISP. It can sometimes be a good way to side-step the general customer service hotline and get in touch with an actual engineer rather than a script reader. Maybe not, but worst case you're only out some time. Also, you might post this same question to one of the public forums over there, as there seem to be lots of knowledgeable/helpful people hanging out there. (Despite the name, it's not only about DSL, but consumer ISPs in general.)

[1] http://www.dslreports.com/forums/all

Good luck, let us know if you come up with any decent resolution!
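The concurrent-ping monitor described above can be approximated in a few lines of shell. A rough sketch, not the original script; the target IPs are placeholders (substitute your own choices plus your next-hop router), and the output-parsing is naive:

```shell
# Rough sketch of the concurrent ping monitor described above.
# Target IPs are placeholders; use IPs rather than hostnames so the test
# does not itself depend on DNS, and include your next-hop router.

# classify one line of ping output: success (0) if it is a normal echo reply
is_reply() {
    case "$1" in
        *time=*) return 0 ;;
        *)       return 1 ;;
    esac
}

# ping one target forever, logging only non-reply (dropped/error) lines
watch_host() {
    ping -n -i 1 "$1" | while read -r line; do
        is_reply "$line" || echo "$(date '+%F %T') $1: $line"
    done
}

# usage: run several targets in parallel, one drop-log per target
#   for ip in 8.8.8.8 1.1.1.1 192.168.1.1; do
#       watch_host "$ip" >> "drops_$ip.log" &
#   done
```

Drops that show up in every log at the same timestamps implicate the common path (your modem or the ISP), while drops in only one log implicate that one destination.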
Re: [CentOS] NFS help
On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell wrote:
> This site is locked down like no other I have ever seen. You cannot
> bring anything into the site - no computers, no media, no phone. You
> ...
> This is my client's client, and even if I could circumvent their
> policy I would not do that. They have a zero tolerance policy and if
> ...

OK, no Internet for real. :) Sorry I kept pushing this. I made an unflattering assumption that maybe it just hadn't occurred to you how to get files in or out. Sometimes there are "soft" barriers to bringing files in or out: they don't want it to be trivial, but want it to be doable if necessary. And then there are times when they really mean it. I thought maybe the former applied to you, but clearly it's the latter. Apologies.

> These are all good debugging techniques, and I have tried some of
> them, but I think the issue is load related. There are 50 external
> machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
> on the C6 client the script that processes them is running
> continuously. It will sometimes run for 7 hours then hang, but it has
> run for as long as 3 days before hanging. I have never been able to
> reproduce the errors/hanging situation manually.

If it truly is load related, I'd think you'd see something askew in the sar logs. But if the load tends to spike rather than be continuous, the sar sampling rate may be too coarse to pick it up.

> And again, this is only at this site. We have the same software
> deployed at 10 different sites all doing the same thing, and it all
> works fine at all of those.

Flaky hardware can also cause weird intermittent issues. I know you mentioned before that your hardware is fairly new with a decent spec, but that doesn't make it immune to manufacturing defects. For example, imagine one voltage regulator that's ever-so-slightly out of spec. It happens. Bad memory is not uncommon and certainly causes all kinds of mysterious issues (though in my experience that tends to result in spontaneous reboots or hard lockups; truly anything could happen). Ideally, you could take the system offline and run hardware diagnostics, but I suspect that's impossible given your restrictions on taking things in/out of the datacenter.

On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell wrote:
> Well I spoke too soon. The importer (the one that was initially
> hanging that I came here to fix) hung up after running 20 hours. There
> were no NFS errors or messages on either the client or the server.
> When I restarted it, it hung after 1 minute. Restarted it again and it
> hung after 20 seconds. After that when I restarted it it hung
> immediately. Still no NFS errors or messages. I tried running the
> process on the server and it worked fine. So I have to believe this is
> related to nobarrier. Tomorrow I will try removing that setting, but I
> am no closer to solving this and I have to leave Japan Saturday :-(
>
> The bad disk still has not been replaced - that is supposed to happen
> tomorrow, but I won't have enough time after that to draw any
> conclusions.

I've seen behavior like that with disks that are on their way out: basically the system wants to read a block of data, the disk doesn't read it successfully, so it keeps retrying. The kind of disk, the kind of controller it's behind, the RAID level, and various other settings can all affect this phenomenon, and also how much detail you can see about it. You already know you have one bad disk, so that's an open wound that may or may not be contributing to your bigger, unsolved problem.

That makes me think you could also do some basic disk benchmarking. iozone and bonnie++ are nice, but I'm guessing they're not installed and you don't have a means to install them. But you can use "dd" to do some basic benchmarking, and that's all but guaranteed to be installed. Similar to network benchmarking, you can do something like:

    time dd if=/dev/zero of=/tmp/testfile.dat bs=1G count=256

That will generate a 256 GB file. Adjust "bs" and "count" to whatever makes sense. The general rule of thumb is that you want the target file to be at least 2x the amount of RAM in the system, to avoid cache effects skewing your results. Bigger is even better if you have the space, as it increases the odds of hitting the "bad" part of the disk (if indeed that's the source of your problem). Do that on C6 and C7, and, if you can, on a similar machine as a "control" box. Again, we're looking for outliers, hang-ups, timeouts, etc.

+1 to Gordon's suggestion to sanity-check MTU sizes.

Another random possibility... By somewhat funny coincidence, we have some servers in Japan as well, and were recently banging our heads against the wall with some weird networking issues. The remote hands we had helping us (none of our staff was on site) claimed one or more fiber cables were dusty, enough that it was affecting
Re: [CentOS] NFS help
On Tue, Oct 25, 2016 at 7:22 PM, Larry Martell wrote:
> Again, no machine on the internal network that my 2 CentOS hosts are
> on are connected to the internet. I have no way to download anything.
> There is an onerous and protracted process to get files into the
> internal network and I will see if I can get netperf in.

Right, but do you have physical access to those machines? Do you have physical access to the machine on which you use PuTTY to connect to them? If yes to either question, then you can use another system (one that does have Internet access) to download the files you want, put them on a USB drive (or burn them to a CD, etc.), and bring the USB/CD to the C6/C7/PuTTY machines. There's almost always a technical way to get files on to (or out of) a system. :) Now, your company might have *policies* that forbid skirting around the technical measures that are in place.

Here's another way you might be able to test network connectivity between C6 and C7 without installing new tools: see if both machines have "nc" (netcat) installed. I've seen this tool referred to as "the swiss army knife of network testing tools", and that is indeed an apt description. If you have it installed, you can hit up the web for various examples of its use. It's designed to be easily scripted, so you can write your own tests and, in theory, implement something similar to netperf.

OK, I just thought of another "poor man's" way to do at least some sanity testing between C6 and C7: scp. First generate a huge file; the general rule of thumb is at least 2x the amount of RAM in the C7 host. You could create a tarball of /usr, for example (e.g. "tar czvf /tmp/bigfile.tar.gz /usr", assuming your /tmp partition is big enough to hold this). Then, first do this:

    time scp /tmp/bigfile.tar.gz localhost:/tmp/bigfile_copy.tar.gz

This will literally make a copy of that big file, but will route through most of the network stack. Make a note of how long it took, and also be sure your /tmp partition is big enough for two copies of that big file.

Now repeat that, but instead of copying to localhost, copy to the C6 box. Something like: "time scp /tmp/bigfile.tar.gz <c6-host>:/tmp/". Does the time reported differ greatly from when you copied to localhost? I would expect them to be reasonably close. (And this is another reason why you want a fairly large file: so the transfer time is dominated by the actual file transfer, rather than the overhead.)

Lastly, do the reverse test: log in to the C6 box, and copy the file back to C7, e.g. "time scp /tmp/bigfile.tar.gz <c7-host>:/tmp/bigfile_copy2.tar.gz". Again, the time should be approximately the same for all three transfers. If either or both of the latter two copies take dramatically longer than the first, then there's a good chance something is askew with the network config between C6 and C7.

Oh... all this time I've been jumping to fancy tests. Have you tried the simplest form of testing, that is, doing by hand what your scripts do automatically? In other words, simply try copying files between C6 and C7 using the existing NFS config. Can you manually trigger the errors/timeouts you initially posted? Is it when copying lots of small files? Or when you copy a single huge file? Is there any kind of file-copying "profile" you can determine that consistently triggers the error? That could be another clue.

Good luck!
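The three transfers above are easy to script so the timings line up for comparison. A small sketch; the hostnames in the usage are placeholders for the real C6/C7 boxes:

```shell
# Timing wrapper for the three scp transfers described above.
# Hostnames in the usage lines are placeholders.

# run a command, discard its output, and print elapsed wall-clock seconds
t() {
    start=$(date +%s)
    "$@" >/dev/null
    echo "$(( $(date +%s) - start ))s: $*"
}

# usage:
#   t scp /tmp/bigfile.tar.gz localhost:/tmp/bigfile_copy.tar.gz
#   t scp /tmp/bigfile.tar.gz c6-host:/tmp/bigfile.tar.gz
# then, from the C6 box:
#   t scp /tmp/bigfile.tar.gz c7-host:/tmp/bigfile_copy2.tar.gz
```

Running each transfer a few times and keeping the printed lines makes any outlier (e.g. the C6-to-C7 direction taking several times longer) immediately visible.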
Re: [CentOS] NFS help
On Mon, Oct 24, 2016 at 6:09 PM, Larry Martell wrote:
> The machines are on a local network. I access them with putty from a
> windows machine, but I have to be at the site to do that.

So that means when you are offsite there is no way to access either machine? Does anyone have a means to access these machines from offsite?

> Yes, the C6 instance is running on the C7 machine. What could be
> mis-configured? What would I check to find out?

OK, so these two machines are actually the same physical hardware, correct? Do you know whether the networking between the two machines is "soft", as in done locally on the machine (typically through NAT or bridging)? Or is it "hard", in that you have a dedicated NIC for the host and a separate dedicated NIC for the guest, with actual cables going out of each interface and connected to a switch/hub/router? I would expect the former. If it truly is a "soft" network between the machines, then that is more evidence of a configuration error. Unfortunately, I can't say exactly what to look for: I have virtually no experience setting up C6 guests on a C7 host; at least not enough to help you troubleshoot the issue. But in general, you should be able to hit up a web search for howtos and other documents on setting up networking between a C7 host and its guests. That will allow you to (1) understand how it's currently set up, (2) verify whether there is any misconfiguration, and (3) correct or change it if needed.

> Yes, that is a potential solution I had not thought of. The issue with
> this is that we have the same system installed at many, many sites,
> and they all work fine. It is only this site that is having an issue.
> We really do not want to have different SW running at just this one
> site. Running the script on the C7 host is a change, but at least it
> will be the same software as every place else.

IIRC, you said this is the only C7 instance? That would mean it is already not the same as every other site. It may be conceptually the same, but "under the hood" there are a tremendous number of changes between C6 and C7. Effectively every single package is different, from the kernel all the way to trivial userspace tools.

> netperf is not installed.

Again, if you can use PuTTY (which is ssh) to access these systems, you implicitly have the ability to upload files (i.e. packages) to them. A simple tool like netperf should have few (if any) dependencies, so you don't have to mess with mirroring the whole CentOS repo. Just grab the netperf rpm file from wherever, then use scp (I believe it's called pscp when part of the PuTTY package) to copy it to your servers, yum install, and start testing.
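As a concrete example of pressing nc into netperf-like service, a raw TCP throughput test can be improvised as below. This is a sketch: the port and hostname are arbitrary choices, and older netcat builds need "-q 0" rather than "-N" to exit when stdin hits EOF:

```shell
# Improvised nc throughput test between the two boxes (a sketch).
# Port 5001 and "c6-host" are arbitrary placeholders; older nc builds
# use "-q 0" instead of "-N" to close the connection on EOF.

# on the receiving box (e.g. C6):
#   nc -l 5001 > /dev/null
# on the sending box (e.g. C7), push 1 GiB of zeros through and time it:
#   time sh -c 'dd if=/dev/zero bs=1M count=1024 | nc -N c6-host 5001'

# convert the result: bytes sent / elapsed seconds -> whole MB/s
mb_per_s() {
    echo $(( $1 / $2 / 1048576 ))
}
# e.g. 1 GiB in 8 seconds:  mb_per_s 1073741824 8
```

A number far below what the link should sustain (or wildly different results between repeated runs) would point the finger at the network rather than NFS itself.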
Re: [CentOS] NFS help
Another alternative idea: you probably won't be comfortable with this, but check out systemd-nspawn. There are lots of examples online, and I even wrote about how I use it: http://raw-sewage.net/articles/fedora-under-centos/

This is unfortunately another "sysadmin" solution to your problem. nspawn is the successor to chroot, if you are at all familiar with that. It's kinda-sorta like running a system-within-a-system, but much more lightweight. The "slave" systems share the running kernel with the "master" system. (I could say the "guest" and "host" systems, but those are virtual machine terms, and this is not a virtual machine.) For your particular case, the main benefit is that you can natively share filesystems, rather than use NFS to share files.

So, it's clear you have network capability between the C6 and C7 systems. And surely you must have ssh installed on both systems. Therefore, you can transfer files between C6 and C7. So here's a way you can use systemd-nspawn to get around trying to install all the extra libs you need on C7:

1. On the C7 machine, create a systemd-nspawn container. This container will "run" C6.

2. You can source everything you need from the running C6 system directly. Heck, if you have enough disk space on the C7 system, you could just replicate the whole C6 tree to a sub-directory on C7.

3. When you configure the C6 nspawn container, make sure you pass through the directory structure with these FTP'ed files. Basically you are substituting systemd-nspawn's bind/filesystem pass-through mechanism in place of NFS.

With that setup, you can "probably" run all the C6-native stuff under C7. This isn't guaranteed to work; e.g., if your C6 programs require hooks into the kernel, it could fail, because now you're running on a different kernel. But if you only use userspace libraries, you'll probably be OK. I was actually able to get HandBrake, compiled for bleeding-edge Ubuntu, to work within a C7 nspawn container.

That probably trades one bit of complexity (NFS) for another (systemd-nspawn). But I'm just throwing it out there in case you're completely stuck.
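The bind-mount step of that recipe might look something like this. A hedged sketch only: both paths are assumptions, not details from the actual site:

```shell
# Hypothetical sketch of the bind-mount step: booting a replicated C6
# tree under systemd-nspawn with the FTP directory passed through in
# place of NFS. Both paths below are assumptions.

C6_TREE=/var/lib/machines/c6   # where the C6 root tree was replicated to
FTP_DIR=/srv/ftp               # where the FTP'ed files land on the C7 host

# helper: build systemd-nspawn's --bind argument (host-path:container-path)
build_bind() {
    printf -- '--bind=%s:%s' "$1" "$2"
}

# launch (commented out; would be run on the real C7 host):
#   systemd-nspawn -D "$C6_TREE" "$(build_bind "$FTP_DIR" "$FTP_DIR")" /bin/bash
echo "would run: systemd-nspawn -D $C6_TREE $(build_bind "$FTP_DIR" "$FTP_DIR") /bin/bash"
```

Inside the container, the C6 processing script would then see the FTP'ed files at the same path as before, with no NFS client in the picture.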
Re: [CentOS] NFS help
On Mon, Oct 24, 2016 at 2:42 PM, Larry Martell wrote:
>> At any rate, what I was looking at was seeing if there was any way to
>> simplify this process, and cut NFS out of the picture. If you need
>> only to push these files around, what about rsync?
>
> It's not just moving files around. The files are read, and their
> contents are loaded into a MySQL database.

On what server does the MySQL database live?

> This site is not in any way connected to the internet, and you cannot
> bring in any computers, phones, or media of any kind. There is a
> process to get machines or files in, but it is onerous and time
> consuming. This system was set up and configured off site and then
> brought on site.

But clearly you have a means to log in to both the C6 and C7 servers, right? Otherwise, how would you be able to see these errors and check top/sar/free/iostat/etc.? And if you are logging in to both of these boxes, I assume you are doing so via ssh? Or are you actually physically sitting in front of these machines? If you have ssh access to these machines, then you can trivially copy files to and from them: if ssh is installed and working, then scp should also be installed and working. Even if you don't have scp, you can use tar over ssh to the same effect. It's ugly, but doable, and there are examples online of how to do it.

Also, from a couple of comments you made about these machines, it looks like the C7 box (FTP server + NFS server) is running bare metal (i.e. not a virtual machine), while the C6 instance (NFS client) is virtualized. What host is the C6 instance running on? Is the C6 instance running under the C7 instance, i.e., are both machines on the same physical hardware? If so, then your "network" (at least the one between C7 and C6) is basically virtual, and having issues like this on the same physical box is certainly indicative of a misconfiguration.

> To run the script on the C7 NFS server instead of the C6 NFS client
> many python libs will have to be installed. I do have someone off site
> working on setting up a local yum repo with what I need, and then we
> are going to see if we can zip and email the repo and get it on site.
> But none of us are sys admins and we don't really know what we're
> doing so we may not succeed and it may take longer than I will be here
> in Japan (I am scheduled to leave Saturday).

Right, but my point is you can write your own custom script(s) to copy files from C7 to C6 (based on rsync or ssh), do the processing on C6 (DB loading, whatever other processing), then move the files back to C7 if necessary. You said yourself you are a programmer, not a sysadmin, so change the nature of the problem from a sysadmin problem to a programming problem. I'm certain I'm missing something, but the fundamental architecture doesn't make sense to me given what I understand of the process flow.

Were you able to run some basic network testing tools between the C6 and C7 machines? I'm interested specifically in netperf, which does round-trip packet testing, both TCP and UDP. I would look for packet drops with UDP, and/or major performance outliers with TCP, and/or any kind of timeouts with either protocol.

How is name resolution working on both machines? Do you address machines by hostname (e.g. "my_c6_server") or explicitly by IP address? Are you using DNS, or are the IPs hard-coded in /etc/hosts?

To me it still "smells" like a networking issue...

-Matt
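To make the "programming problem" framing concrete, such a mover could be sketched on top of plain ssh/scp, reusing the two-minute quiescence rule the existing script reportedly applies. Everything here (hostnames, paths, the process step) is a placeholder:

```shell
# Hedged sketch of a mover that bypasses NFS: pull files that have been
# quiet for 2+ minutes from the C7 FTP tree over ssh, process them on C6,
# then move them aside on C7. Hostnames and paths are placeholders.

C7=c7-host
SRC=/srv/ftp/incoming
DONE=/srv/ftp/done

# build the remote find command listing files untouched for N+ minutes
stable_find_cmd() {
    printf 'find %s -type f -mmin +%s' "$1" "$2"
}

mover_pass() {
    ssh "$C7" "$(stable_find_cmd "$SRC" 2)" | while read -r f; do
        scp "$C7:$f" /data/incoming/ &&
            ssh "$C7" "mv '$f' '$DONE/'"   # move aside only after a good copy
    done
}

# usage: run from cron or a loop on the C6 box:
#   while true; do mover_pass; sleep 60; done
```

The "move aside only after a good copy" ordering matters: if the scp fails or the box reboots mid-pass, the file simply gets picked up again on the next pass.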
Re: [CentOS] NFS help
On Sun, Oct 23, 2016 at 8:02 AM, Larry Martellwrote: >> To be clear: the python script is moving files on the same NFS file >> system? E.g., something like >> >> mv /mnt/nfs-server/dir1/file /mnt/nfs-server/dir2/file >> >> where /mnt/nfs-server is the mount point of the NFS server on the >> client machine? > > Correct. > >> Or are you moving files from the CentOS 7 NFS server to the CentOS 6 NFS >> client? > > No the files are FTP-ed to the CentOS 7 NFS server and then processed > and moved on the CentOS 6 NFS client. I apologize if I'm being dense here, but I'm more confused on this data flow now. Your use of "correct" and "no" seems to be inconsistent with your explanation. Sorry! At any rate, what I was looking at was seeing if there was any way to simplify this process, and cut NFS out of the picture. If you need only to push these files around, what about rsync? > The problem doing that is the files are processed and loaded to MySQL > and then moved by a script that uses the Django ORM, and neither > django, nor any of the other python packages needed are installed on > the server. And since the server does not have an external internet > connection (as I mentioned in my reply to Mark) getting it set up > would require a large amount of effort. ...right, but I'm pretty sure rsync should be installed on the server; I believe it's default in all except the "minimal" setup profiles. Either way, it's trivial to install, as I don't think it has any dependencies. You can download the rsync rpm from mirror.centos.org, then scp it to the server, then install via yum. And Python is definitely installed (requirement for yum) and Perl is probably installed as well, so with rsync plus some basic Perl/Python scripting you can create your own mover script. Actually, rsync may not even be necessary, scp may be sufficient for your purposes. And scp should definitely be installed. 
> Also, we have this exact same setup on over 10 other systems, and it > is only this one that is having a problem. The one difference with > this one is that the sever is CentOS7 - on all the other systems both > the NFS server and client are CentOS6. >From what you've described so far, with what appears to be a relatively simple config, C6 or C7 "shouldn't" matter. However, under the hood, C6 and C7 are quite different. > The python script checks the modification time of the file, and only > if it has not been modified in more then 2 minutes does it process it. > Otherwise it skips it and waits for the next run to potentially > process it. Also, the script can tell if the file is incomplete in a > few different ways. So if it has not been modified in more then 2 > minutes, the script starts to process it, but if it finds that it's > incomplete it aborts the processing and leaves it for next time. This script runs on C7 or C6? > The hardware is new, and is in a rack in a server room with adequate > and monitored cooling and power. But I just found out from someone on > site that there is a disk failure, which happened back on Sept 3. The > system uses RAID, but I don't know what level. I was told it can > tolerate 3 disk failures and still keep working, but personally, I > think all bets are off until the disk has been replaced. That should > happen in the next day or 2, so we shall see. OK, depending on the RAID scheme and how it's implemented, there could be disk timeouts causing things to hang. > I've been watching and monitoring the machines for 2 days and neither > one has had a large CPU load, not has been using much memory. How about iostat? Also, good old "dmesg" can suggest if the system with the failed drive is causing timeouts to occur. > None on the client. On the server it has 1 dropped Rx packet. > >> Do >>> "ethtool " on both machines to make sure both are linked up >>> at the correct speed and duplex. 
> > That reports only "Link detected: yes" for both client and server. OK, but ethtool should also say something like: ... Speed: 1000Mb/s Duplex: Full ... for a 1 Gbps network. If Duplex is reported as "half", then that is definitely a problem. Using netperf can further confirm whether or not your network is functioning as expected. > sar seems to be running, but I can only get it to report on the > current day. The man page shows start and end time options, but is > there a way to specify the start and end date? If you want to report on a day in the past, you have to pass the file argument, something like this: sar -A -f /var/log/sa/sa23 -s 07:00:00 -e 08:00:00 That would show you yesterday's data between 7am and 8am. The files /var/log/sa/saXX correspond to the day; by default, XX is the day of the month. ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
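Since the saXX name is just the day of the month, building that sar command line in a script is trivial. A small sketch (the helper name and layout are mine, not part of sysstat):

```python
import datetime
import os

def sa_file_for(date, sa_dir="/var/log/sa"):
    """Return the default sysstat data file (saDD) for a given date."""
    return os.path.join(sa_dir, "sa%02d" % date.day)

# e.g. build a sar command line for yesterday, 7am-8am
yesterday = datetime.date.today() - datetime.timedelta(days=1)
cmd = "sar -A -f %s -s 07:00:00 -e 08:00:00" % sa_file_for(yesterday)
```

One caveat: because the files cycle by day of month, data older than roughly a month (or your configured sysstat retention) is overwritten.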
Re: [CentOS] NFS help
On Fri, Oct 21, 2016 at 4:14 AM, Larry Martell wrote: > We have 1 system running CentOS 7 that is the NFS server. There are 50 > external machines that FTP files to this server fairly continuously. > > We have another system running CentOS 6 that mounts the partition the files > are FTP-ed to using NFS. > > There is a python script running on the NFS client machine that is reading > these files and moving them to a new dir on the same file system (a mv, not > a cp). To be clear: the python script is moving files on the same NFS file system? E.g., something like mv /mnt/nfs-server/dir1/file /mnt/nfs-server/dir2/file where /mnt/nfs-server is the mount point of the NFS server on the client machine? Or are you moving files from the CentOS 7 NFS server to the CentOS 6 NFS client? If the former, i.e., you are moving files to and from the same system, is it possible to completely eliminate the C6 client system, and just set up a local script on the C7 server that does the file moves? That would cut out a lot of complexity, and also improve performance dramatically. Also, what is the size range of these files? Are they fairly small (e.g. 10s of MB or less), medium-ish (100s of MB), or large (>1GB)? > Almost daily this script hangs while reading a file - sometimes it never > comes back and cannot be killed, even with -9. Other times it hangs for 1/2 > hour then proceeds on. Timeouts relating to NFS are the worst. > Coinciding with the hanging I see this message on the NFS server host: > > nfsd: peername failed (error 107) > > And on the NFS client host I see this: > > nfs: V4 server returned a bad sequence-id > nfs state manager - check lease failed on NFSv4 server with error 5 I've been wrangling with NFS for years, but unfortunately those particular messages don't ring a bell. The first thing that came to my mind is: how does the Python script running on the C6 client know that the FTP upload to the C7 server is complete? 
In other words, if someone is uploading "fileA", and the Python script starts to move "fileA" before the upload is complete, then at best you're setting yourself up for all kinds of confusion, and at worst file truncation and/or corruption. Making a pure guess about those particular errors: is there any chance there is a network issue between the C7 server and the C6 client? What is the connection between those two servers? Are they physically adjacent to each other and on the same subnet? Or are they on opposite ends of the globe connected through the Internet? Clearly two machines on the same subnet, separated only by one switch is the simplest case (i.e. the kind of simple LAN one might have in his home). But once you start crossing subnets, then routing configs come into play. And maybe you're using hostnames rather than IP addresses directly, so then name resolution comes into play (DNS or /etc/hosts). And each switch hop you add requires that not only your server network config needs to be correct, but also your switch config needs to be correct as well. And if you're going over the Internet, well... I'd probably try really hard to not use NFS in that case! :) Do you know if your NFS mount is using TCP or UDP? On the client you can do something like this: grep nfs /proc/mounts | less -S And then look at what the "proto=XXX" says. I expect it will be either "tcp" or "udp". If it's UDP, modify your /etc/fstab so that the options for that mountpoint include "proto=tcp". I *think* the default is now TCP, so this may be a non-starter. But the point is, based purely on the conjecture that you might have an unreliable network, TCP would be a better fit. I hate to simply say "RTFM", but NFS is complex, and I still go back and re-read the NFS man page ("man nfs"). This document is long and very dense, but it's worth at least being familiar with its content. > The first client message is always at the same time as the hanging starts. 
> The second client message comes 20 minutes later. > The server message comes 4 minutes after that. > Then 3 minutes later the script un-hangs (if it's going to). In my experience, delays that happen on consistent time intervals on the order of minutes tend to smell of some kind of timeout scenario. So the question is, what triggers the timeout state? > Can anyone shed any light on to what could be happening here and/or what I > could do to alleviate these issues and stop the script from hanging? > Perhaps some NFS config settings? We do not have any, so we are using the > defaults. My general rule of thumb is "defaults are generally good enough; make changes only if you understand their implications and you know you need them (or temporarily as a diagnostic tool)". But anyway, my hunch is that there might be a network issue. So I'd actually start with basic network troubleshooting. Do an "ifconfig" on both machines: do you see any drops or interface errors? Do "ethtool <interface>" on both
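On the proto question raised earlier in the thread: rather than eyeballing the grep output, the proto= option can be pulled out of /proc/mounts programmatically. A hedged sketch, assuming the usual device/mountpoint/fstype/options field order of /proc/mounts:

```python
def nfs_proto(mounts_text, mountpoint):
    """Return the proto= option for an NFS mount in /proc/mounts
    content, or None if the mount or option is absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mnt, fstype, options = fields[1], fields[2], fields[3]
        if mnt == mountpoint and fstype.startswith("nfs"):
            for opt in options.split(","):
                if opt.startswith("proto="):
                    return opt.split("=", 1)[1]
    return None

# typical use: nfs_proto(open("/proc/mounts").read(), "/mnt/nfs-server")
```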
[CentOS] Kerberized NFS client and slow user write performance
We seem to be increasingly hit by this bug: https://access.redhat.com/solutions/2040223 "On RHEL 6 NFS client using kerberos (krb5), one user experiences slow write performance, another does not" You need a RH subscription to see that in its entirety. But the subject basically says it all: randomly, one or more users will be subjected to *terrible* NFS write performance that persists until reboot. There is a root cause shown, but it is cryptic to non-kernel devs; it doesn't explain from a user perspective what triggers this state. (That's why it appears to be random to me.) There is no solution or workaround given. This appears to be on a per-user + per-server basis, so a crude workaround is to migrate the user to a different server. And we do regular reboots, which somewhat hides the problem. My question to the list: has anyone else dealt with this? The link says "Solution in Progress", but it was last updated nearly a year ago. We don't have any support contracts with upstream, just the website access subscription, so I doubt RH will offer any help. Appreciate any suggestions! Thanks, Matt ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Suggestions for Config Management Tool
As others have said, in the end, it's a matter of personal preference (e.g. vim or emacs). You could spend a week reading articles and forum discussions comparing all the different tools; but until you've really used them, it will mostly be an academic exercise. Of course, the particulars of your environment might naturally lend themselves to one tool or the other, so it's certainly worth spending some time getting an overview of the "idiom" of each tool. That said, we are working on moving away from dozens of little homegrown management scripts to Ansible. It just feels "right" to me, like how I would have designed such a system. I like that it's built on top of ssh. Any sysadmin should be fairly intimate with ssh, so why not build your CMS on top of a familiar tool? (But, of course, Ansible is flexible enough that you don't have to use ssh.) I might even go so far as to call it a "platform" rather than a tool. Out of the box, you can quickly get it doing useful work by reading the docs/tutorials on the website. And just going through those exercises, you'll start to see that there's a ton of flexibility available, which is yours to exercise or not. And that perhaps is one of the drawbacks. We're actually somewhat in "analysis paralysis" mode with Ansible right now. Because there is so much flexibility, we are constantly second-guessing ourselves about the best way to implement our fairly complex and diverse environments. In particular, how to group configuration "profiles". E.g., this server needs to be a DNS master, this server needs to be a DNS slave, this server needs MySQL + DNS slave, this server needs these packages installed, this server needs those packages but not these, etc. But I always prefer a tool with too much flexibility over something that forces you into a specific way of doing things: that makes it our problem, not the tool's. The only other one I have any experience with is CFEngine. 
I tried---and I mean really tried---to get something going with CFEngine3. I just couldn't get my head around it. The wacky DSL it uses for expressing configs just wasn't intuitive to me; the whole bootstrapping process seemed overly complex; and the documentation managed to be lengthy yet still lack real substance. By contrast: everything I've wanted to do in Ansible I was able to do quickly (and usually in several ways); on the client side, the only thing needed for an Ansible bootstrap is ssh; and the docs for Ansible have met or exceeded all expectations. My colleague and I were even able to quickly hack on some of the Ansible Python code to add some functionality we wanted. At least the pieces we looked at appeared to be quite straightforward. I have 15 years of C/C++ programming experience and wouldn't even consider messing with the CFEngine code. Maybe it's fine, but the complexity of the rest of the system is enough to scare me away from looking at the source. To be fair, it was *many* years ago that I looked at CFE3; maybe many of my issues have since been addressed. But, at this point, Ansible checks all my boxes, so that's where we're staying. Again, that's just my taste/experience. If you have the time, I'd spin up some VMs and play with the different tools. Try to implement some of your key items and see how hard/easy they are. On Thu, May 12, 2016 at 8:27 AM, Fabian Arrotin wrote: > On 12/05/16 10:21, James Hogarth wrote: >> On 12 May 2016 at 08:22, Götz Reinicke - IT Koordinator < >> goetz.reini...@filmakademie.de> wrote: >> >>> Hi, >>> >>> we see a growing need for a better Configuration management for our >>> servers. >>> >>> Are there any known good resources for a comparison of e.g. Puppet, >>> Chef, Ansible etc? >>> >>> What would you suggest and why? 
:) >>> >>> >>> >> >> Puppet is great for central control with automatic runs making systems >> right and keeping them in line, it's not an orchestration tool though - >> however it's commonly supplemented with something like rundeck and/or >> mcollective to assist here. >> >> Chef is great for a Ruby house - you'll need to brush up on your Ruby as >> writing cookbooks is heavily tied to the language. Historically it was very >> Debian focused, with issues like selinux problems. I believe these have been >> generally resolved though. >> >> Ansible is a great orchestration tool and excellent for going from base to >> a configured system. It is less of a tool to keep things in line with a base, >> however, with no central automated runs (ignoring Tower, which is not FOSS >> yet). >> >> Ansible is also much simpler to get into given the tasks are just like >> following through a script for defining how to make a system, as opposed to >> learning an actual DSL like required for understanding puppet modules. >> >> There's a growing pattern of using ansible for orchestration alongside >> puppet for definitions as well (there's a specific ansible module to carry >> out a puppet
[CentOS] tune2fs: Filesystem has unsupported feature(s) while trying to open
I have an ext4 filesystem for which I'm trying to use "tune2fs -l". Here is the listing of the filesystem from the "mount" command: # mount | grep share /dev/mapper/VolGroup_Share-LogVol_Share on /share type ext4 (rw,noatime,nodiratime,usrjquota=aquota.user,jqfmt=vfsv0,data=writeback,nobh,barrier=0) When I try to run "tune2fs" on it, I get the following error: # tune2fs -l /dev/mapper/VolGroup_Share-LogVol_Share tune2fs 1.41.12 (17-May-2010) tune2fs: Filesystem has unsupported feature(s) while trying to open /dev/mapper/VolGroup_Share-LogVol_Share Couldn't find valid filesystem superblock. This filesystem was created on this system (i.e. not imported from another system). I have other ext4 filesystems on this server, and they all work with "tune2fs -l". Basic system info: # rpm -qf `which tune2fs` e2fsprogs-1.41.12-18.el6.x86_64 # cat /etc/redhat-release CentOS release 6.5 (Final) # uname -a Linux lnxutil8 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux I did a little web searching on this, most of the hits were for much older systems, where (for example) the e2fsprogs only supported up to ext3, but the user had an ext4 filesystem. Obviously that's not the case here. In other words, the filesystem was created with the mkfs.ext4 binary from the same e2fsprogs package as the tune2fs binary I'm trying to use. Anyone ever seen anything like this? Thanks! ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Just need to vent
I haven't used gnome3, or any Linux desktop in earnest, for a long time... But I used to be semi-obsessed with tweaking and configuring various Linux desktops. And back when I was doing that, there were dozens of desktop programs available, from super-lightweight bare-bones window managers to full-blown desktop environments that do everything under the sun (and of course, everything in between). So my question is: while gnome3 might not float your boat, why not try one of the countless other desktops? It's all open source... FWIW, I was never a fan of full-blown desktop environments like kde/gnome simply because I had a preference for lightweight, standalone window managers. My favorites were fluxbox and icewm. Besides those, off the top of my head, I know of: blackbox, openbox, Joe's wm, window maker, and enlightenment 16 in the simple/lightweight window manager category. Xfce has already been mentioned, and there's also LXDE and the latest enlightenment in the full-on desktop environment category. A little elbow grease may be required, but I'm certain there's *a* Linux gui out there for everyone. On Jan 24, 2016 12:20, Joacim Melin wrote: > > > On 24 Jan 2016, at 17:45, Peter Duffy wrote: > > > > On Sat, 2016-01-23 at 20:27 -0600, Frank Cox wrote: > >> On Sat, 23 Jan 2016 20:05:02 -0500 > >> Mark LaPierre wrote: > >> > >>> The main reason I'm still using, nearly obsolete, CentOS 6 is because I > >>> don't want to have to deal with Gnome 3. > >> > >> Install Mate on Centos 7 and you never have to touch Gnome 3. I did, > >> and my desktops don't look or work any different today than they did > >> under Centos 6. > >> > > > > Trouble is that when you go from 6 to 7, you also have the delights of > > systemd and grub 2 to contend with. > > > > I'm also still using CentOS 6, and currently have no desire to > > "upgrade". 
I'm still in shock after trying to upgrade to Red Hat 7 at > > work, and after the upgrade (apart from being faced with the gnome3 > > craziness) finding that many of the admin commands either didn't work, > > or only worked partially via a wrapper. (And the added insult that when > > I shut down the box, it gave a message something like: "shutdown status > > asserted" and then hung, so that it had to be power-cycled. Then when it > > came back up, it went through all the fs checks as though it had shut > > down ungracefully.) I allowed some of the senior developers to try the > > box themselves for a while, and based on their findings, it was decided > > to switch to Ubuntu (which (at least then) didn't use systemd,) together > > with Mate and XFCE. > > > > Similarly with others who have commented, I simply cannot understand why > > the maintainers of crucial components in linux have this thing about > > making vast changes which impact (usually adversely) on users and > > admins, without (apparently) any general discussion or review of the > > proposed changes. What happened to RFCs? Maybe it's a power thing - we > > can do it, so we're gonna do it, and if ya don't like it, tough! > > > > It would be very interesting to know how many other users are still on > > CentOS/Red Hat 6 as a result of reluctance to enjoy all the - erm - > > improvements in 7. Maybe it's time to fork CentOS 6 and make it look and > > behave like 7 without systemd (or even better, with some way of > > selecting the init methodology at install-time and afterwards), and with > > gnome2 (or a clear choice between 2 and 3). Call it DeCentOS. > > > > > > I'm still on 6.7 and have no plans to upgrade my 20+ servers running it. > KVM runs fine, all my services runs fine. > Everything is stable, fast enough and I can find my way around a CentOS > 6.x system like the palm of my hand. > > I tried installing CentOS 7 when it was released without knowing about all > the changes. 
I spent about an hour trying to understand what had happened > and where things were located. And with "trying" I mean searching, > googling and just feeling really frustrated. > > I then realised that it was simply not for me - lots of (IMHO unnecessary) > changes had been made and I guess when the time comes to really upgrade my > servers I will go with Ubuntu, FreeBSD or whatever seems to be the best > option. > > I'm sure there are technical reasons to upgrade to CentOS 7, I'm yet to be > bothered to find out though since it's damn near impossible to actually get > work done with it installed. > > A fork of CentOS 6 would be very, very, very interesting to run from my > point of view. > > Joacim > > > > > ___ > CentOS mailing list > CentOS@centos.org > https://lists.centos.org/mailman/listinfo/centos > ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] HDD badblocks
That's strange, I expected the SMART test to show some issues. Personally, I'm still not confident in that drive. Can you check cabling? Another possibility is that a cable has vibrated into a marginal state. Probably a long shot, but if it's easy to get physical access to the machine, and you can afford the downtime to shut it down, open up the chassis and re-seat the drive and cables. Every now and then I have PCIe cards that work fine for years, then suddenly disappear after a reboot. I re-seat them and they go back to being fine for years. So I believe vibration does sometimes play a role in mysterious problems that creep up from time to time. On Mon, Jan 18, 2016 at 5:39 AM, Alessandro Baggi wrote: > Il 18/01/2016 12:09, Chris Murphy ha scritto: >> >> What is the result for each drive? >> >> smartctl -l scterc >> >> >> Chris Murphy >> ___ >> CentOS mailing list >> CentOS@centos.org >> https://lists.centos.org/mailman/listinfo/centos >> . >> > SCT Error Recovery Control command not supported > > ___ > CentOS mailing list > CentOS@centos.org > https://lists.centos.org/mailman/listinfo/centos ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] HDD badblocks
Have you ran a "long" smart test on the drive? Smartctl -t long device I'm not sure what's going on with your drive. But if it were mine, I'd want to replace it. If there are issues, that long smart check ought to turn up something, and in my experience, that's enough for a manufacturer to do a warranty replacement. On Jan 17, 2016 11:00, "Alessandro Baggi"wrote: > Hi list, > I've a notebook with C7 (1511). This notebook has 2 disk (640 GB) and I've > configured them with MD at level 1. Some days ago I've noticed some > critical slowdown while opening applications. > > First of all I've disabled acpi on disks. > > > I've checked disk for badblocks 4 consecutive times for disk sda and sdb > and I've noticed a strange behaviour. > > On sdb there are not problem but with sda: > > 1) First run badblocks reports 28 badblocks on disk > 2) Second run badblocks reports 32 badblocks > 3) Third reports 102 badblocks > 4) Last run reports 92 badblocks. > > > Running smartctl after the last badblocks check I've noticed that > Current_Pending_Sector was 32 (not 92 as badblocks found). > > To force sector reallocation I've filled the disk up to 100%, runned again > badblocks and 0 badblocks found. > Running again smartctl, Current_Pending_Sector 0 but Reallocated_Event > Count = 0. > > Why each consecutive run of badblocks reports different results? > Why smartctl does not update Reallocated_Event_Count? > Badblocks found on sda increase/decrease without a clean reason. This > behaviuor can be related with raid (if a disk had badblocks this badblock > can be replicated on second disk?)? > > What other test I can perform to verify disks problems? > > Thanks in advance. > ___ > CentOS mailing list > CentOS@centos.org > https://lists.centos.org/mailman/listinfo/centos > ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Intel SSD
I always tell vendors I'm using RHEL, even though we're using CentOS. If you say CentOS, some vendors immediately throw up their hands and say "unsupported" and then won't even give you the time of day. A couple of tricks for fooling tools into thinking they are on an actual RHEL system: 1. Modify /etc/redhat-release to say Red Hat Enterprise Linux, or whatever the actual RHEL systems have 2. Similarly modify /etc/issue Another tip that has proven successful: run the vendor tool under strace. Sometimes you can get an idea of what it's trying to do and why it's failing. This is exactly what we did to determine why a vendor tool wouldn't work on CentOS. We had modified /etc/redhat-release (as in (1) above), but forgot about /etc/issue. Strace showed the program exiting immediately after an open() call to /etc/issue. Good luck! On Wed, Nov 18, 2015 at 9:24 AM, Michael Hennebry wrote: > On Wed, 18 Nov 2015, Birta Levente wrote: > >> I have a supermicro server, motherboard is with C612 chipset and beside >> that with LSI3108 raid controller integrated. >> Two Intel SSD DC S3710 200GB. >> OS: Centos 7.1 up to date. >> >> My problem is that the Intel SSD Data Center Tool (ISDCT) does not >> recognize the SSD drives when they are connected to the standard S-ATA ports on >> the motherboard, but through the LSI raid controller it is working. >> >> Does somebody know what could be the problem? >> >> I talked to Intel support and they said the problem is that Centos is >> not a supported OS ... only RHEL 7. >> But if not supported, it should not work on the LSI controller either. > > > Perhaps the tool looks for the string RHEL. > My recollection is that when IBM PC's were fairly new, > IBM used that trick with some of its software. > To work around that, some open source developers used the string "not IBM". > I think this was pre-internet, so google might not work. > > If it's worth the effort, you might make another "CentOS" distribution, > but call it "not RHEL". 
> > -- > Michael henne...@web.cs.ndsu.nodak.edu > "Sorry but your password must contain an uppercase letter, a number, > a haiku, a gang sign, a heiroglyph, and the blood of a virgin." > -- someeecards > ___ > CentOS mailing list > CentOS@centos.org > https://lists.centos.org/mailman/listinfo/centos ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
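As an aside, the sort of OS gate that strace tends to reveal in these vendor tools is usually no more sophisticated than a substring match on the release files. A hypothetical sketch of that logic (a guess, not the vendor's actual code):

```python
def looks_like_rhel(release_text, issue_text=""):
    """Crude distro check of the kind strace often reveals: a plain
    substring match on /etc/redhat-release and sometimes /etc/issue."""
    needle = "Red Hat Enterprise Linux"
    return needle in release_text or needle in issue_text

# typical use on a live system (both files world-readable):
#   looks_like_rhel(open("/etc/redhat-release").read(),
#                   open("/etc/issue").read())
```

Which is exactly why editing both files, as described above, is often enough to satisfy such a tool.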
Re: [CentOS] Running Fedora under CentOS via systemd-nspawn?
I actually built HandBrake 0.10.2 (the latest) under C7 (using a CentOS 7 nspawn container so as not to pollute the main system with the dozens of deps I installed). Full details here if you're interested: http://raw-sewage.net/articles/fedora-under-centos/ The problem with the newer version of HandBrake is that it requires (a very recent version of) gtk3, which in turn has several other deps that need to be upgraded on C7. But I worked through all that, and can provide all the spec files if anyone wants. Anyway, the HandBrake problem is solved for me (in possibly multiple ways). But I'm just fascinated by the possibilities of nspawn, and wondering how far one can take it before instabilities are introduced. Consider how many people out there have similar problems to mine: they want to run CentOS for stability/reliability/vendor support, but also want some bleeding-edge software that's only available on Fedora (or Ubuntu or Arch). If it's "safe" to run these foreign distributions under CentOS via nspawn, then I think that's a simple solution. Virtual machines are of course a possible solution, but they seem overkill for this class of problem. And, not to mention, possibly inefficient---something like HandBrake should benefit from running on bare metal rather than under a virtualized CPU. On Wed, Nov 18, 2015 at 1:11 PM, Lamar Owen <lo...@pari.edu> wrote: > On 11/17/2015 12:39 PM, Matt Garman wrote: >> >> Now I have a need for a particular piece of software: HandBrake. I >> found this site[1] that packages it for both Fedora and CentOS. But >> the CentOS version is a little older, as the latest HandBrake requires >> gtk3. The latest version is available for Fedora however. >> > Hmm, Nux Dextop (li.nux.ro) has HandBrake 0.9.9 for C7, but not yet 0.10.2. > Nux! is around this list and might be able to shed light on what is needed > for 0.10.2. 
> > ___ > CentOS mailing list > CentOS@centos.org > https://lists.centos.org/mailman/listinfo/centos ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
[CentOS] Running Fedora under CentOS via systemd-nspawn?
tl;dr - Is anybody "running" a Fedora system via systemd-nspawn under CentOS? Long version: Before CentOS 7, I used chroot to create "lightweight containers" where I could cleanly add extra repos and/or software without the risk of "polluting" my main system (and potentially ending up in dependency hell). The primary driver for this was MythTV, which has dozens of deps that span multiple repos. Without "containing" the MythTV installation within a chroot environment, I would inevitably end up with conflicts when doing a yum update. When I upgraded to CentOS 7, I found out that systemd-nspawn is "chroot on steroids". After figuring it all out, I replicated my MythTV "container", and things were great. Now I have a need for a particular piece of software: HandBrake. I found this site[1] that packages it for both Fedora and CentOS. But the CentOS version is a little older, as the latest HandBrake requires gtk3. The latest version is available for Fedora, however. So I thought: what if I could "run" Fedora under systemd-nspawn? Well, I definitely *can* do it. I copied the base Fedora filesystem layout off the Live CD, then booted into it via systemd-nspawn. I was able to add repos (including the one for HandBrake), and actually install and then run the HandBrake GUI. So while this does work, I'm wondering if it's safe. I'm thinking that at least some of the Fedora tools assume that they are running under a proper Fedora kernel, whereas in my scheme, they are running under a CentOS kernel. I'm sure there have been changes to the kernel API between the CentOS kernel and the Fedora kernel. Am I risking system stability by doing this? Anyone have any thoughts or experience doing something like this, i.e. running "foreign" Linux distros under CentOS via systemd-nspawn? What if I tried to do this with Debian or Arch or Gentoo? [1] http://negativo17.org/handbrake/ ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Screen
If you're just getting started with a screen multiplexer, I'd suggest starting with tmux. My understanding is that GNU screen has effectively been abandoned. I used GNU screen for at least 10 years, and recently switched to tmux. As someone else said, in GNU screen, if you want to send ctrl-a to your application (e.g. shell or emacs), you can do ctrl-a followed by a "naked" a. I found this becomes so second nature that, for the rare time I'm not in screen/tmux, I habitually do the Ctrl-a a sequence! tmux's default "action" sequence is Ctrl-b. Even without my history of Ctrl-a muscle memory, I think I'd find Ctrl-b awkward. I briefly tried to get used to it so I could live without a custom tmux config file, but just couldn't do it. So, here's my small ~/.tmux.conf file: # remap Ctrl-b to Ctrl-a (to emulate behavior of GNU screen) unbind C-b set -g prefix C-a bind C-a send-prefix # use vi-like keybindings set-window-option -g mode-keys vi # emulate GNU screen's Ctrl-a a sequence to jump to beginning of # line bind a send-prefix On Fri, Oct 30, 2015 at 6:39 AM, xaos wrote: > Andrew, > > Don't do it man. Don't remap screen key sequences. > > I had the same issue. This is how I ultimately solved it. > I mentally trained myself to think of screen > as a room that I need to do a Ctrl-A in order to get in there. > > So, for bash, It is NOT a big deal anyway. Train your fingers to do a > Ctrl-A then a > > It is just one extra keystroke. > > I got used to it within a week. > > -George > On 10/30/15 7:13 AM, Scott Robbins wrote: >> >> On Fri, Oct 30, 2015 at 10:53:29AM +0100, Andrew Holway wrote: >>> >>> Hey >>> >>> I like to use Ctrl+A and Ctrl+E a lot to navigate my insane big bash one >>> liners but this is incompatible with Screen which has a binding to >>> Ctrl-A. >>> Is it possible to move the screen binding so I can have the best of both >>> worlds? >> >> If you only make simple use of screen, then there's always tmux. 
It uses >> ctrl+b by default, and one of the reasons is the issue you mention. >> >> (If you have a lot of complex uses of screen, then it becomes a bigger >> deal >> to learn the new keyboard shortcuts, but many people just use its attach >> and detach feature, and relearning those in tmux takes a few minutes.) >> >> If you are interested in trying it, I have my own very simple page with >> links to a better page at http://srobb.net/screentmux.html >> > > ___ > CentOS mailing list > CentOS@centos.org > https://lists.centos.org/mailman/listinfo/centos ___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] CentOS 6 gcc is a bit old
Take a look at Devtoolset, I think this will give you what you want: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/ On Mon, Jun 29, 2015 at 1:56 PM, Michael Hennebry henne...@web.cs.ndsu.nodak.edu wrote: gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11) is a bit old. There have been major changes since then. I'd like a newer version. If I have to, I expect that I can install from source. I'd rather not. Is there a CentOS 6-compatible repository from which I can get a newer version? Does a standard CentOS 7 repository have a newer version? Does a CentOS 7-compatible repository have a newer version? It's my understanding that to compile from source, I will need to keep the gcc I have. Otherwise I would have nothing to compile the source. I expect that providing the right options will let old and new co-exist. Is ensuring that I get the right gcc when I type gcc just a matter of having the right search path for gcc? Will I need to do anything interesting to ensure that the resulting executables run using the right libraries? I've installed from source before, but never to replace an existing compiler. My concern is that if I louse things up, the mess could be very hard to fix. -- Michael henne...@web.cs.ndsu.nodak.edu SCSI is NOT magic. There are *fundamental technical reasons* why it is necessary to sacrifice a young goat to your SCSI chain now and then. -- John Woods ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] managing logins for different classes of servers
Our environment has several classes of servers, such as development, production, qa, utility, etc. Then we have all our users. There's no obvious mapping between users and server class. Some users may have access to only one class, some may span multiple classes, etc. And for maximum complexity, some classes of machines use local (i.e. /etc/passwd, /etc/shadow) authentication, others use Kerberos. With enough users and enough classes, it gets to be more than one can easily manage with a simple spreadsheet or other crude mechanism. Plus the ever-growing risk of giving a user access to a class he shouldn't have. Is there a simple centralized solution that can simplify the management of this? One caveat though is that our production class machines should not have any external dependencies. These are business-critical, so we try to minimize any single point of failure (e.g. a central server). Plus the production class machines are distributed in multiple remote locations. Any thoughts? ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
On Thu, Apr 30, 2015 at 7:31 AM, Peter van Hooft ho...@natlab.research.philips.com wrote: You may want to try reducing sunrpc.tcp_max_slot_table_entries . In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16 In CentOS 6, this number is dynamic with a maximum of sunrpc.tcp_max_slot_table_entries which by default has a value of 65536. We put that in /etc/sysconfig/modprobe.d/sunrpc.conf: options sunrpc tcp_max_slot_table_entries=128 Make that /etc/modprobe.d/sunrpc.conf, of course. This appears to be the smoking gun we were looking for, or at least a significant piece of the puzzle. We actually tried this early on in our investigation, but were changing it via sysctl, which apparently has no effect. Your email convinced me to try again, but this time configuring the parameters via modprobe. In our case, 128 was still too high. So we dropped it all the way down to 16. Our understanding is that 16 is the CentOS 5 value. What we're seeing is now our apps are starved for data, so looks like we might have to nudge it up. In other words, there's either something else at play which we're not aware of, or the meaning of that parameter is different between CentOS 5 and CentOS 6. Anyway, thank you very much for the suggestion. You turned on the light at the end of the tunnel! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
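To summarize the fix discussed above as a concrete file (16 is the value the posters converged on as the CentOS 5-like setting; tune to taste):

```
# /etc/modprobe.d/sunrpc.conf  (note: not /etc/sysconfig/modprobe.d/)
options sunrpc tcp_max_slot_table_entries=16
```

One plausible reason the earlier sysctl attempts appeared ineffective is that the value is consulted when NFS transports are set up, so existing mounts keep their old slot tables; setting it in modprobe.d puts it in place before any mounts happen. After a reboot it can be checked with cat /proc/sys/sunrpc/tcp_max_slot_table_entries.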
Re: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
On Wed, Apr 29, 2015 at 10:51 AM, m.r...@5-cent.us wrote: The server in this case isn't a Linux box with an ext4 file system - so that won't help ... What kind of filesystem is it? I note that xfs also has barrier as a mount option. The server is a NetApp FAS6280. It's using NetApp's filesystem. I'm almost certain it's none of the common Linux ones. (I think they call it WAFL IIRC.) Either way, we do the NFS mount read-only, so write barriers don't even come into play. E.g., with your original example, if we unzipped something, we'd have to write to the local disk. Furthermore, in low load situations, the NetApp read latency stays low, and the 5/6 performance is fairly similar. It's only when the workload gets high, and this aggressive demand is placed on the NetApp, that we in turn see overall decreased performance. Thanks for the thoughts! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
We have a compute cluster of about 100 machines that do a read-only NFS mount to a big NAS filer (a NetApp FAS6280). The jobs running on these boxes are analysis/simulation jobs that constantly read data off the NAS. We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5. We did a piecemeal upgrade, usually upgrading five or so machines at a time, every few days. We noticed improved performance on the CentOS 6 boxes. But as the number of CentOS 6 boxes increased, we actually saw performance on the CentOS 5 boxes decrease. By the time we had only a few CentOS 5 boxes left, they were performing so badly as to be effectively worthless. What we observed in parallel to this upgrade process was that the read latency on our NetApp device skyrocketed. This in turn caused all compute jobs to actually run slower, as it seemed to move the bottleneck from the client servers' OS to the NetApp. This is somewhat counter-intuitive: CentOS 6 performs faster, but actually results in net performance loss because it creates a bottleneck on our centralized storage. All indications are that CentOS 6 seems to be much more aggressive in how it does NFS reads. And likewise, CentOS 5 was very polite, to the point that it basically got starved out by the introduction of the 6.5 boxes. What I'm looking for is a deep dive list of changes to the NFS implementation between CentOS 5 and CentOS 6. Or maybe this is due to a change in the TCP stack? Or maybe the scheduler? We've tried a lot of sysctl tcp tunings, various nfs mount options, anything that's obviously different between 5 and 6... But so far we've been unable to find the smoking gun that causes the obvious behavior change between the two OS versions. Just hoping that maybe someone else out there has seen something like this, or can point me to some detailed documentation that might clue me in on what to look for next. Thanks! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
On Wed, Apr 29, 2015 at 10:36 AM, Devin Reade g...@gno.org wrote: Have you looked at the client-side NFS cache? Perhaps the C6 cache is either disabled, has fewer resources, or is invalidating faster? (I don't think that would explain the C5 starvation, though, unless it's a secondary effect from retransmits, etc.) Do you know where the NFS cache settings are specified? I've looked at the various nfs mount options. Anything cache-related appears to be the same between the two OSes, assuming I didn't miss anything. We did experiment with the noac mount option, though that had no effect in our tests. FWIW, we've done a tcpdump on both OSes, performing the same tasks, and it appears that 5 actually has more chatter. Just looking at packet counts, 5 has about 17% more packets than 6, for the same workload. I haven't dug too deep into the tcpdump files, since we need a pretty big workload to trigger the measurable performance discrepancy. So the resulting pcap files are on the order of 5 GB. Regarding the cache, do you have multiple mount points on a client that resolve to the same server filesystem? If so, do they have different mount options? If so, that can result in multiple caches instead of a single disk cache. The client cache can also be bypassed if your application is doing direct I/O on the files. Perhaps there is a difference in the application between C5 and C6, including whether or not it was just recompiled? (If so, can you try a C5 version on the C6 machines?) No multiple mount points to the same server. No application differences. We're still compiling on 5, regardless of target platform. If you determine that C6 is doing aggressive caching, does this match the needs of your application? That is, do you have the situation where the client NFS layer does an aggressive read-ahead that is never used by the application? That was one of our early theories. 
On 6, you can adjust this via /sys/class/bdi/X:Y/read_ahead_kb (use stat on the mountpoint to determine X and Y). This file doesn't exist on 5. But we tried increasing and decreasing it from the default (960), and didn't see any changes. Are C5 and C6 using the same NFS protocol version? How about TCP vs UDP? If UDP is in play, have a look at fragmentation stats under load. Yup, both are using tcp, protocol version 3. Are both using the same authentication method (ie: maybe just UID-based)? Yup, sec=sys. And, like always, is DNS sane for all your clients and servers? Everything (including clients) has proper PTR records, consistent with A records, et al? DNS is so fundamental to everything that if it is out of whack you can get far-reaching symptoms that don't seem to have anything to do with DNS. I believe so. I wouldn't bet my life on it. But there were certainly no changes to our DNS before, during or since the OS upgrade. You may want to look at NFSometer and see if it can help. Haven't seen that, will definitely give it a try! Thanks for your thoughts and suggestions! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
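As a sketch of the X:Y lookup described above: stat on the mountpoint yields a device number, which decomposes into the major:minor pair used by /sys/class/bdi. The device number 43 below is a made-up example (NFS mounts get an anonymous device, so the major is 0); on a live system it would come from `stat -c '%d' /path/to/mountpoint`.

```shell
#!/bin/sh
# Hypothetical st_dev value for an NFS mount; on a real system use:
#   dev=$(stat -c '%d' /path/to/mountpoint)
dev=43

# Standard Linux decoding of a device number into major:minor.
major=$(( (dev >> 8) & 4095 ))
minor=$(( (dev & 255) | ((dev >> 12) & 1048320) ))
echo "$major:$minor"    # prints 0:43 for this example

# The tunable mentioned above would then be:
#   cat /sys/class/bdi/$major:$minor/read_ahead_kb
#   echo 128 > /sys/class/bdi/$major:$minor/read_ahead_kb   # as root
```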
Re: [CentOS] CentOS 7 NFS client problems
What does your /etc/idmapd.conf look like on the server side? I fought with this quite a bit a while ago, but my use case was a bit different, and I was working with CentOS 5 and 6. Still, the kicker for me was updating the [Translation] section of /etc/idmapd.conf. Mine looks like this: [Translation] Method = nsswitch GSS-Methods = nsswitch,static You said you're not using Kerberos or LDAP, so I'm guessing you can leave out the GSS-Methods line entirely, and make your Method line nsswitch,static. Furthermore, in my /etc/idmapd.conf file, I have a [Static] section which, according to my comments, maps GSS-authenticated names to local user names. So mine looks kind of like this: [Static] someuser@REALM = localuser Again, since you're not using GSS, I'm not sure if you can get away with something like [Static] joe = joe But it's probably worth trying/experimenting. I hope that can be of some help! On Thu, Apr 23, 2015 at 3:11 PM, Devin Reade g...@gno.org wrote: #define TL;DR Despite idmapd running, usernames/IDs don't get mapped properly. Looking for a workaround. #undef TL;DR I'm trying to get a new CentOS 7.1 workstation running, and having some problems with NFS filesystems. The server is a fully patched CentOS 6 server. On the NFS filesystem, there are two subdirectories owned by a regular user (joe). (There are actually more and by multiple users, but I'll just show the two.) That user exists on both the NFS server and this CentOS 7 NFS client. However, the user on the client machine is unable to perform various operations. (The operations work when logged into the server.) $ whoami joe $ cd /nfs $ ls -l drwx--. 6 joejoe 4096 Apr 23 11:20 one drwxr-xr-x. 4 joejoe 4096 Dec 14 2011 two $ cd one one: Permission denied. $ cd two $ ls subdir1 subdir2 $ touch testfile touch: cannot touch testfile: Permission denied mount(1) shows that the filesystem is mounted rw. The server has it exported rw to the entire subnet. 
Other machines (CentOS 5) mount the same filesystems without a problem. Looks a lot like an idmapd issue, right? On the server: # id joe uid=501(joe) gid=501(joe) groups=501(joe) Back on the client: $ ps auxww | grep idmap | grep -v grep $ id joe uid=1000(joe) gid=1000(joe) groups=1000(joe) $ cd /nfs $ ls -n drwx--. 6 1000 1000 4096 Apr 23 11:20 one drwxr-xr-x. 4 1000 1000 4096 Dec 14 2011 two So it looks like the name/UID mapping is correct even though the idmapd daemon isn't running on the client. (It looks like CentOS7 only starts idmapd when it's running an NFS *server*.) # systemctl list-units | grep nfs nfs.mount loaded active mounted /nfs proc-fs-nfsd.mount loaded active mounted NFSD configuration filesystem var-lib-nfs-rpc_pipefs.mount loaded active mounted RPC Pipe File System nfs-config.service loaded active exited Preprocess NFS configuration nfs-client.target loaded active active NFS client services The behavior was tested again with SELinux in permissive mode; no change. Splunking a bit more shows some similar behavior for other distros: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/966734 https://bugzilla.linux-nfs.org/show_bug.cgi?id=226 Yep, this is a situation where LDAP and Kerberos aren't in play. And the CentOS 5, CentOS 6, and other UNIXen boxes are using consistent UID/GID mappings. However, CentOS7 (well, RHEL7) changed the minimum UID/GID for regular accounts, so when the account was created on the latter, the UID is out of sync. So much for idmapd (without the fixes involved in the above URLs). Has anyone else run into this and have a solution other than forcing UIDs to match? Devin ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] centos kernel changelog?
I'm probably overlooking something simple, but I can't seem to find a concise changelog for the rhel/centos kernel. I'm on an oldish 6.5 kernel (2.6.32-431), and I want to look at the changes and fixes for every kernel that has been released since, all the way up to the current 6.6 kernel. Anyone have a link to this? Thanks! Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] centos kernel changelog?
On Thu, Apr 9, 2015 at 8:49 AM, Johnny Hughes joh...@centos.org wrote: rpm -qp --changelog rpm-name | less NOTE: This works for any kernel RPM in any version of CentOS ... you can download the latest 6 RPM from here: http://mirror.centos.org/centos/6/updates/x86_64/Packages/ (currently kernel-2.6.32-504.12.2.el6.x86_64.rpm) Thank you Johnny, that was exactly what I needed, and immensely helpful! One more quick question: what does the number in brackets at the end of most lines represent? For example: - [fs] nfs: Close another NFSv4 recovery race (Steve Dickson) [1093922] What does the 1093922 mean? Thanks again! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS-virt] virsh list hangs / guests not starting automatically
I followed the wiki[1] to create a KVM virtual machine using bridged network on CentOS 6.5. It seemed to work fine on initial setup. (FWIW I'm trying to run a MythBuntu guest.) However, after a reboot, it doesn't auto-start the VMs. Shortly after boot, if I go into virsh, then do a list, it just hangs. Likewise, if I go into virt manager, it just hangs with the message connecting. Kernel version is: 2.6.32-431.29.2.el6.x86_64 Relevant package versions: libvirt.x86_64 0.10.2-29.el6_5.12 libvirt-client.x86_640.10.2-29.el6_5.12 libvirt-python.x86_640.10.2-29.el6_5.12 python-virtinst.noarch 0.600.0-18.el6 virt-manager.x86_64 0.9.0-19.el6 virt-top.x86_64 1.0.4-3.15.el6 virt-viewer.x86_64 0.5.6-8.el6_5.3 qemu-img.x86_64 2:0.12.1.2-2.415.el6_5.14 qemu-kvm.x86_64 2:0.12.1.2-2.415.el6_5.14 CPU is a Xeon E3-1230v3. I have the virtualization setting enabled in the BIOS. I googled on this, and saw a bunch of talk about two years ago regarding issues with the libvirt packages having a deadlock bug. But I think the versions of the relevant packages that I have installed are new enough to have fixes for that. I also happened across an earlier post to this list[2], where it seemed someone was having a similar problem. I was previously attempting to use balance-rr and 802.3ad bonding modes on my host. However, I just changed to using active-backup and the problem remains. I have in /etc/libvirt/libvirtd.conf the following three lines (the rest is stock, i.e. all comments): log_level = 2 log_filters= log_outputs=1:file:/var/log/libvirt/libvirt.log Below I posted the contents of the libvirt log file after doing a service libvirt start. Anyone ever fought this before? Thanks! 
[1] http://wiki.centos.org/HowTos/KVM [2] http://lists.centos.org/pipermail/centos-virt/2014-March/003722.html

/var/log/libvirt/libvirt.log output:

2014-10-14 16:47:11.150+0000: 4657: info : libvirt version: 0.10.2, package: 29.el6_5.12 (CentOS BuildSystem http://bugs.centos.org, 2014-09-01-13:44:02, c6b8.bsys.dev.centos.org)
2014-10-14 16:47:11.150+0000: 4657: info : virNetlinkEventServiceStart:517 : starting netlink event service with protocol 0
2014-10-14 16:47:11.151+0000: 4657: info : virNetlinkEventServiceStart:517 : starting netlink event service with protocol 15
2014-10-14 16:47:11.154+0000: 4668: info : dnsmasqCapsSetFromBuffer:667 : dnsmasq version is 2.48, --bind-dynamic is NOT present, SO_BINDTODEVICE is NOT in use
2014-10-14 16:47:11.157+0000: 4668: info : networkReloadIptablesRules:1925 : Reloading iptables rules
2014-10-14 16:47:11.157+0000: 4668: info : networkRefreshDaemons:1287 : Refreshing network daemons
2014-10-14 16:47:11.278+0000: 4668: info : networkStartNetwork:2422 : Starting up network 'default'
2014-10-14 16:47:11.290+0000: 4668: info : virStorageBackendVolOpenCheckMode:1085 : Skipping special dir '.'
2014-10-14 16:47:11.290+0000: 4668: info : virStorageBackendVolOpenCheckMode:1085 : Skipping special dir '..'
2014-10-14 16:47:11.352+0000: 4668: info : qemudStartup:754 : Unable to create cgroup for driver: No such device or address
2014-10-14 16:47:11.353+0000: 4668: info : qemudLoadDriverConfig:411 : Configured cgroup controller 'cpu'
2014-10-14 16:47:11.353+0000: 4668: info : qemudLoadDriverConfig:411 : Configured cgroup controller 'cpuacct'
2014-10-14 16:47:11.353+0000: 4668: info : qemudLoadDriverConfig:411 : Configured cgroup controller 'cpuset'
2014-10-14 16:47:11.353+0000: 4668: info : qemudLoadDriverConfig:411 : Configured cgroup controller 'memory'
2014-10-14 16:47:11.353+0000: 4668: info : qemudLoadDriverConfig:411 : Configured cgroup controller 'devices'
2014-10-14 16:47:11.353+0000: 4668: info : qemudLoadDriverConfig:411 : Configured cgroup controller 'blkio'
2014-10-14 16:47:11.509+0000: 4668: info : virDomainLoadAllConfigs:14696 : Scanning for configs in /var/run/libvirt/qemu
2014-10-14 16:47:11.527+0000: 4668: info : virDomainLoadAllConfigs:14696 : Scanning for configs in /etc/libvirt/qemu
2014-10-14 16:47:11.527+0000: 4668: info : virDomainLoadAllConfigs:14718 : Loading config file 'mythbuntu.xml'
2014-10-14 16:47:11.529+0000: 4668: info : qemuDomainSnapshotLoad:484 : Scanning for snapshots for domain mythbuntu in /var/lib/libvirt/qemu/snapshot/mythbuntu
___ CentOS-virt mailing list CentOS-virt@centos.org http://lists.centos.org/mailman/listinfo/centos-virt
Re: [CentOS-virt] virsh list hangs / guests not starting automatically
I just wanted to follow-up to add that eventually, the virtual machine did start, and now virsh list works as expected. But it took nearly 30 minutes. The updated libvirt.log is shown below. Notice the huge jump in time, from 16:47 to 17:14. (Side question: it appears the timestamps are UTC, rather than my local time, any way to address that?)

2014-10-14 16:47:11.527+0000: 4668: info : virDomainLoadAllConfigs:14718 : Loading config file 'mythbuntu.xml'
2014-10-14 16:47:11.529+0000: 4668: info : qemuDomainSnapshotLoad:484 : Scanning for snapshots for domain mythbuntu in /var/lib/libvirt/qemu/snapshot/mythbuntu
2014-10-14 17:14:41.751+0000: 4668: info : virNetDevProbeVnetHdr:94 : Enabling IFF_VNET_HDR
2014-10-14 17:14:41.805+0000: 4668: info : virSecurityDACSetOwnership:296 : Setting DAC user and group on '/home/kvm/mythbuntu.img' to '107:107'
2014-10-14 17:14:41.806+0000: 4668: info : virSecurityDACSetOwnership:296 : Setting DAC user and group on '/mnt/mythtv1/mythbackend_recordings' to '107:107'
2014-10-14 17:14:42.084+0000: 4668: info : lxcSecurityInit:1380 : lxcSecurityInit (null)
2014-10-14 17:14:42.084+0000: 4668: info : virDomainLoadAllConfigs:14696 : Scanning for configs in /var/run/libvirt/lxc
2014-10-14 17:14:42.084+0000: 4668: info : virDomainLoadAllConfigs:14696 : Scanning for configs in /etc/libvirt/lxc
2014-10-14 17:14:42.089+0000: 4659: error : virFileReadAll:462 : Failed to open file '/proc/4836/stat': No such file or directory
2014-10-14 17:14:42.090+0000: 4660: error : virFileReadAll:462 : Failed to open file '/proc/8017/stat': No such file or directory
2014-10-14 17:26:34.679+0000: 4661: info : remoteDispatchAuthList:2398 : Bypass polkit auth for privileged client pid:11343,uid:0

On Tue, Oct 14, 2014 at 12:08 PM, Matt Garman matthew.gar...@gmail.com wrote: I followed the wiki[1] to create a KVM virtual machine using bridged network on CentOS 6.5. It seemed to work fine on initial setup. (FWIW I'm trying to run a MythBuntu guest.) However, after a reboot, it doesn't auto-start the VMs.
Shortly after boot, if I go into virsh, then do a list, it just hangs. Likewise, if I go into virt manager, it just hangs with the message connecting. Kernel version is: 2.6.32-431.29.2.el6.x86_64 Relevant package versions: libvirt.x86_64 0.10.2-29.el6_5.12 libvirt-client.x86_640.10.2-29.el6_5.12 libvirt-python.x86_640.10.2-29.el6_5.12 python-virtinst.noarch 0.600.0-18.el6 virt-manager.x86_64 0.9.0-19.el6 virt-top.x86_64 1.0.4-3.15.el6 virt-viewer.x86_64 0.5.6-8.el6_5.3 qemu-img.x86_64 2:0.12.1.2-2.415.el6_5.14 qemu-kvm.x86_64 2:0.12.1.2-2.415.el6_5.14 CPU is a Xeon E3-1230v3. I have the virtualization setting enabled in the BIOS. I googled on this, and saw a bunch of talk about two years ago regarding issues with the libvirt packages having a deadlock bug. But I think the versions of the relevant packages that I have installed are new enough to have fixes for that. I also happened across an earlier post to this list[2], where it seemed someone was having a similar problem. I was previously attempting to use balance-rr and 802.3ad bonding modes on my host. However, I just changed to using active-backup and the problem remains. I have in /etc/libvirt/libvirtd.conf the following three lines (the rest is stock, i.e. all comments): log_level = 2 log_filters= log_outputs=1:file:/var/log/libvirt/libvirt.log Below I posted the contents of the libvirt log file after doing a service libvirt start. Anyone ever fought this before? Thanks! 
[1] http://wiki.centos.org/HowTos/KVM [2] http://lists.centos.org/pipermail/centos-virt/2014-March/003722.html /var/log/libvirt/libvirt.log output: 2014-10-14 16:47:11.150+: 4657: info : libvirt version: 0.10.2, package: 29.el6_5.12 (CentOS BuildSystem http://bugs.centos.org, 2014-09-01-13:44:02, c6b8.bsys.dev.centos.org) 2014-10-14 16:47:11.150+: 4657: info : virNetlinkEventServiceStart:517 : starting netlink event service with protocol 0 2014-10-14 16:47:11.151+: 4657: info : virNetlinkEventServiceStart:517 : starting netlink event service with protocol 15 2014-10-14 16:47:11.154+: 4668: info : dnsmasqCapsSetFromBuffer:667 : dnsmasq version is 2.48, --bind-dynamic is NOT present, SO_BINDTODEVICE is NOT in use 2014-10-14 16:47:11.157+: 4668: info : networkReloadIptablesRules:1925 : Reloading iptables rules 2014-10-14 16:47:11.157+: 4668: info : networkRefreshDaemons:1287 : Refreshing network daemons 2014-10-14 16:47:11.278+: 4668: info : networkStartNetwork:2422 : Starting up network 'default' 2014-10-14 16:47:11.290+: 4668: info : virStorageBackendVolOpenCheckMode:1085 : Skipping special dir '.' 2014-10-14 16:47:11.290+: 4668: info : virStorageBackendVolOpenCheckMode:1085 : Skipping special dir
Re: [CentOS] centos 6.5 input lag
Update on this problem: From another system, I initiated a constant ping on my laggy server. I noticed that every 10--20 seconds, one or more ICMP packets would drop. These drops were consistent with the input lag I was experiencing. I did a web search for linux periodically hangs and found this Serverfault post that had a lot in common with my symptoms: http://serverfault.com/questions/371666/linux-bonded-interfaces-hanging-periodically I in fact have bonded interfaces on the laggy server. When I checked the bonding config, I realized that a while ago I had changed from balance-rr / mode 0, to 802.3ad / mode 4. (I did this because I kept getting bond0: received packet with own address as source address when using balance-rr with a bridge interface. The bridge interface was for using KVM.) For now, I simply disabled one of the slave interfaces, and the lag / dropped ICMP packets problem has gone away. Like the Serverfault poster, I have an HP ProCurve 1800-24G switch. The switch is supposed to support 802.3ad link aggregation. It's not a managed switch, so I (perhaps incorrectly) assumed that 802.3ad would magically just work. Either there is more required to make it work, or its implementation is broken. Curiously, however, running my bond0 in 802.3ad mode did work without any issue for over a month. Anyway, hopefully this might help someone else struggling with a similar problem. On Fri, Oct 10, 2014 at 4:17 PM, Matt Garman matthew.gar...@gmail.com wrote: On Fri, Oct 10, 2014 at 4:11 PM, Joseph L. Brunner j...@affirmedsystems.com wrote: If this is a server - is it possible your raid card battery died? It is a server, but a home file server. The raid card has no battery backup, and in fact has been flashed to pure HBA mode. Actual RAID'ing is done at the software level. The only other thing on the hardware side that comes to mind is actual bad sectors if this is not a raided virtual drive.
The system has eight total drives: two SSDs in raid-1 for the OS, five 3.5 spinning drives in RAID-6, and a single 3.5 drive normally used for mythtv recordings (though mythtv has been stopped for a long time now to try to debug the issue). From the OS side can you keep the box up long enough to do a yum update? Yes, I updated everything except packages beginning with l (el / lowercase 'L') due to that generating a number of conflicts that I haven't have time to resolve. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
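For others debugging active-backup bonds like the one described earlier in this thread, the kernel's per-bond status file is usually the first place to look. A hypothetical, abbreviated session (interface names and values are illustrative, not from the poster's system):

```
$ cat /proc/net/bonding/bond0
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
...
Slave Interface: eth1
MII Status: up
Link Failure Count: 0
```

Comparing each slave's Link Failure Count here against the drop counters from `ip -s link` can show whether observed packet loss lines up with the inactive slave.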
Re: [CentOS] centos 6.5 input lag
On Thu, Oct 9, 2014 at 11:20 PM, Joseph L. Brunner j...@affirmedsystems.com wrote: Is it under some type of ddos attack? What's running on this machine? In front of it? A DDOS attack seems unlikely, though I suppose it's possible. Sitting between the lagging machine and the Internet is a pfSense box. All the other machines in the house have no issues, and they all route through the pfSense system. Right now, the only stuff running on it: - CrashPlan (java backup application) - Munin - Apache (only for Munin, no external access [i.e. no port forwarding from pfSense]) - mpd (music player daemon) Thanks, Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] centos 6.5 input lag
On Fri, Oct 10, 2014 at 4:11 PM, Joseph L. Brunner j...@affirmedsystems.com wrote: If this is a server - is it possible your raid card battery died? It is a server, but a home file server. The raid card has no battery backup, and in fact has been flashed to pure HBA mode. Actual RAID'ing is done at the software level. The only other thing on the hardware side that comes to mind is actual bad sectors if this is not a raided virtual drive. The system has eight total drives: two SSDs in raid-1 for the OS, five 3.5 spinning drives in RAID-6, and a single 3.5 drive normally used for mythtv recordings (though mythtv has been stopped for a long time now to try to debug the issue). From the OS side can you keep the box up long enough to do a yum update? Yes, I updated everything except packages beginning with l (el / lowercase 'L') due to that generating a number of conflicts that I haven't have time to resolve. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] centos 6.5 input lag
I have a CentOS 6.5 x86_64 system that's been running problem-free for quite a while. Recently, it's locked-up hard several times. It's a headless server, but I do have IP KVM. However, when it's locked up, all I can see are a few lines of kernel stack trace. No hints to the problem in the system logs. I even enabled remote logging of syslog, hoping to catch the errors that way. No luck. I ran memtest86+ for about 36 hours, no problems. I've tried to strip away just about all running services. It's just a home file server. I haven't had a crash in a while, but I also haven't had it running very long. But even while it's up, I have severe input lag in the shell. I'll type a few characters, and two to 10 or so seconds pass before anything echoes to the screen. I've checked top, practically zero CPU load. It's not swapping - 16 GB of RAM, 0 swap used. Most memory heavy process is java (for CrashPlan backups). iostat shows 0% disk utilization. Anyone seen anything like this? Where else can I check to try to determine the source of this lag (which I suspect might be related to the recent crashes)? Thanks, Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] virsh list hangs
I followed the wiki[1] to create a KVM virtual machine using bridged network on CentOS 6.5. It seemed to work fine on initial setup. However, after a boot, it doesn't auto-start the VMs, or at least, something has to timeout (a *very* long time, on the order of 15--30 minutes) before they can be started. Shortly after boot, if I go into virsh, then do a list, it just hangs. Likewise, if I go into virt-manager, it just hangs connecting. Kernel version is: 2.6.32-431.29.2.el6.x86_64 Relevant package versions: libvirt.x86_64 0.10.2-29.el6_5.12 libvirt-client.x86_640.10.2-29.el6_5.12 libvirt-python.x86_640.10.2-29.el6_5.12 python-virtinst.noarch 0.600.0-18.el6 virt-manager.x86_64 0.9.0-19.el6 virt-top.x86_64 1.0.4-3.15.el6 virt-viewer.x86_64 0.5.6-8.el6_5.3 qemu-img.x86_64 2:0.12.1.2-2.415.el6_5.14 qemu-kvm.x86_64 2:0.12.1.2-2.415.el6_5.14 CPU is a Xeon E3-1230v3. I have the virtualization setting enabled in the BIOS. I googled on this, and saw a bunch of talk about two years ago regarding issues with the libvirt packages having a deadlock bug. But I think the versions of the relevant packages that I have installed are new enough to have fixes for that. Anyone ever fought this before? Thanks! [1] http://wiki.centos.org/HowTos/KVM ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] lost packets - Bond
On Wed, Sep 17, 2014 at 11:28 AM, Eduardo Augusto Pinto edua...@eapx.com.br wrote: I'm using my bond interfaces as active backup; in theory, one interface should take over only when the other is down. But I'm seeing lost packets on the interface that is not being used, and that is generating packet loss on the bond. My suspicion is that the bonding may be irrelevant here. You can drop packets with or without bonding. There are many reasons why packets can be dropped, but one common one is a too-slow consumer of those packets. For example, say you are trying to watch a streaming ultra-high-definition video on a system with low memory and a slow CPU: the kernel can only buffer so many packets before it has to start dropping them. It's hard to suggest a solution without knowing the exact cause. But one thing to try (as much for debugging as an actual solution) is to increase your buffer sizes. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
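The buffer-size suggestion can be sketched as a few sysctl settings. The names below are the standard Linux knobs, but the values are arbitrary illustrations, not recommendations:

```
# /etc/sysctl.conf (apply with: sysctl -p)
net.core.rmem_default = 262144      # default socket receive buffer, bytes
net.core.rmem_max = 16777216        # ceiling a socket may request via SO_RCVBUF
net.core.netdev_max_backlog = 5000  # received packets queued per CPU before drops
```

Watching the drop counters in `ip -s link show bond0` before and after such a change helps tell whether buffering was actually the bottleneck.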
[CentOS] grubby fatal error: unable to find a suitable template
I did a bulk yum update -y of several servers. As a sanity check after the upgrade, I ran a grep of /etc/grub.conf across all updated servers looking to ensure the kernel I expected was installed. Two servers came up saying /etc/grub.conf did not exist! I logged into the servers, and /etc/grub.conf was a broken link. (It points to /boot/grub/grub.conf). My systems are all set up with a dedicated /boot partition. Sure enough, /boot was not mounted. Furthermore, I saw no /boot entry in /etc/fstab (which all my other servers contain). So I mounted /boot, and the grub.conf file was not consistent: it did not have a stanza for the kernel I wanted installed. So I did a yum remove kernel ; yum install -y kernel. Both the remove and the install resulted in this message getting printed: grubby fatal error: unable to find a suitable template Just for kicks, I renamed both the /etc/grub.conf symlink as well as the actual /boot/grub/grub.conf file, and repeated the kernel remove/install. This did NOT produce the above error; however, no symlink or actual grub.conf file was created. I did a little web searching on the above error, and one common cause is that there is no valid title... stanza in the grub.conf file for grubby to use as a template. But my file does in fact contain a valid stanza. I even copied a valid grub.conf file from another server, and re-ran the kernel remove/install: same error. Clearly, something is broken, but I'm not sure what. Anyone seen anything like this? By the way, these machines were all 5.something, being upgraded to 5.7. Thanks! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
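For comparison with the "valid title stanza" cause mentioned in the question, a minimal CentOS 5 style grub.conf looks roughly like the following (the disk/LVM paths and the kernel version are illustrative placeholders, not taken from the poster's system):

```
# /boot/grub/grub.conf
default=0
timeout=5
title CentOS (2.6.18-274.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-274.el5.img
```

grubby parses an existing title stanza like this as its template when adding or removing kernels, which is why a missing or malformed stanza is the usual trigger for the "unable to find a suitable template" error.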
Re: [CentOS] grubby fatal error: unable to find a suitable template
On Wed, Jun 25, 2014 at 4:47 PM, m.r...@5-cent.us wrote: ? Why not to 5.10, the current release of CentOS 5.x? Off topic for the question, but, briefly, changing *anything* in our environment involves extensive testing and validation due to very precise performance requirements (HFT, where microsecond changes make or break us). For our particular application, we've seen significant performance changes with minor kernel revisions. We've been putting this testing and validation effort into CentOS 6.5, and will hopefully be moving off 5.x completely before too long. But in the short-term, 5.7 it is for us. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] copying user accounts...
I've used usermod -p 'encryptedpassword' username successfully many times. Just be careful with escaping of the '$' field separators that appear in the encrypted password string from /etc/shadow. On Tue, Jun 10, 2014 at 4:28 PM, John R Pierce pie...@hogranch.com wrote: I want to copy a few user accounts to a new system... is there a more elegant way to copy /etc/shadow passwords other than editing the file? for instance, is there some way I can give the password hash to /usr/bin/passwd ? -- john r pierce 37N 122W somewhere on the middle of the left coast ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
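A sketch of the quoting issue (the hash is a made-up placeholder, and the actual usermod call must run as root on the target system):

```shell
# Single quotes keep the '$' field separators literal:
hash='$6$salt$encryptedvalue'
echo "$hash"

# Double quotes (or no quotes) let the shell expand $6, $salt, and
# $encryptedvalue as variables -- silently corrupting the hash:
bad="$6$salt$encryptedvalue"
echo "length of mangled hash: ${#bad}"

# With the hash safely in a variable, apply it (as root):
# usermod -p "$hash" username
```

The same caution applies if you paste the hash directly on the command line: always wrap it in single quotes.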
Re: [CentOS] Mother board recommendation
On Fri, May 16, 2014 at 7:21 AM, Joseph Hesse joehe...@gmail.com wrote: I want to build a lightweight server and install centos. Does anyone have a recommendation for a suitable motherboard? What will the role of the server be? How lightweight? How many users, what kinds of services, what (if any) performance requirements, etc? Room for future growth/expansion? Budget? ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] High load average, low CPU utilization
On Fri, Mar 28, 2014 at 9:01 AM, Mr Queue li...@mrqueue.com wrote: On Thu, 27 Mar 2014 17:20:22 -0500 Matt Garman matthew.gar...@gmail.com wrote: Anyone seen anything like this? Any thoughts or ideas? Post some data.. This public facing? Are you getting sprayed down by packets? Array? Soft/hard? Someone have screens laying around? Write a trap to catch a process list when the loads spike? Look at crontab(s)? User accounts? Malicious shells? Any guest containers around? Possibilities are sort of endless here. Not public facing (no Internet access at all). Linux software RAID-1. No screen or tmux data. No guest access of any kind. In fact, only three logged in users. I've reviewed crontabs (there are only a couple), and I don't see anything out of the ordinary. Malicious shells or programs: possibly, but I think that is highly unlikely... if someone were going to do something malicious, *this* particular server is not the one to target. What kind of data would help? I have sar running at a five second interval. I also did a 24-hour run of dstat at a one second interval collecting all information it could. I have tons of data, but not sure how to distill it down to a mailing-list friendly format. But a colleague and I reviewed the data, and don't see any correlation with other system data before, during, or after these load spike events. I did a little research on the loadavg number, and my understanding is that it's simply a function of the number of tasks on the system. (There's some fancy stuff thrown in for exponential decay and curve smoothing and all that, but it's still based on the number of system tasks.) I did a simple run of top -b > top_output.txt for a 24-hour period, which captured another one of these events. I haven't had a chance to study it in detail, but I expected the number of tasks to shoot up dramatically around the time of these load spikes. The number of tasks remained fairly constant: about 200 +/- 5. 
How can the loadavg shoot up (from ~1 to ~20) without a corresponding uptick in number of tasks? ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
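Mr Queue's "write a trap to catch a process list when the loads spike" suggestion could look something like this sketch (the threshold, log path, and polling interval are arbitrary assumptions):

```shell
#!/bin/sh
# Dump a process snapshot whenever the 1-minute load average crosses
# a threshold, so there's evidence to inspect after a spike.
THRESHOLD=10
LOG=/var/tmp/loadspike.log

while true; do
    load1=$(cut -d' ' -f1 /proc/loadavg)
    # integer comparison: drop the fractional part
    if [ "${load1%.*}" -ge "$THRESHOLD" ]; then
        {
            date
            cat /proc/loadavg
            ps axo pid,stat,pcpu,wchan:24,comm  # STAT 'D' = uninterruptible sleep
        } >> "$LOG"
    fi
    sleep 5
done
```

Running something like this from a screen session (or as a background job) for a day should catch at least one event and show exactly which processes were runnable or blocked, and on what (the wchan column), at the moment the load was high.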
Re: [CentOS] High load average, low CPU utilization
On Fri, Mar 28, 2014 at 10:30 AM, John Doe jd...@yahoo.com wrote: Any USB device? Each time I access USB disks, load goes through the roof. Nope, it's a rack server in a secure remote location, with no peripherals at all attached. Only attached cables are power and network. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] High load average, low CPU utilization
On Fri, Mar 28, 2014 at 9:37 AM, John R. Dennison j...@gerdesas.com wrote: On Fri, Mar 28, 2014 at 09:30:17AM -0500, Matt Garman wrote: How can the loadavg shoot up (from ~1 to ~20) without a corresponding uptick in number of tasks? loadavg is based on number of processes vying for cpu time on the runq; the number of over-all processes on the system is not really relevant unless they are all competing for cpu. Is there a way to see this number of processes in the runq? From the shell or programmatically? What's the i/o wait on the box when you see load spikes? If the box is i/o bound (indicated by high i/o) the load average will spike due to processes blocked on i/o cycles. I ran top -b directed to a file and captured one of these spikes. Here's a sample from the approximate start, peak, and end of the load spike (respectively): top - 18:40:29 up 14 days, 1:34, 4 users, load average: 0.80, 0.48, 0.29 Tasks: 205 total, 1 running, 204 sleeping, 0 stopped, 0 zombie Cpu(s): 1.2%us, 4.9%sy, 0.0%ni, 92.1%id, 0.0%wa, 0.1%hi, 1.7%si, 0.0%st top - 19:16:00 up 14 days, 2:09, 4 users, load average: 19.67, 19.02, 15.75 Tasks: 203 total, 1 running, 202 sleeping, 0 stopped, 0 zombie Cpu(s): 1.1%us, 4.6%sy, 0.0%ni, 92.3%id, 0.0%wa, 0.2%hi, 1.9%si, 0.0%st top - 20:20:27 up 14 days, 3:14, 4 users, load average: 0.93, 3.58, 8.69 Tasks: 212 total, 1 running, 211 sleeping, 0 stopped, 0 zombie Cpu(s): 1.2%us, 4.8%sy, 0.0%ni, 91.7%id, 0.6%wa, 0.1%hi, 1.6%si, 0.0%st Looks like I collected 17277 total top samples. The max %wa over this time was 61.1%, and less than 40 of those samples had %wa over 10.0. In other words, over many hours, the system had IOwait over 10% for less than a minute. And note that my load spike lasts for almost two hours. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
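For reference, the kernel does expose the run-queue size directly through procfs (a sketch; these are standard Linux interfaces, not CentOS-specific):

```shell
# The 4th field of /proc/loadavg is running/total tasks:
cat /proc/loadavg

# Instantaneous counts of runnable and I/O-blocked (D state) processes:
grep -E 'procs_(running|blocked)' /proc/stat

# Or sample over time: the 'r' column is the run queue,
# 'b' is processes blocked on I/O:
# vmstat 1 5
```

Sampling procs_running and procs_blocked at a tight interval during a spike would show whether the load is coming from CPU contention or from blocked tasks, even when top's one-screen snapshot looks idle.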
[CentOS] High load average, low CPU utilization
I have a dual Xeon 5130 (four total CPUs) server running CentOS 5.7. Approximately every 17 hours, the load on this server slowly creeps up until it hits 20, then slowly goes back down. The most recent example started around 2:00am this morning. Outside of these weird times, the load never exceeds 2.0 (and in fact spends the overwhelming majority of its time at 1.0). So this morning, a few data points:
- 2:06 to 2:07 load increased from 1.0 to 2.0
- At 2:09 it hit 4.0
- At 2:10 it hit 5.34
- At 2:16 it hit 10.02
- At 2:17 it hit 11.0
- At 2:24 it hit 17.0
- At 2:27 it hit 19.0 and stayed there +/- 1.0 until
- At 2:48 it was 18.96 and looks like it started to go down (very slowly)
- At 2:57 it was 17.84
- At 3:05 it was 16.76
- At 3:16 it was 15.03
- At 3:27 it was 9.3
- At 3:39 it was 4.08
- At 3:44 it was 1.92, and stayed under 2.0 from there on
This is the 1m load average by the way (i.e. first number in /proc/loadavg, given by top, uptime, etc). Running top while this occurs shows very little CPU usage. It seems the standard cause of this is processes in a D state, which means waiting on I/O. But we're not seeing this. In fact, the system runs sar, and I've collected copious amounts of data. But I don't see anything that jumps out that correlates with these events. I.e., no surges in disk IO, disk read/write bytes, network traffic, etc. The system *never* uses any swap. I also used dstat to collect all data that it can for 24 hours (so it captured one of these events). I used 1 second samples, loaded the info up into a huge spreadsheet, but again, didn't see any obvious trigger or interesting stuff going on while the load spiked. All the programs running on the system seem to work fine while this is happening... but it triggers all kinds of monitoring alerts which is annoying. We've been collecting data too, and as I said above, it seems to happen every 17 hours. I checked all our cron jobs, and nothing jumped out as an obvious culprit. 
Anyone seen anything like this? Any thoughts or ideas? Thanks, Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] zoneminder
On Thu, Feb 6, 2014 at 9:33 AM, m.r...@5-cent.us wrote: One more thing about zoneminder: after installing it on an FC19 system, I don't see anything that I could immediately identify as a driver. *HOW* does it get the video? In motion, the very lightweight package, it's using V4L2, and the drivers, gspca*, are part of the kernel these days. If ZoneMinder is using the same drivers, then I'd expect that it would occasionally, after an update, wind up with the same problems motion does. That's why I suggested IP cameras earlier, as there is no driver. Or, figuratively speaking, the IP stack is the driver. Anything can break after an update, but basic networking functionality is one of those things I don't expect to break. Also, why are you doing updates anyway? If you had an appliance, as you wanted, would you be doing updates on that? Probably not, if it's working. So why worry about updates? Put ZM on a dedicated server or VM, get it working the way you want, then leave it alone. Weld the case shut and disable remote logins and now you literally have an appliance. Btw, I'm now also looking at lower-end video capture cards, like the Hauppauge ImpactVCB, model 188 (four BNC inputs). For that, what I haven't found out yet, is whether it provides the cameras one at a time, to be switched among, or if all four can stream at the same time, which is what we *must* have. My personal experience with lower-end hardware is that it's the stuff most likely to break during updates. It's cheap so the release process is sloppy and documentation lagging/poor/inaccurate/non-existent, so you end up with situations where the drivers are chasing infinite subtle revisions, and/or reverse engineered, and/or some other kind of kludgery. If you pay a premium, you can buy stuff that has official Linux support from the manufacturer. I was looking at Sensoray products for my home, but they are out of my price range. 
Probably beyond your budget as well (based on what you've suggested), but it appears that Linux is an explicit target for their products, not an afterthought or the dreaded unsupported/use-at-your-own-risk. But again, IP cameras remove all this complexity. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Motion Detecting Camera
On Mon, Feb 3, 2014 at 2:15 PM, m.r...@5-cent.us wrote: 2) My manager says he wants to be out of the business of this, and wants me to look into surveillance appliance packages - that is, a DVR w/ say, four cameras. They're all in Does this mean ZoneMinder is out of the question, since it's not an appliance? I mean, just for the sake of argument, what happens if you buy IP cameras and use ZoneMinder? Isn't that the beauty of an IP camera, you don't need fancy drivers or have to worry about upgrade breakage? (Unless of course your IP stack breaks, but then you probably have much bigger problems.) IP cameras allow you to (1) decouple the camera problem from the DVR problem, and (2) avoid wacky USB/analog capture driver issues. I don't know if there's anyone selling OTS ZoneMinder appliances, but it's conceivably possible. And if so, it would be like the Untangle filtering package, where the line between OTS appliance and DIY is blurred. (E.g., with Untangle, you can buy a filtering appliance from them, or you can run their software on your own server.) I guess I fail to see how the previous poster's suggestion (which is basically the same as what I initially posted last week) fails to meet your requirements:
1. Replace cheapo USB cameras with respectable IP cameras.
2. Assign IPs to all cameras.
3. Set up ACLs and/or partition your network to meet security requirements.
4. Designate a single server (physical or VM) to act as your DVR appliance. In this case, it's a Linux server running ZoneMinder.
5. Configure ZoneMinder to do full-time/always-on recording, and set up whatever maintenance and management scripts you need to shuffle around/delete/archive the video.
Once this is in place, I don't see how the end result is any different than buying a surveillance appliance. Even an OTS package will require some amount of initial setup. But *either way*, once the system is in place and working, it should just work and not require any further hand-holding. 
Treat the ZoneMinder box as an appliance - that is, if it's working, don't touch it. Don't upgrade ZM or the underlying OS. Just leave it alone and let it work. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] OT hardware question
On Fri, Jan 31, 2014 at 9:52 AM, m.r...@5-cent.us wrote: With the continuing annoyance from motion, my manager's asked me to go looking again for a video surveillance appliance: basically, a motion-detecting DVR and cameras. The big thing, of course, is a) price (this is a US federal gov't agency, and being civilian, money is *tight*, don't give me the libertarian/GOP line about how freely we spend, thankyouverymuch), b) it has to be on the network, and c) we need to be able to d/l to a server, and rm after we do that... and we want to script or cron job that. Right now, I'm looking into Zmodo, R-Tech and CIB security. Anyone have a) opinions on the quality of the hardware from any of those manufacturers (yeah, I know, they're just branded hardware), and/or whether we can do the ssh or telnet in to do what we need? *Extremely* frustrating, since they're all running embedded Linux, that so many require IE and ActiveX. I don't have specific recommendations for you, but here is some general info that you might find useful, as I've been looking into this myself. Obviously, there exist IP cameras, but, as you've noticed, you have to be careful that it supports an open standard, and not IE/ActiveX exclusively. Another approach is to just get an analog camera, along with a capture device. The capture devices come in USB or PCI(e) flavors (possibly more), and range in price from super cheap (10 USD) to crazy expensive. Just from reading about this stuff, it appears there's a tradeoff: the cheap hardware may require some wrestling to work reliably, and then may randomly die at some point. With a little research you can probably find a good balance. I've done a little searching on eBay, and it looks like there is no shortage of capture devices to be had there for cheap as well, if buying used is an option. As always, it depends on your application, but with an analog camera, you move the smarts to your PC or server. 
Consider: if you have many cameras, do you want to have that many more servers to manage, or would you rather have one server with many purpose-built devices attached? Take a look at the ZoneMinder software package. It's the free/open source way to build a surveillance appliance. Again, I haven't used it. I currently have a Speco D4RS device (came with the house I just moved into), which is an off-the-shelf surveillance appliance. The viewer is IE-only, and the standalone apps are Windows-only (though they do have Android apps, so some quasi-Linux support)... it's half-way decent, although I've only just started playing with it in earnest. But I'm looking to ZoneMinder as a possible replacement, partially to get onto an open platform, but also to hopefully consolidate a standalone device into my existing home server. As for the cameras themselves, I don't know what model I have, and wasn't supplied documentation. My dad's been interested in getting some camera surveillance going at his house. But we both get discouraged when looking for cameras because there seem to be a million makes and models, but most are probably just re-branded OEM versions. The specs always seem to be unclear or inconsistent, and except for the crazy-expensive ones, they always seem to have lousy user reviews. So we always get discouraged trying to wade through the mess and give up. -Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] OT hardware question
On Fri, Jan 31, 2014 at 11:00 AM, m.r...@5-cent.us wrote: I think you misunderstood me. I'm not looking for IP cameras - we'll be getting cameras that plug into a surveillance DVR appliance. It's the -DVR's- firmware/software that will do the recording and picture taking. What we need is to be able to d/l *from* the DVR to a server, where we can store it. We don't even need fancy cameras - we're currently using 10 or so yr old, USB 1 or 1.1 webcams, and the standard package named motion, which works fine... when it works. When the drivers have bugs creep in, that's what's pushed my manager to ask me to look this up. Your original email said, my manager's asked me to go looking again for a video surveillance appliance: basically, a motion-detecting DVR and cameras. I interpreted that as you need a full-on DVR suite, from the camera(s) to the DVR, to the management interface. Re-reading your original email, I don't think my interpretation was wrong. Now it sounds like the camera part of your solution is already in place: you're using USB webcams, right? And that part of the solution will remain unchanged? That said, I believe my previous email still has some useful information for you. As I said, I have a Speco D4RS: this is an off-the-shelf DVR appliance (that actually advertises the fact that it runs Linux under-the-hood). In the general sense, it supports the features you need: download videos to your server (or cloud or whatever), and manage the videos on the DVR itself. But it falls short for you in that it expects analog camera inputs, and is mostly Windows-centric. But given that ZoneMinder is attempting to compete with these types of DVR appliances, I would be surprised if it didn't support all the same features as my Speco, but on an open platform, and with many more input options (USB, IP, DVB/V4L capture card, etc). 
In other words, assuming ZoneMinder supports the features you need, another option for you is to roll your own DVR appliance with commodity PC hardware. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] OT hardware question
On Fri, Jan 31, 2014 at 12:45 PM, m.r...@5-cent.us wrote: Most likely it *will*. I think we expect to get a surveillance appliance - a DVR with firmware, and cameras as part of the package. The ancient USB cheapie webcams will go. I see, so you're looking at a complete package that includes everything. I was coming at this from an a la carte perspective, where you would buy the components individually, and piece your system together from there. That's what my original email was based on anyway. If you do indeed go a la carte, then the number of options goes up, e.g. analog vs IP cameras, OTS DVR vs open source, etc etc. But yes, an all-in-one package makes many of those decisions for you. Most of them do. On the other hand, where do you see it using that? I've gone to Speco's unpleasant website, and managed to find a spec sheet, which does *not* mention what the firmware is, while Zmodo and R-Tech and others do. Check out this link[1]: Reliable Linux Operating System is listed as the first Product Highlight. I do like the gigabit NIC on it. I don't see it as a package, with four videocams, and the cheapest I see it is $322+; newegg wants over $500 for it, and that's pushing the envelope, esp. when we need several, and there's no cameras in the package. I may just call them to get details - I need to do that for several other OEMs. How many cameras do you need in total? And what is the budget? Btw, also, motion is not, AFAIK, zoneminder. Right, I don't know anything about Motion, except that I've seen it mentioned in context with ZoneMinder. My limited knowledge is that they are competing Linux/open source DVR packages, although Motion is the older one with fewer features. Why not use your existing USB webcams and just give ZoneMinder a try? Especially if you have an old unused PC or laptop that you can test with. Nothing to lose but time. For the sake of argument, say ZoneMinder supports all the features you need. It should run on low-end PC hardware. 
If you don't already have some old PC/server hardware you can re-purpose, you can assemble one for peanuts. Or, throw another hard drive on an existing server and run ZoneMinder on that. Resource utilization should be negligible (unless you're capturing multiple high-def streams, which you clearly don't have the budget for). My point is, assuming ZoneMinder meets all your DVR feature/software/management requirements, you can dedicate your entire budget to the cameras (and possibly capture device if you don't buy IP cameras). In other words, why pay for a DVR appliance, when you can have all that functionality for free with ZoneMinder? If your budget is that limited, and you need to spread it out across the surveillance package (i.e. cameras + DVR), you're going to end up with mediocre hardware. Cut out the cost of the DVR part and use the budget to get better cameras. [1] http://www.bhphotovideo.com/c/product/875452-REG/Speco_Technologies_d4rs500_D4RS_4_Channel_DVR_500.html ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] OT hardware question
On Fri, Jan 31, 2014 at 12:50 PM, m.r...@5-cent.us wrote: ...but my manager says he'd like to get out of the business of making video surveillance work, when there's off-the-shelf stuff out there. Sounds like a classic problem where you have three requirements...
1. Just works / off-the-shelf, no management required
2. Quality/reliability
3. Low cost
...but can only choose two. :) Just from reading user reviews of the low-cost stuff, it sounds like you sacrifice quality/reliability for ease-of-use/convenience. But even those features are suspect, if the reviews are to be believed. Just my $0.02 :) ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] NIS or not?
On Tue, Jan 28, 2014 at 3:02 AM, Sorin Srbu sorin.s...@orgfarm.uu.se wrote: The only thing I'm trying to accomplish is a system which will allow me to keep user accounts and passwords in one place, with one place only to administrate. NIS seems to be able to do that. Comments and insights are much appreciated! A related question: is NIS or LDAP (or something else entirely) better if the machines are not uniform in their login configuration? That is, we have an ever-growing list of special cases. UserA can login to servers 1, 2 and 3. UserB can log in to servers 3, 4, and 5. Nobody except UserC can login to server 6. UserD can login to machines 2--6. And so on and so forth. I currently have a custom script with a substantial configuration file for checking that the actual machines are configured as per our intent. It would be nice if there was a single tool where the configuration and management/auditing could be rolled into one. Thanks! Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
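For what it's worth, one stock mechanism for this kind of per-host login policy (not necessarily a replacement for a central directory) is pam_access: each host carries its own /etc/security/access.conf, so a rule like "only UserC on server 6" becomes a two-line file you can push out and diff. A sketch, reusing the hypothetical usernames above:

```
# /etc/security/access.conf on server6
# format -- permission : users : origins (evaluated top-down, first match wins)
+ : root userC : ALL
- : ALL : ALL
```

This requires "account required pam_access.so" in the relevant /etc/pam.d service files. It's still one file per machine rather than a single central configuration, but it's far easier to audit than ad-hoc account states.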
Re: [CentOS] NIS or not?
On Tue, Jan 28, 2014 at 9:18 AM, m.r...@5-cent.us wrote: At this late date, I'd be really, *REALLY* leery of using NIS. You say that *most* of your traffic is local, suggesting that some of it is *not*. And, for that matter, how good are the firewalls keeping other traffic out? I'd say no to NIS. Yes, other answers may be more difficult to set up, but consider the alternatives. That is, we have an ever-growing list of special cases. UserA can login to servers 1, 2 and 3. UserB can log in to servers 3, 4, and 5. Nobody except UserC can login to server 6. UserD can login to machines 2--6. And so on and so forth. Here you may not realize you're distinguishing between authentication and authorization. Yeah, I forgot to mention that we already have Kerberos in place for authentication. It's authorization that is currently done by hand and checked with a manual script. (I needed that for the secure mount options NFSv4 provides.) I sincerely hope it's easier to set up and administer and upgrade than native LDAP. In '06, after a discussion with the other admin and manager I was working with at that job, I volunteered to set up openLDAP. Let's just say that the tools were NOT vaguely ready for prime time, though I did find that running webmin helped a *lot* to get it working. I know you can find a horror story for any piece of software on the Internet, but my impression is that LDAP has an unusually high number of scary-sounding anecdotes. I know random Internet blog/forum posts aren't really authoritative, but they do give me a little trepidation regarding LDAP. We have an in-house written set of scripts that administer relevant configuration files, including /etc/passwd. It copies the correct version of that file (among many others) to each host, and a shell of /bin/noLogin works just fine. Why set the shell to /bin/noLogin, rather than simply not create that user's /etc/passwd entry? 
I don't have /bin/noLogin on any of my systems - I assume you deliberately specified a non-existent program for the shell? What's the difference between setting the user's shell to a bogus program versus something like /bin/false? ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
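A quick comparison sketch (the nologin path varies by distro: /sbin/nologin on RHEL/CentOS, /usr/sbin/nologin on some others):

```shell
# /bin/false just exits non-zero, silently:
/bin/false || echo "false exit status: $?"

# nologin also exits non-zero, but first prints a refusal message
# ("This account is currently not available."), which is friendlier
# for a human hitting a locked service account:
NOLOGIN=$(command -v nologin || echo /sbin/nologin)
"$NOLOGIN" || echo "nologin exit status: $?"

# Either way the practical effect is the same: no interactive shell.
# To apply it (as root):
# usermod -s /sbin/nologin username
```

A nonexistent shell like /bin/noLogin also blocks logins, but produces an uglier "shell not found" style error and, on some systems, confuses tools that validate shells against /etc/shells.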
Re: [CentOS] Do I need a dedicated firewall?
On Wed, Dec 11, 2013 at 11:00 PM, Jason T. Slack-Moehrle slackmoeh...@gmail.com wrote: So my electricity bill is through the roof and I need to pare down some equipment. If you are in the USA, get yourself a Kill-a-Watt power meter. I'm sure other parts of the world have similar products. It's a device that goes between your electrical product (e.g. server) and the wall AC outlet, and tells you what the power draw is. It also keeps a cumulative total for number of Watts and Volt-Amps used in the time period it's plugged in. (If your power supply's power factor correction is perfect, i.e. a power factor of 1.0, Watts will always equal Volt-Amps. I believe PFC is mandated in Europe. But a power factor below 1.0 will cause Volt-Amps to be higher than Watts. In the USA you are typically billed by Watts, but if you have a UPS, the Volt-Amp number matters.) The question is, are you sure it's all your computers causing the spike in your power bill? For example, if you have an old refrigerator, those are typically very inefficient and use more power than necessary. The Kill-a-Watt will tell you which devices are most power greedy. I have a CentOS 6.5 Server (a few TB, 32gb RAM) running some simple web stuff and Zimbra. I have 5 static IP's from Comcast. I am considering giving this server a public IP and plugging it directly into my cable modem. This box can handle everything with room for me to do more. Doing this would allow me to power down my pfSense box and additional servers by consolidating onto this single box. What kind of hardware is your pfSense box? I too have a pfSense server, but it's on a fairly low-power Atom board. Pulls less than 20 watts at any given time. The average cost of electricity in the USA is about $0.11/kWh. Using that number, a constant X watt draw conveniently works out to costing $X/year. So my pfSense box costs less than $20/year in electricity. Obviously, if your electricity is much more expensive, it changes the equation. Just food for thought. 
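The rule of thumb above is easy to check with a one-liner (the wattage and rate below are example figures):

```shell
# X watts * 8760 h/yr / 1000 * $0.11/kWh ~= $X/yr.
# e.g. a 20 W pfSense box:
awk -v watts=20 -v rate=0.11 \
    'BEGIN { printf "Annual cost: $%.2f\n", watts / 1000 * 8760 * rate }'
# -> Annual cost: $19.27
```

The approximation works because 8760 * 0.11 / 1000 is about 0.96, i.e. almost exactly 1 dollar per watt-year.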
___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] making a script into a service
Turn it into a daemon as described, then take a look at the existing scripts in /etc/init.d/. There might even be a template in there IIRC. Your script will likely be a simple wrapper around your daemonized python program. After that, just do a chkconfig --add myscript where myscript is the name of your script in /etc/init.d. On Dec 9, 2013 7:05 AM, Larry Martell larry.mart...@gmail.com wrote: On Mon, Dec 9, 2013 at 8:00 AM, Fabrizio Di Carlo dicarlo.fabri...@gmail.com wrote: Try to use this http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/ It allows you to start/stop/restart the script using the following commands. python myscript.py start python myscript.py stop python myscript.py restart Source: http://stackoverflow.com/questions/16420092/how-to-make-python-script-run-as-service Yes, I've seen that question and site. I want to be able to control it with the service command. The technique on this site makes the script a daemon, but that does not make it controllable with service. On Mon, Dec 9, 2013 at 1:54 PM, Larry Martell larry.mart...@gmail.com wrote: We have a python script that is currently run from cron. We want to make it into a service so it can be controlled with service start/stop/restart. Can anyone point me at a site that has instructions on how to do this? I've googled but haven't found anything. -- The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift. (A. Einstein) Fabrizio Di Carlo ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] quota doesn't appear to work - repquota only updates when quotacheck is run
I have set up user quotas on an ext4 filesystem. It does not appear that the quota system is being updated, except when I manually run quotacheck. More detail: I run warnquota -s from a script in /etc/cron.daily. I noticed that no one had received an over-quota message in a long time. Using repquota -as, it indeed looked as though everyone was under their quotas. But du showed many people were over quota. So I did a quotaoff -a ; quotacheck -vam ; quotaon -a. That night, several warnquota-generated messages went out. My users diligently cleaned up their homes. Fast forward 24 hours, and the users received the same warnquota emails. repquota showed them as being over, but du told a different story. System is CentOS 6.3, kernel 2.6.32-279.2.1.el6.x86_64. # dmesg | grep -i quota VFS: Disk quotas dquot_6.5.2 The partition is type ext4 mounted at /share: # cat /proc/mounts | grep share /dev/mapper/VolGroup_Share-LogVol_Share /share ext4 rw,noatime,nodiratime,barrier=0,nobh,data=writeback,jqfmt=vfsv0,usrjquota=aquota.user 0 0 The ext4 volume sits on top of an LVM logical volume. That logical volume ultimately sits on top of an encrypted disk using cryptsetup luksFormat: # lvscan ACTIVE '/dev/VolGroup_Share/LogVol_Share' [4.48 TiB] inherit # pvscan PV /dev/mapper/luks-7f865362-ee9f-40de-bc07-73701b4662f3 VG VolGroup_Share lvm2 [4.48 TiB / 0 free] Is there something in my ext4 mount options that is incompatible with quota? Or maybe the encrypted layer is causing problems? Am I missing something else? Thanks! ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
[CentOS] clock sync/drift
Hi, We have a little over 100 servers, almost all running CentOS 5.7. Virtually all are Dell servers, generally a mix of 1950s, R610s, and R410s. We use NTP and/or PTP to sync their clocks. One phenomenon we've noticed is that (1) on reboot, the clocks are all greatly out of sync, and (2) if the PTP or NTP process is stopped, the clocks start drifting very quickly. If it was isolated to one or two servers, I'd dismiss the issue. I also had this problem under CentOS 4. I suspect something is mis-configured, because I can't imagine the hardware clock on ALL these servers is *that* bad. Anyone else dealt with anything similar? Thanks! Matt ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
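One way to put a number on "drifting very quickly" is to stop ntpd/ptpd briefly, measure the clock offset twice (e.g. with ntpdate -q against a reference server), and convert the change to parts per million. The offsets below are made-up illustrations, not measurements:

```shell
# If the offset grows from 0.120 s to 0.480 s over a 600 s window
# with the sync daemon stopped, the drift rate is:
awk -v o1=0.120 -v o2=0.480 -v t=600 \
    'BEGIN { printf "drift: %.0f ppm\n", (o2 - o1) / t * 1e6 }'
# -> drift: 600 ppm
```

A typical PC crystal stays within roughly 100 ppm; drift far beyond that on *every* machine points more toward a systematic cause (e.g. kernel timer/clocksource configuration on that hardware generation) than toward uniformly bad hardware clocks.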
Re: [CentOS] Off-Topic: Low Power Hardware
On Fri, Jan 11, 2013 at 8:55 AM, SilverTip257 silvertip...@gmail.com wrote: I'm in search of some hardware that consumes a low amount of power for use as a test-bed for Linux, various coding projects, and LAN services. 1) Low power consumption (10-15W ... maybe 30W at most) 2) Must run Linux without too much fuss (CentOS or otherwise) 3) Must have two NICs (fast ethernet or better) 4) Memory - 1GB or better 5) Can be configurable either via serial or VGA. 6) Accepts a normal hard drive, not CF -- drive capacity is my concern. 7) spare PCI slot is a _plus_ (extra NICs or whatever else) 8) I'd like to keep the physical footprint to a minimum (size of a 1U switch or so?) The lowest-power x86 device I've used is an Alix 2d2 from PCEngines. Power consumption was about five watts, regardless of load. This has three 100 mbps NICs, a 32-bit x86 AMD Geode CPU, and 256 MB RAM soldered to the board. Has a built-in Compact Flash slot to use as a hard drive. I ran OpenBSD on mine for years as a firewall/gateway/router for a home LAN (don't see why it wouldn't run CentOS). (I'm actually selling mine, email off list if interested.) I upgraded my firewall device to an Atom-based D2500CCE. IIRC, I installed 2x2GB of RAM, booting from a cheap SSD, powered by a PicoPSU, and running PFSense. I think this configuration pulls roughly 16 watts at idle, maybe a couple more watts when fully loaded. This board has dual Intel gigabit ethernet ports. For my home theater PC, I'm running an ASRock H67M-ITX and Core i3-2100 CPU, with 2x4GB of RAM and SSD. I have it inside a Habey EMC-800B case, using the included power supply. Idle power consumption is about 22 watts. It's been a while since I measured power consumption at load, but I'd guess 50--60 watts (it's idle 99% of the time though). Note that even when idle, MythTV seems to use a little CPU, so if I kill mythfrontend, my idle power consumption drops another watt or two. 
Only one NIC on the Asrock board, but it has a PCIe expansion slot so you could easily add another. I'd expect an add-on NIC to add around one to five watts of power consumption. My personal workstation uses an Intel DH67GD micro-ATX motherboard, i5-2500k CPU, 4x4GB RAM, SSD, and traditional ATX power supply (Seasonic SS-300ET). It pulls about 30 watts when idle. Only one NIC on that motherboard. For all the above, I'm talking AC (i.e. at the wall) power consumption, in the USA (so 115 Volts), measured with a Kill-A-Watt (not high-precision, but should be reasonable within a watt or two). What follows is stuff with which I have no personal experience, but have read about: The Intel S1200KP mini-ITX motherboard. It has built-in dual gigabit NICs, socket 1155, so you can use anything from a Celeron up to a Xeon, depending on how much you want to spend and what your upper-bound computational needs are. I was considering that for my firewall/router replacement. With a PicoPSU I would suspect that one could get 20 watts or lower idle power consumption. With an Intel DQ77KB motherboard, and Pentium G2120, SilentPCReview built a system that pulls 16.5 Watts[1]. (The article is a case review, but power consumption information is included.) That DQ77KB board also has dual gigabit NICs. You might also be interested in Intel's NUC - Next Unit of Computing[2]. About 10 watts power consumption for a dramatically under-clocked i3 CPU. In general, with modern Sandy/Ivy Bridge CPUs, it's almost trivial to build a high-performing system that has 30 watts or less idle power consumption. If you cherry-pick components, it's not terribly hard to get a system with 20 watt idle power draw. The modern Intel CPUs all have roughly the same idle power usage (at least the consumer line, not sure about Xeons). That goes for the more expensive low-power variants as well. What distinguishes the low-power variants is that their upper-bound power consumption is lower than that of their peers.
But you can often fake that by deliberately limiting the max frequency in the BIOS. Of course, with these real CPUs (compared to e.g. Atom), power consumption will be much higher when loaded. But from what I've read, the real CPUs are actually better in the long run, because their computation efficiency is so much higher. With something like Atom, you get more deterministic power draw, but a severely compromised upper bound on computational power. In your requirements, you mentioned various coding projects. If you are working in a compiled language (e.g. C, C++, Java), for substantial projects, your compile times will be painful on Atom, but pleasantly fast on a Sandy/Ivy Bridge CPU. [1] http://www.silentpcreview.com/Akasa_Euler_Fanless_Thin_ITX_Case [2] http://www.silentpcreview.com/Intel_NUC_DC3217BY
Re: [CentOS] home directory server performance issues
On Tue, Dec 11, 2012 at 1:58 AM, Nicolas KOWALSKI nicolas.kowal...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:37:50AM -0600, Matt Garman wrote: OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. Is the data=journal option really wanted here? Did you try with the other journalling modes available? I also think you are missing the noatime option here. Short answer: I don't know. Intuitively, it seems like it's not the right thing. However, there are a number of articles out there[1] that say data=journal may improve performance dramatically, in cases where there is both a lot of reading and writing. That's what a home directory server is to me: a lot of reading and writing. However, I haven't seen any tool or mechanism for precisely quantifying when data=journal will improve performance; everyone just says change it and test. Unfortunately, in my situation, I didn't have the luxury of testing, because things were unusable right then. [1] for example: http://www.ibm.com/developerworks/linux/library/l-fs8/index.html
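For reference, the three ext3 journalling modes under discussion are selected per mount; a hypothetical /etc/fstab line (device and mount point invented) showing where the option goes:

```
# data=journal   - data and metadata both journalled (the setting under discussion)
# data=ordered   - ext3 default: metadata journalled, data flushed to disk first
# data=writeback - metadata journalled only; fastest, weakest ordering guarantees
/dev/sdb1  /home  ext3  rw,noatime,usrquota,data=ordered  0  2
```

Note that data=journal generally cannot be toggled with a plain remount; the filesystem has to be fully unmounted and remounted for the change to take effect.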
Re: [CentOS] home directory server performance issues
On Tue, Dec 11, 2012 at 2:24 PM, Dan Young danielmyo...@gmail.com wrote: Just going to throw this out there. What is RPCNFSDCOUNT in /etc/sysconfig/nfs? It was 64 (upped from the default of... 8 I think).
Re: [CentOS] home directory server performance issues
On Tue, Dec 11, 2012 at 4:01 PM, Steve Thompson s...@vgersoft.com wrote: This is in fact a very interesting question. The default value of RPCNFSDCOUNT (8) is in my opinion way too low for many kinds of NFS servers. My own setup has 7 NFS servers ranging from small ones (7 TB disk served) to larger ones (25 TB served), and there are about 1000 client cores making use of this. After spending some time looking at NFS performance problems, I discovered that the number of nfsd's had to be much higher to prevent stalls. On the largest servers I now use 256-320 nfsd's, and 64 nfsd's on the very smallest ones. Along with suitable adjustment of vm.dirty_ratio and vm.dirty_background_ratio, this makes a huge difference. Could you perhaps elaborate a bit on your scenario? In particular, how much memory and how many CPU cores do the servers with the really high nfsd counts have? Is there a rule of thumb for nfsd counts relative to the system specs? Or, like so many IO tuning situations, is it just a matter of test and see?
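The two knobs mentioned above live in different places; an illustrative sketch (the nfsd count is the value from this thread, while the dirty-ratio values are invented placeholders, not recommendations):

```
# /etc/sysconfig/nfs -- number of nfsd threads started at boot
RPCNFSDCOUNT=64

# /etc/sysctl.conf -- lower the dirty-page thresholds so writeback
# starts earlier, reducing large flush-induced stalls (example values)
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
```

The sysctl values take effect after `sysctl -p`; the nfsd count after restarting the nfs service.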
Re: [CentOS] home directory server performance issues
On Wed, Dec 12, 2012 at 12:29 AM, Gordon Messmer yiny...@eburg.com wrote: That may be difficult at this point, because you really want to start by measuring the number of IOPS. That's difficult to do if your applications demand more than your hardware currently provides. Since my original posting, we temporarily moved the data from the CentOS 5 server to the CentOS 6 server. We rebuilt the original (slow) server with CentOS 6, then migrated the data back. So far (fingers crossed) so good. I'm running a constant iostat -kx 30, and logging it to a file. Disk utilization is virtually always under 50%. Random spikes in the 90% range, but they are few and far between. So now it appears the hardware + software configuration can handle the load. But I still have the same question: how can I accurately *quantify* the kind of IO load these servers have? I.e., how do I measure IOPS? This might not be the result of your NFS server performance. You might actually be seeing bad performance in your directory service. What are you using for that service? LDAP? NIS? Are you running nscd or sssd on the clients? Not using a directory service (manually sync'ed passwd files, and Kerberos for authentication). Not running nscd or sssd. RAID 6 is good for $/GB, but bad for performance. If you find that your performance is bad, RAID10 will offer you a lot more IOPS. Mixing 15k drives with RAID-6 is probably unusual. Typically 15k drives are used when the system needs maximum IOPS, and RAID-6 is used when storage capacity is more important than performance. It's also unusual to see a RAID-6 array with a hot spare. You already have two disks of parity. At this point, your available storage capacity is only 600GB greater than a RAID-10 configuration, but your performance is MUCH worse. I agree with all that. The problem is, there is a higher risk of storage failure with RAID-10 compared to RAID-6. We do have good, reliable *data* backups, but no real hardware backup.
Our current service contract on the hardware is next business day. That's too much down time to tolerate with this particular system. As I typed that, I realized we technically do have a hardware backup---the other server I mentioned. But even the time to restore from backup would make a lot of people extremely unhappy. How do most people handle this kind of scenario, i.e. can't afford to have a hardware failure for any significant length of time? Have a whole redundant system in place? I would have to sell the idea to management, and for that, I'd need to precisely quantify our situation (i.e. my initial question). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. data=journal actually offers better performance than the default in some workloads, but not all. You should try the default and see which is better. With a hardware RAID controller that has battery backed write cache, data=journal should not perform any better than the default, but probably not any worse. Right, that was mentioned in another response. Unfortunately, I don't have the ability to test this. My only system is the real production system. I can't afford the interruption to the users while I fully unmount and mount the partition (can't change data= type with remount). In general, it seems like a lot of IO tuning is change parameter, then test. But (1) what test? It's hard to simulate a very random/unpredictable workload like user home directories, and (2) what to test on when one only has the single production system? I wish there were more analytic tools where you could simply measure a number of attributes, and from there, derive the ideal settings and configuration parameters. If your drives are really 4k sectors, rather than the reported 512B, then they're not optimal and writes will suffer. The best policy is to start your first partition at 1M offset. 
parted should be aligning things well if it's updated, but if your partition sizes (in sectors) are divisible by 8, you should be in good shape. It appears that CentOS 6 does the 1M offset by default. CentOS 5 definitely doesn't do that. Anyway... as I suggested above, the problem appears to be resolved... But the fix was kind of a shotgun approach, i.e. I changed too many things at once to know exactly what specific item fixed the problem. I'm sure this will inevitably come up again at some point, so I'd still like to learn/understand more to better handle the situation next time. Thanks!
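On the "how do I measure IOPS" question: the iostat logs already being collected contain the answer, since per-device IOPS is just r/s + w/s (fields 4 and 5 of extended output). A minimal sketch over a captured sample; the figures below are illustrative, not real measurements:

```shell
# Approximate per-device IOPS from iostat -dkx style output: r/s + w/s.
# The sample lines are made up for illustration.
iostat_sample='sda 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35
sdb 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68'

echo "$iostat_sample" | awk '{ printf "%s: %.1f IOPS\n", $1, $4 + $5 }'
```

Run over a day's worth of 30-second samples, the same awk pass can feed min/mean/max summaries, which is the kind of number that sells a hardware request to management.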
[CentOS] home directory server performance issues
I’m looking for advice and considerations on how to optimally setup and deploy an NFS-based home directory server. In particular: (1) how to determine hardware requirements, and (2) how to best setup and configure the server. We actually have a system in place, but the performance is pretty bad---the users often experience a fair amount of lag (1--5 seconds) when doing anything on their home directories, including an “ls” or writing a small text file. So now I’m trying to back-up and determine, is it simply a configuration issue, or is the hardware inadequate? Our scenario: we have about 25 users, mostly software developers and analysts. The users login to one or more of about 40 development servers. All users’ home directories live on a single server (no login except root); that server does an NFSv4 export which is mounted by all dev servers. The home directory server hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. 
I’m fairly certain I did not align the partitions optimally:

[root@lnxutil1 ~]# parted -s /dev/sda unit s print
Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size        Type     File system  Flags
 1      63s      465884s     465822s     primary  ext2         boot
 2      465885s  134207009s  133741125s  primary               lvm

[root@lnxutil1 ~]# parted -s /dev/sdb unit s print
Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s               lvm

Can anyone confirm that the partitions are not aligned correctly, as I suspect? If this is true, is there any way to *quantify* the effects of partition mis-alignment on performance? In other words, what kind of improvement could I expect if I rebuilt this server with the partitions aligned optimally? In general, what is the best way to determine the source of our performance issues? Right now, I’m running “iostat -dkxt 30” re-directed to a file. I intend to let this run for a day or so, and write a script to produce some statistics. Here is one iteration from the iostat process:

Time: 09:37:28 AM
Device:  rrqm/s  wrqm/s   r/s   w/s      rkB/s  wkB/s     avgrq-sz  avgqu-sz  await   svctm  %util
sda      0.00    44.09    0.03  107.76   0.13   607.40    11.27     0.89      8.27    7.27   78.35
sda1     0.00    0.00     0.00  0.00     0.00   0.00      0.00      0.00      0.00    0.00   0.00
sda2     0.00    44.09    0.03  107.76   0.13   607.40    11.27     0.89      8.27    7.27   78.35
sdb      0.00    2616.53  0.67  157.88   2.80   11098.83  140.04    8.57      54.08   4.21   66.68
sdb1     0.00    2616.53  0.67  157.88   2.80   11098.83  140.04    8.57      54.08   4.21   66.68
dm-0     0.00    0.00     0.03  151.82   0.13   607.26    8.00      1.25      8.23    5.16   78.35
dm-1     0.00    0.00     0.00  0.00     0.00   0.00      0.00      0.00      0.00    0.00   0.00
dm-2     0.00    0.00     0.67  2774.84  2.80   11099.37  8.00      474.30    170.89  0.24   66.84
dm-3     0.00    0.00     0.67  2774.84  2.80   11099.37  8.00      474.30    170.89  0.24   66.84

What I observe is that whenever sdb (home directory partition) becomes loaded, sda (OS) often does as well.
Why is this? I would expect sda to generally be idle, or have minimal utilization. According to both “free” and “vmstat”, this server is not swapping at all. At one point, our problems were due to a random user writing a huge file to their home directory. We built a second server specifically for people to use for writing large temporary files. Furthermore, for all the dev servers, I used the following tc commands to rate limit how quickly any one server can write to the home directory server (8 Mbps or 1 MB/s):

ETH_IFACE=$( route -n | grep ^0.0.0.0 | awk '{ print $8 }' )
IFACE_RATE=1000mbit
LIMIT_RATE=8mbit
TARGET_IP=1.2.3.4 # home directory server IP
tc qdisc add dev $ETH_IFACE root handle 1: htb default 1
tc class add dev $ETH_IFACE parent 1: classid 1:1 htb rate $IFACE_RATE ceil $IFACE_RATE
tc class add dev $ETH_IFACE parent 1: classid 1:2 htb rate $LIMIT_RATE ceil $LIMIT_RATE
tc filter add dev $ETH_IFACE parent 1: protocol ip prio 16 u32 match ip dst $TARGET_IP flowid 1:2

The other interesting thing is that the second server I mentioned—the one specifically designed for users to
Re: [CentOS] Static routes with a metric?
Adding additional info for posterity, and in case anyone else runs across this... On Wed, Dec 7, 2011 at 12:28 PM, Benjamin Franz jfr...@freerun.com wrote: On 12/7/2011 10:03 AM, Matt Garman wrote: Hi, [...] What I basically need to be able to do is this: route add -host h1 gw g1 metric 0 route add -host h1 gw g2 metric 10 Notice that everything is the same except the gateway and metric. I could put this in /etc/rc.local, but was wondering if there's a cleaner way to do it in e.g. the network-scripts directory. If you create files in the /etc/sysconfig/network-scripts directory named according to the scheme route-eth0, route-eth1, route-eth2, it will execute each line in those files as an "/sbin/ip route add <line>" command when each interface is brought up. Look in the /etc/sysconfig/network-scripts/ifup-routes script for all the gory details and features. I actually did just that---looked at the ifup-routes script. The thing that threw me off is the comments about older format versus new format. I probably read into the comments too much, but I thought to myself, I should probably use the new format, as they might someday deprecate the old format. But anyway, the older format is what I need. With the older format, it's exactly what you said above: each line corresponds to running "ip route add <line>". So what I added were lines in this format: addr/mask via gateway dev device metric N A contrived example might be: 10.25.77.0/24 via 192.168.1.1 dev eth0 metric 5 The new format is where each group of three lines corresponds to a route. You have the ADDRESSxx=, NETMASKxx=, GATEWAYxx= lines. Clearly this is less flexible, particularly if you need to set a metric like me. :) Anyway, hopefully that's useful for anyone in a similar situation! -Matt
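Putting it together for the original two-gateway question, the older-format route files might look like this (h1, g1, and g2 replaced with invented addresses, since the real ones weren't given):

```
# /etc/sysconfig/network-scripts/route-eth0  (preferred path, metric 0)
10.1.1.100/32 via 192.168.1.1 dev eth0 metric 0

# /etc/sysconfig/network-scripts/route-eth1  (backup path, metric 10)
10.1.1.100/32 via 192.168.2.1 dev eth1 metric 10
```

With both routes installed, the kernel uses the metric-0 path while eth0 is up, and traffic falls back to the metric-10 route if that interface goes down.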
[CentOS] Static routes with a metric?
Hi, How can I define static routes to be created at boot time with a specific metric? I have two NICs that ultimately end up at the same peer, but literally go through two completely different networks. IOW, each NIC connects to a different layer 3 device. Also, note that the machine actually has three total NICs: the third is the owner of the default route. The two mentioned above are for a specialized subnet. What I basically need to be able to do is this: route add -host h1 gw g1 metric 0 route add -host h1 gw g2 metric 10 Notice that everything is the same except the gateway and metric. I could put this in /etc/rc.local, but was wondering if there's a cleaner way to do it in e.g. the network-scripts directory. Thanks, Matt
Re: [CentOS] ibm m1015 w/sandy bridge boot failure
On Thu, Oct 27, 2011 at 09:18:56AM -0500, Matt Garman wrote: I have a server running CentOS 6.0. Last night I replaced the CPU and motherboard. Old hardware: Supermicro x8sil-f + x3440. New hardware: Supermicro x9scl+-f + E3-1230. This is a new Sandy Bridge Xeon. Everything else remained the same, including an IBM m1015 SAS HBA. This is just an IBM re-branded LSI 92xx-8i (9220-8i specifically I believe), which uses the LSI SAS2008 chipset and the megaraid_sas driver. It worked without any issue on the old motherboard/cpu. However, with the new motherboard/cpu, the system makes it through the BIOS POST without any issue. But about half-way through the kernel initialization, it basically locks up. It will sit there for several minutes, doing nothing, then start printing out error messages (I was unable to get a screenshot or take note of the errors). But if I take the m1015 card out, the system boots quickly and without issue, as it always has. I saw this post[1] on the forums, which suggested that Sandy Bridge really needs 6.1, which, for those of us using CentOS, we can only get sorta close to by using the continuous release. So I did a yum install centos-release-cr ; yum update. I let everything install (there were no install errors or problems), and rebooted with the m1015 back in the system, but the problem remains. So now I'm at a loss. Anyone have any thoughts? [1] http://www.centos.org/modules/newbb/viewtopic.php?topic_id=33878&forum=56 SuperMicro actually has a FAQ[1] on a very similar issue: same motherboard, but an LSI 9240 RAID card (very similar to the IBM M1015, same actual chipset I believe), and CentOS 5.5. But the described problem is the same as mine. Simple fix: upgrade BIOS to v1.1a or later. Works for me! [1] http://www.supermicro.com/support/faqs/faq.cfm?faq=12830 Hope this helps anyone with the same problem. -Matt
[CentOS] ibm m1015 w/sandy bridge boot failure
I have a server running CentOS 6.0. Last night I replaced the CPU and motherboard. Old hardware: Supermicro x8sil-f + x3440. New hardware: Supermicro x9scl+-f + E3-1230. This is a new Sandy Bridge Xeon. Everything else remained the same, including an IBM m1015 SAS HBA. This is just an IBM re-branded LSI 92xx-8i (9220-8i specifically I believe), which uses the LSI SAS2008 chipset and the megaraid_sas driver. It worked without any issue on the old motherboard/cpu. However, with the new motherboard/cpu, the system makes it through the BIOS POST without any issue. But about half-way through the kernel initialization, it basically locks up. It will sit there for several minutes, doing nothing, then start printing out error messages (I was unable to get a screenshot or take note of the errors). But if I take the m1015 card out, the system boots quickly and without issue, as it always has. I saw this post[1] on the forums, which suggested that Sandy Bridge really needs 6.1, which, for those of us using CentOS, we can only get sorta close to by using the continuous release. So I did a yum install centos-release-cr ; yum update. I let everything install (there were no install errors or problems), and rebooted with the m1015 back in the system, but the problem remains. So now I'm at a loss. Anyone have any thoughts? [1] http://www.centos.org/modules/newbb/viewtopic.php?topic_id=33878&forum=56 Thanks, Matt
[CentOS] how to stop an in-progress fsck that runs at boot?
I can't seem to find the answer to this question via web search... I changed some hardware on a server, and upon powering it back on, got the /dev/xxx has gone 40 days without being checked, check forced message. Now it's running fsck on a huge (2 TB) ext3 filesystem (5400 RPM drives, no less). How can I stop this in-progress check? Ctrl-C doesn't seem to have any effect. Is the only answer to wait it out? Also, as a side question: I always do this---let my servers run for a very long time, power down to change/upgrade hardware, then forget about the forced fsck, then pull my hair out waiting for it to finish (because I can't figure out how to stop it once it starts). I know about tune2fs -c and -i, and also the last (or is it second to last?) column in /etc/fstab. My question is more along the lines of best practices---what are most people doing with regards to regular fsck's of ext2/3/4 filesystems? Do you just take the defaults, and let it delay the boot process by however long it takes? Disable it completely? Or do something like taking the filesystem offline on a running system? Something else? Thanks, Matt
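For the record, the tune2fs knobs mentioned above can disable the periodic boot-time checks entirely. A dry-run sketch (it only prints the command, and the device name is hypothetical):

```shell
# Disable forced periodic fsck on an ext2/3/4 filesystem.
# -c 0 turns off the mount-count-based check, -i 0 the time-based one.
# Echoed rather than executed; drop the echo to apply for real (as root).
DEV=/dev/sdb1   # hypothetical device holding the large filesystem

echo tune2fs -c 0 -i 0 "$DEV"
```

The trade-off, of course, is that a filesystem with disabled checks relies entirely on the journal and on any checking you schedule yourself.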
Re: [CentOS] NetApp DataFabric Manager/Sybase/SQLAnywhere on CentOS?
On Wed, Jul 20, 2011 at 12:39 PM, James Hogarth james.hoga...@gmail.com wrote: I am trying to install DFM 4.0.2, and have tried on both CentOS 4.8 i386 and CentOS 5.5 x86_64. I have edited my /etc/redhat-release file to be equal to RHEL's, as the DFM installer immediately aborts if that isn't right. However, I still have errors during the install: ... Communication error error: %post(NTAPdfm-3.8-6640.i386) scriptlet failed, exit status 1 ... Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Running Linux 2.6.9-89.0.25.ELsmp #1 SMP Thu May 6 12:28:03 EDT 2010 on X86 (X86_64) Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Server built for X86 processor architecture Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Asynchronous IO disabled due to lack of proper OS support You said you tried 4.8 32bit and 5.5 64bit. How about 5.6 32bit? The era would be about right on C5, and it's obviously looking for 32bit libraries and architecture. What do the docs and NetApp say about RHEL support? Turns out the problem was with our /etc/security/limits.conf setting. We (by design) limit virtual memory to 2 GB. Apparently the installer needs more than that. So we did a ulimit -v unlimited and the install worked! Hope this is helpful for anyone else with the same problem.
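For context, the limit that bit us would look something like this in /etc/security/limits.conf (the exact line is a reconstruction, not a copy of our config):

```
# /etc/security/limits.conf -- cap address space at 2 GB (value in KB)
*    hard    as    2097152
```

The "as" item is the address-space (virtual memory) limit that `ulimit -v` reports, which is why raising it for the install session was enough.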
[CentOS] NetApp DataFabric Manager/Sybase/SQLAnywhere on CentOS?
Hello, Has anyone out there successfully installed NetApp's DataFabric Manager (DFM) on CentOS? If so, what version of DFM, CentOS, and what architecture? I am trying to install DFM 4.0.2, and have tried on both CentOS 4.8 i386 and CentOS 5.5 x86_64. I have edited my /etc/redhat-release file to be equal to RHEL's, as the DFM installer immediately aborts if that isn't right. However, I still have errors during the install:

Enter your NetApp DataFabric Manager license key [?,q]: entered my key
Beginning the installation ...
Preparing...    ### [100%]
   1:NTAPdfm    ### [100%]
Installing scripts in /etc/init.d directory.
Configuring DataFabric Manager server services.
Setting up sql ...
Starting SQL ...
Communication error
error: %post(NTAPdfm-3.8-6640.i386) scriptlet failed, exit status 1

/var/log/messages has this:

[ ... ]
Jul 19 12:58:10 lnxsvr41 SQLAnywhere(monitordb): 16 logical processor(s) on 2 physical processor(s) detected.
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Per-processor licensing model. The server is licensed to use 2 physical processor(s).
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): This server is licensed to:
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb):     DFM User
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb):     NetApp
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Running Linux 2.6.9-89.0.25.ELsmp #1 SMP Thu May 6 12:28:03 EDT 2010 on X86 (X86_64)
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Server built for X86 processor architecture
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Asynchronous IO disabled due to lack of proper OS support
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Maximum cache size adjusted to 1670312K
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Authenticated Server licensed for use with Authenticated Applications only
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): 8192K of memory used for caching
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Minimum cache size: 8192K, maximum cache size: 1670312K
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Using a maximum page size of 8192 bytes
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): TCP/IP functions not found
Jul 19 12:58:11 lnxsvr41 SQLAnywhere(monitordb): Database server shutdown due to startup error

Looks like the problem is with SQLAnywhere, which appears to be a Sybase product. Anyone been down this road? So far NetApp is stonewalling me on CentOS support. Thanks, Matt
Re: [CentOS] scheduling differences between CentOS 4 and CentOS 5?
On Tue, May 24, 2011 at 02:22:12PM -0400, R P Herrold wrote: On Mon, 23 May 2011, Mag Gam wrote: I would like to confirm Matt's claim. I too experienced larger latencies with Centos 5.x compared to 4.x. My application is very network sensitive and its easy to prove using lat_tcp. Russ, I am curious about identifying the problem. What tools do you recommend to find where the latency is coming from in the application? I went through the obvious candidates: system calls (loss of control of when if ever the scheduler decides to let your process run again) This is almost certainly what it is for us. But in this situation, these calls are limited to mutex operations and condition variable signaling. polling v select polling is almost always a wrong approach when latency reduction is in play (reading and understanding: man 2 select_tut is time very well spent) We are using select(). However, that is only for the networking part (basically using select() to wait on data from a socket). Here, my concern isn't with network latency---it's with intra process latency. choice of implementation language -- the issue here being if one uses a scripting language, one cannot 'see' the time leaks C/C++ here. Doing metrics permits both 'hot spot' analysis, and moves the coding from 'guesstimation' to software engineering. We use graphviz, and gnuplot on the plain text 'CSV-style' timings files to 'see' outliers and hotspots We're basically doing that. We pre-allocate a huge 2D array for keeping stopwatch points throughout the program. Each column represents a different stopwatch point, and each row represents and different iteration through these measured points. After a lot of iterations (usually at least 100k), the numbers are dumped to a file for analysis. Basically, the standard deviation from one iteration to the next is fairly low. 
It's not like there are a few outliers driving the average intra-process latency up; it's just that, in general, going from point A to point B takes longer with the newer kernels. For what it's worth, I tried a 2.6.39 mainline kernel (from elrepo), and the intra-process latencies get even worse. It appears that whatever changes are being made to the kernel, they're bad for our kind of program. I'm trying to figure out, from a conceptual level, what those changes are. I'm looking for an easier way to understand than reading the kernel source and change history. :)
[CentOS] scheduling differences between CentOS 4 and CentOS 5?
We have several latency-sensitive pipeline-style programs that have a measurable performance degradation when run on CentOS 5.x versus CentOS 4.x. By pipeline program, I mean one that has multiple threads. The multiple threads work on shared data. Between each thread, there is a queue. So thread A gets data, pushes into Qab, thread B pulls from Qab, does some processing, then pushes into Qbc, thread C pulls from Qbc, etc. The initial data is from the network (generated by a 3rd party). We basically measure the time from when the data is received to when the last thread performs its task. In our application, we see an increase of anywhere from 20 to 50 microseconds when moving from CentOS 4 to CentOS 5. I have used a few methods of profiling our application, and determined that the added latency on CentOS 5 comes from queue operations (in particular, popping). However, I can improve performance on CentOS 5 (to be the same as CentOS 4) by using taskset to bind the program to a subset of the available cores. So it appears to me that, between CentOS 4 and 5, there was some change (presumably to the kernel) that caused threads to be scheduled differently (and this difference is suboptimal for our application). While I can solve this problem with taskset, my preference is to not have to do this. I'm hoping there's some kind of kernel tunable (or maybe collection of tunables) whose default was changed between versions. Anyone have any experience with this? Perhaps some more areas to investigate? Thanks, Matt
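For readers who haven't used it, the taskset workaround mentioned above looks like this (the core list is illustrative, and the echo stands in for the real pipeline binary):

```shell
# Launch a command pinned to CPU core 0 (util-linux taskset).
# In the real setup, the pipeline binary would replace the echo,
# and the core list would cover a subset of the machine's cores.
taskset -c 0 echo "pipeline pinned to core 0"

# An already-running process can be re-pinned by PID (hypothetical PID):
# taskset -cp 0-3 12345
```

Pinning limits how far the scheduler can migrate the threads between cores, which keeps their working sets warm in a shared cache; that is consistent with the queue-pop latency improvement described above.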