Remind me again which OS you are running in production? And did you build the kernel module on that same OS?
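
Something along these lines would answer both questions. It is only a sketch: it assumes the kernel module is installed under the name pvfs2 and that modinfo is available, so adjust either if your install differs.

```shell
#!/bin/sh
# Sketch only: "pvfs2" as the module name is an assumption about the install.
MODULE=${MODULE:-pvfs2}

# Print the release of the kernel that is currently running.
report_os() {
    echo "kernel: $(uname -r)"
}

# Compare the kernel release the module was built for with the running one.
report_module() {
    # The first word of vermagic is the kernel release the module targets.
    built_for=$(modinfo -F vermagic "$MODULE" 2>/dev/null | cut -d' ' -f1)
    if [ -z "$built_for" ]; then
        echo "module $MODULE: not found by modinfo"
    elif [ "$built_for" = "$(uname -r)" ]; then
        echo "module $MODULE: built for the running kernel ($built_for)"
    else
        echo "module $MODULE: built for $built_for, running $(uname -r)"
    fi
}

report_os
report_module
```

A mismatch between the two releases would suggest the module was built on a different kernel than the one in production.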
Becky

On Thu, Aug 2, 2012 at 12:32 PM, Becky Ligon wrote:

> Jim:
>
> It might be easier for me to debug if you can set up an account for me and
> let me look at your environment.  Is this possible?
>
> Becky
>
>
> On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir wrote:
>
>> Actually a user has figured it out, at least in one directory: 66
>> entries work, but 67 fail:
>>
>> Also, in this directory /mnt/pvfs2/gould/salmon/test2 I tried to
>> figure out what was going on.  If you are interested, you could try
>> these steps in the directory above:
>>
>> 1) ls |wc -l  (gives you 66)
>>
>> 2) emacs a.txt (creates new file a.txt)
>>
>> 3) CTRL x-s (saves new file)
>>
>> 4) ls |wc -l (gives "ls: reading directory .: Invalid argument" error)
>>
>> 5) rm a.txt (removes file; can't tab to finish name, must type in the
>>    entire name)
>>
>> 6) ls |wc -l (gives you 66 - all back to normal)
>>
>> --Jim
>>
>> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir wrote:
>> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all
>> > of them) now.
>> >
>> > Output on a sample directory:
>> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
>> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
>> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
>> > output
>> > getncfromNCAR.csh
>> > mz4assim_conus_1h_20070101.nc
>> > mz4assim_conus_1h_20070102.nc
>> >
>> >
>> > Another user indicates this is based on how many files are in the
>> > directory.  If he knows the file name, the file is still accessible,
>> > but ls or tab completion or anything like that fails.  If he deletes
>> > files to get it under a not-exactly-determined amount, ls works again.
>> >
>> > --Jim
>> >
>> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon wrote:
>> >> Jim:
>> >>
>> >> Are you running 2.8.6 on the server and the client?  Or, just 2.8.6
>> >> from the head node?
>> >>
>> >> Can you run a "ls" on their directories that appear to be missing
>> >> data?  Can you also run pvfs2-ls on those same directories?  Please
>> >> send me the output from both commands.
>> >>
>> >> Thanks,
>> >> Becky
>> >>
>> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir wrote:
>> >>>
>> >>> So, since switching over to 2.8.6, I've had two users report that
>> >>> their larger directories are missing files / data.
>> >>>
>> >>> Now I'm really in for it....I'm asking for more details, but I'll need
>> >>> to address this pretty thoroughly and rapidly...File systems that
>> >>> lose user data are not useful.
>> >>>
>> >>> --Jim
>> >>>
>> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon wrote:
>> >>> > Jim:
>> >>> >
>> >>> > The documentation link that I sent doesn't seem to work.  Instead:
>> >>> >
>> >>> > Go to www.orangefs.org and click on the html link for the install
>> >>> > guide, about midway down the page.
>> >>> >
>> >>> > The install guide has a section on setting up a client and in
>> >>> > section 3.3 is the description of the pvfs2tab file.
>> >>> >
>> >>> > Becky
>> >>> >
>> >>> >
>> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon wrote:
>> >>> >>
>> >>> >> Jim:
>> >>> >>
>> >>> >> To generate a new config file, issue the command:
>> >>> >>
>> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
>> >>> >>
>> >>> >> You will be asked a set of questions regarding your installation.
>> >>> >> This utility may not provide everything you need; it just depends
>> >>> >> on your setup.  To help you, I will forward you a copy of our
>> >>> >> production conf file.  You can compare it to your own needs and
>> >>> >> modify the new conf file as needed.  After you create a new conf
>> >>> >> file, I would be happy to review it for you.
>> >>> >>
>> >>> >> I'm not sure how your clients have started without a proper
>> >>> >> pvfs2tab file, unless you have the appropriate info in your fstab
>> >>> >> file.  The mount info could be in either file.  I will send you a
>> >>> >> copy of our production pvfs2tab file as an example.
>> >>> >>
>> >>> >> The link below will describe how to create the entries in the
>> >>> >> pvfs2tab/fstab file.
>> >>> >>
>> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
>> >>> >>
>> >>> >> Thanks for giving 2.8.6 a try!  Let me know how it goes!
>> >>> >>
>> >>> >> Becky
>> >>> >>
>> >>> >>
>> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir wrote:
>> >>> >>>
>> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there
>> >>> >>> and a full cluster at the moment, so I can't intentionally reboot
>> >>> >>> it.  If it crashes on me today, I'll take the opportunity to
>> >>> >>> update everything as soon as it comes back and reboot it again.
>> >>> >>> Otherwise, I'll try early tomorrow morning to load and reboot.
>> >>> >>>
>> >>> >>> Also, you previously mentioned my pvfs2 server configuration file
>> >>> >>> format was out of date.  Can you suggest a new config file format
>> >>> >>> to use based on what I gave you?  Also, I've never had a pvfs2tab
>> >>> >>> file on my clients, and my attempts to create one so far have
>> >>> >>> failed.  It seems I don't know the proper syntax, and I haven't
>> >>> >>> found sufficiently clear documentation on that either.  It has
>> >>> >>> worked for ~4 years without one, but...
>> >>> >>>
>> >>> >>> --Jim
>> >>> >>>
>> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon wrote:
>> >>> >>> > Jim:
>> >>> >>> >
>> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core
>> >>> >>> > process using gdb and see if you can tell in which function it
>> >>> >>> > seems to be spinning?
>> >>> >>> > Also, you can try turning on client debugging, so we can see
>> >>> >>> > what the client core is doing.  To turn on debugging
>> >>> >>> > dynamically, issue the following:
>> >>> >>> >
>> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
>> >>> >>> >
>> >>> >>> > With the CPU so high, the client-core may or may not see the
>> >>> >>> > change in gossip_debug settings.  If it does, then a lot of
>> >>> >>> > output will be generated!
>> >>> >>> > Before you reboot your system, make a copy of the client log
>> >>> >>> > and send that to me, along with any information you might get
>> >>> >>> > from gdb.
>> >>> >>> >
>> >>> >>> > When you can, please try using 2.8.6 on your head node and see
>> >>> >>> > if you can reproduce the problem.
>> >>> >>> >
>> >>> >>> > Thanks,
>> >>> >>> > Becky
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir wrote:
>> >>> >>> >>
>> >>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened
>> >>> >>> >> on reboot (i.e., all entries are lost).  Already checked.
>> >>> >>> >> Also, I didn't see anything in /var/log/messages (I looked
>> >>> >>> >> there when the problem started mounting).  There appears to be
>> >>> >>> >> no "paper trail" of this incident, which is why it's been so
>> >>> >>> >> hard to track down.
>> >>> >>> >>
>> >>> >>> >> --Jim
>> >>> >>> >>
>> >>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon wrote:
>> >>> >>> >> > Jim:
>> >>> >>> >> >
>> >>> >>> >> > Please send the pvfs2-client.log from your head node and the
>> >>> >>> >> > /var/log/messages just before you rebooted.  I'm thinking
>> >>> >>> >> > that the high CPU utilization is coming from a failed
>> >>> >>> >> > operation that wasn't cleaned up properly.
>> >>> >>> >> >
>> >>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of
>> >>> >>> >> > these high CPU utilization issues.  It would be worthwhile
>> >>> >>> >> > for you to apply 2.8.6 to your head node and see if this
>> >>> >>> >> > particular situation comes up again.
>> >>> >>> >> >
>> >>> >>> >> > Becky
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir wrote:
>> >>> >>> >> >>
>> >>> >>> >> >> I think I caught a pvfs2-induced crash in progress on
>> >>> >>> >> >> 2.8.5.  I don't have a crash file, and it looks like it's
>> >>> >>> >> >> still in the process of bringing down my head node.
>> >>> >>> >> >> Symptoms were:
>> >>> >>> >> >>
>> >>> >>> >> >> Someone was doing an scp from (or to, not sure which, but
>> >>> >>> >> >> probably from) the pvfs2 volume.  At some point, CPU usage
>> >>> >>> >> >> spikes on the head node.  Top shows both the scp and the
>> >>> >>> >> >> pvfs2-client-core using 100% of a core.  The load avg just
>> >>> >>> >> >> keeps going up and up.  At about 29, I lost responsiveness
>> >>> >>> >> >> from the server.  CPU load shows 62.5% iowait, 25% system,
>> >>> >>> >> >> 12.5% idle, all others 0.
>> >>> >>> >> >> The only processes of note running are the one scp and the
>> >>> >>> >> >> pvfs2 process.
>> >>> >>> >> >>
>> >>> >>> >> >> My machine has now gone unresponsive; I'll probably need to
>> >>> >>> >> >> go hit the front panel reset button.  When it comes back
>> >>> >>> >> >> up, I doubt there will be any written logs of what
>> >>> >>> >> >> happened.  Hence why I can never catch the logs of the
>> >>> >>> >> >> crash; it *thinks* it's working until the system goes
>> >>> >>> >> >> non-responsive and resets.
>> >>> >>> >> >>
>> >>> >>> >> >> --Jim
>> >>> >>> >> >
>> >>> >>> >> > --
>> >>> >>> >> > Becky Ligon
>> >>> >>> >> > OrangeFS Support and Development
>> >>> >>> >> > Omnibond Systems
>> >>> >>> >> > Anderson, South Carolina

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
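
The directory-size threshold described in the quoted thread (ls works with 66 entries and fails with 67) could be located with a small probe script. The sketch below is not from the thread: the function name probe_ls_limit and the probe_N.tmp file names are hypothetical, and it assumes the failure shows up as a non-zero exit status from ls.

```shell
#!/bin/sh
# Sketch only: probe_ls_limit and the probe_N.tmp names are hypothetical.
# Adds empty files to a directory one at a time and reports when `ls`
# first fails, then removes every probe file it created.
probe_ls_limit() {
    dir=$1
    max=${2:-100}
    added=0
    result="ls still works after $max probe files"
    while [ "$added" -lt "$max" ]; do
        added=$((added + 1))
        if ! touch "$dir/probe_$added.tmp" 2>/dev/null; then
            result="could not create probe file $added"
            break
        fi
        if ! ls "$dir" > /dev/null 2>&1; then
            result="ls failed after adding $added probe files"
            break
        fi
    done
    # Remove probes by explicit name; a glob needs a working directory listing.
    while [ "$added" -gt 0 ]; do
        rm -f "$dir/probe_$added.tmp"
        added=$((added - 1))
    done
    echo "$result"
}
```

In the test directory from the thread it would be invoked as `probe_ls_limit /mnt/pvfs2/gould/salmon/test2 10`. Removing the probes by name mirrors step 5 of the quoted steps, where tab completion was unavailable and the file name had to be typed in full.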
_______________________________________________
Pvfs2-users mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
