Jim: It might be easier for me to debug if you can set up an account for me and let me look at your environment. Is this possible?
Becky

On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir <[email protected]> wrote:
> Actually a user has figured it out, at least in one directory: 66
> entries work, but 67 fail:
>
> Also, in this directory /mnt/pvfs2/gould/salmon/test2 I tried to
> figure out what was going on. If you are interested, you could try
> these steps in this directory above:
>
> 1) ls |wc -l (gives you 66)
>
> 2) emacs a.txt (creates new file a.txt)
>
> 3) CTRL x-s (saves new file)
>
> 4) ls |wc -l (gives "ls: reading directory .: Invalid argument" error)
>
> 5) rm a.txt (removes file- can't tab to finish name, must type in entire)
>
> 6) ls |wc -l (gives you 66 - all back to normal)
>
> --Jim
>
> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir <[email protected]> wrote:
> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all
> > of them) now.
> >
> > Output on a sample directory:
> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
> > output
> > getncfromNCAR.csh
> > mz4assim_conus_1h_20070101.nc
> > mz4assim_conus_1h_20070102.nc
> >
> >
> > Another user indicates this is based on how many files are in the
> > directory. If he knows the file name, the file is still accessible,
> > but ls or tab completion or anything like that fail. If he deletes
> > files to get it under a not-exactly-determined amount, ls works again.
> >
> > --Jim
> >
> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon <[email protected]> wrote:
> >> Jim:
> >>
> >> Are you running 2.8.6 on the server and the client? Or, just 2.8.6 from
> >> the head node?
> >>
> >> Can you run a "ls" on their directories that appear to be missing data? Can
> >> you also run pvfs2-ls on those same directories? Please send me the output
> >> from both commands.
> >>
> >> Thanks,
> >> Becky
> >>
> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir <[email protected]> wrote:
> >>>
> >>> So, since switching over to 2.8.6, I've had two users report that
> >>> their larger directories are missing files / data.
> >>>
> >>> Now I'm really in for it....I'm asking for more details, but I'll need
> >>> to address this pretty thoroughly and rapidly...File systems that
> >>> lose user data are not useful.
> >>>
> >>> --Jim
> >>>
> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon <[email protected]> wrote:
> >>> > Jim:
> >>> >
> >>> > The documentation link that I sent doesn't seem to work. Instead:
> >>> >
> >>> > go to www.orangefs.org and click on the html link for the install guide,
> >>> > about midway down the page.
> >>> >
> >>> > the install guide has a section on setting up a client and in section
> >>> > 3.3 is the description of the pvfs2tab file.
> >>> >
> >>> > Becky
> >>> >
> >>> >
> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon <[email protected]> wrote:
> >>> >>
> >>> >> Jim:
> >>> >>
> >>> >> To generate a new config file, issue the command:
> >>> >>
> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
> >>> >>
> >>> >> You will be asked a set of questions regarding your installation. This
> >>> >> utility may not provide everything you need; it just depends on your
> >>> >> setup. To help you, I will forward you a copy of our production conf file.
> >>> >> You can compare it to your own needs and modify the new conf file as needed.
> >>> >> After you create a new conf file, I would be happy to review it for you.
> >>> >>
> >>> >> I'm not sure how your clients have started without a proper pvfs2tab
> >>> >> file, unless you have the appropriate info in your fstab file. The mount
> >>> >> info could be in either file. I will send you a copy of our production
> >>> >> pvfs2tab file as an example.
> >>> >>
> >>> >> The link below will describe how to create the entries in the
> >>> >> pvfs2tab/fstab file.
> >>> >>
> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
> >>> >>
> >>> >> Thanks for giving 2.8.6 a try! Let me know how it goes!
> >>> >>
> >>> >> Becky
> >>> >>
> >>> >>
> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir <[email protected]> wrote:
> >>> >>>
> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there and a
> >>> >>> full cluster at the moment, so I can't intentionally reboot it. If it
> >>> >>> crashes on me today, I'll take the opportunity to update everything as
> >>> >>> soon as it comes back and reboot it again. Otherwise, I'll try early
> >>> >>> tomorrow morning to load and reboot.
> >>> >>>
> >>> >>> Also, you previously mentioned my pvfs2 server configuration file
> >>> >>> format was out of date. Can you suggest a new config file format to
> >>> >>> use based on what I gave you? Also, I've never had a pvfs2tab file on
> >>> >>> my clients, and my attempts to create one so far have failed. It
> >>> >>> seems I don't know the proper syntax, and I haven't found
> >>> >>> sufficiently clear documentation on that either. It has worked for ~4
> >>> >>> years without one, but...
> >>> >>>
> >>> >>> --Jim
> >>> >>>
> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon <[email protected]> wrote:
> >>> >>> > Jim:
> >>> >>> >
> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core process
> >>> >>> > using gdb and see if you can tell in which function it seems to be spinning?
> >>> >>> > Also, you can try turning on client debugging, so we can see what the
> >>> >>> > client core is doing.
> >>> >>> > To turn on debugging dynamically, issue the following:
> >>> >>> >
> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
> >>> >>> >
> >>> >>> > With the CPU so high, the client-core may or may not see the change in
> >>> >>> > gossip_debug settings. If it does, then a lot of output will be generated!
> >>> >>> > Before you reboot your system, make a copy of the client log and send that
> >>> >>> > to me, along with any information you might get from gdb.
> >>> >>> >
> >>> >>> > When you can, please try using 2.8.6 on your head node and see if you can
> >>> >>> > reproduce the problem.
> >>> >>> >
> >>> >>> > Thanks,
> >>> >>> > Becky
> >>> >>> >
> >>> >>> >
> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir <[email protected]> wrote:
> >>> >>> >>
> >>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened on
> >>> >>> >> reboot (e.g., all entries are lost). Already checked. Also, I didn't
> >>> >>> >> see anything in /var/log/messages (I looked there when the problem
> >>> >>> >> started mounting). There appears to be no "paper trail" of this
> >>> >>> >> incident, which is why it's been so hard to track down.
> >>> >>> >>
> >>> >>> >> --Jim
> >>> >>> >>
> >>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon <[email protected]> wrote:
> >>> >>> >> > Jim:
> >>> >>> >> >
> >>> >>> >> > Please send the pvfs2-client.log from your head node and the
> >>> >>> >> > /var/log/messages just before you rebooted. I'm thinking that the high
> >>> >>> >> > CPU utilization is coming from a failed operation that wasn't cleaned up
> >>> >>> >> > properly.
> >>> >>> >> >
> >>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of these high CPU
> >>> >>> >> > utilization issues.
> >>> >>> >> > It would be worthwhile for you to apply 2.8.6 to your
> >>> >>> >> > head node and see if this particular situation comes up again.
> >>> >>> >> >
> >>> >>> >> > Becky
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir <[email protected]> wrote:
> >>> >>> >> >>
> >>> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5. I don't
> >>> >>> >> >> have a crash file, and it looks like it's still in the process of
> >>> >>> >> >> bringing down my head node. Symptoms were:
> >>> >>> >> >>
> >>> >>> >> >> Someone was doing an scp from (or to, not sure which, but probably
> >>> >>> >> >> from) the pvfs2 volume. At some point, CPU usage spikes on the head
> >>> >>> >> >> node. Top shows both the scp and the pvfs2-client-core using 100% of
> >>> >>> >> >> a core. The load avg just keeps going up and up. At about 29, I lost
> >>> >>> >> >> responsiveness from the server. CPU load shows 62.5% iowait, 25%
> >>> >>> >> >> system, 12.5% idle, all others 0. The only processes of note running
> >>> >>> >> >> are the one scp and the pvfs2 process.
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >> >> My machine has now gone unresponsive; I'll probably need to go hit the
> >>> >>> >> >> front panel reset button. When it comes back up, I doubt there will
> >>> >>> >> >> be any written logs of what happened. Hence, why I can never catch
> >>> >>> >> >> the logs of the crash; it *thinks* it's working until the system goes
> >>> >>> >> >> non-responsive and resets.
> >>> >>> >> >>
> >>> >>> >> >> --Jim
> >>> >>> >> >
> >>> >>> >> > --
> >>> >>> >> > Becky Ligon
> >>> >>> >> > OrangeFS Support and Development
> >>> >>> >> > Omnibond Systems
> >>> >>> >> > Anderson, South Carolina
> >>> >>> >
> >>> >>> > --
> >>> >>> > Becky Ligon
> >>> >>> > OrangeFS Support and Development
> >>> >>> > Omnibond Systems
> >>> >>> > Anderson, South Carolina
> >>> >>
> >>> >> --
> >>> >> Becky Ligon
> >>> >> OrangeFS Support and Development
> >>> >> Omnibond Systems
> >>> >> Anderson, South Carolina
> >>> >
> >>> > --
> >>> > Becky Ligon
> >>> > OrangeFS Support and Development
> >>> > Omnibond Systems
> >>> > Anderson, South Carolina
> >>
> >> --
> >> Becky Ligon
> >> OrangeFS Support and Development
> >> Omnibond Systems
> >> Anderson, South Carolina

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
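
The six manual steps Jim lists for /mnt/pvfs2/gould/salmon/test2 (count entries, add a file, watch ls fail, remove the file) could be scripted roughly as follows. This is only a sketch: the function name find_ls_limit and the probe_N.tmp file names are made up for illustration, and it assumes the failure shows up as a non-zero exit status from ls, as in the "Invalid argument" output quoted in the thread. Probe files are removed by their full names rather than with a wildcard, because the thread reports that name completion stops working while the directory is in the failing state.

```shell
#!/bin/sh
# Hypothetical helper (not part of PVFS2/OrangeFS): add empty probe files
# to a directory one at a time and report when `ls` starts failing.
# Usage: find_ls_limit <directory> <max probe files>
find_ls_limit() {
    dir=$1
    max=$2
    i=0
    status=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Create the next probe file; stop if the file cannot be created.
        touch "$dir/probe_$i.tmp" || { status=2; break; }
        # Same check as "ls | wc -l" in the thread, using only the exit status.
        if ! ls "$dir" > /dev/null 2>&1; then
            echo "ls failed after adding probe file $i"
            status=1
            break
        fi
    done
    # Clean up by explicit name; a glob would need a working directory listing.
    j=1
    while [ "$j" -le "$i" ]; do
        rm -f "$dir/probe_$j.tmp"
        j=$((j + 1))
    done
    if [ "$status" -eq 0 ]; then
        echo "ls still works after adding $max probe files"
    fi
    return "$status"
}
```

Something like `find_ls_limit /mnt/pvfs2/gould/salmon/test2 5` would then show how many extra files that directory tolerates before ls breaks; comparing the result across directories might help confirm whether the limit depends on the entry count alone.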
_______________________________________________ Pvfs2-users mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
