Jim:

Are you running 2.8.6 on both the server and the client? Or just 2.8.6 on the head node?

Can you run "ls" on the users' directories that appear to be missing data? Can you also run pvfs2-ls on those same directories? Please send me the output from both commands.

Thanks,
Becky

On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir wrote:
> So, since switching over to 2.8.6, I've had two users report that
> their larger directories are missing files / data.
>
> Now I'm really in for it....I'm asking for more details, but I'll need
> to address this pretty thoroughly and rapidly...File systems that
> lose user data are not useful.
>
> --Jim
>
> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon wrote:
> > Jim:
> >
> > The documentation link that I sent doesn't seem to work. Instead:
> >
> > Go to www.orangefs.org and click on the html link for the install guide,
> > about midway down the page.
> >
> > The install guide has a section on setting up a client, and section 3.3
> > describes the pvfs2tab file.
> >
> > Becky
> >
> >
> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon wrote:
> >>
> >> Jim:
> >>
> >> To generate a new config file, issue the command:
> >>
> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
> >>
> >> You will be asked a set of questions regarding your installation. This
> >> utility may not provide everything you need; it depends on your setup.
> >> To help you, I will forward you a copy of our production conf file. You
> >> can compare it to your own needs and modify the new conf file as needed.
> >> After you create a new conf file, I would be happy to review it for you.
> >>
> >> I'm not sure how your clients have started without a proper pvfs2tab
> >> file, unless you have the appropriate info in your fstab file. The mount
> >> info could be in either file. I will send you a copy of our production
> >> pvfs2tab file as an example.
> >>
> >> The link below will describe how to create the entries in the
> >> pvfs2tab/fstab file.
> >>
> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
> >>
> >> Thanks for giving 2.8.6 a try! Let me know how it goes!
> >>
> >> Becky
> >>
> >>
> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir wrote:
> >>>
> >>> I've got 2.8.6 ready to install, but I've got 15 users on there and a
> >>> full cluster at the moment, so I can't intentionally reboot it. If it
> >>> crashes on me today, I'll take the opportunity to update everything as
> >>> soon as it comes back and reboot it again. Otherwise, I'll try early
> >>> tomorrow morning to load and reboot.
> >>>
> >>> Also, you previously mentioned my pvfs2 server configuration file
> >>> format was out of date. Can you suggest a new config file format to
> >>> use based on what I gave you? Also, I've never had a pvfs2tab file on
> >>> my clients, and my attempts to create one so far have failed. It
> >>> seems I don't know the proper syntax, and I haven't found
> >>> sufficiently clear documentation on that either. It has worked for ~4
> >>> years without one, but...
> >>>
> >>> --Jim
> >>>
> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon wrote:
> >>> > Jim:
> >>> >
> >>> > Next time this happens, can you attach to the pvfs2-client-core
> >>> > process using gdb and see if you can tell in which function it seems
> >>> > to be spinning? Also, you can try turning on client debugging, so we
> >>> > can see what the client core is doing. To turn on debugging
> >>> > dynamically, issue the following:
> >>> >
> >>> > echo "all" > /proc/sys/pvfs2/client-debug
> >>> >
> >>> > With the CPU so high, the client-core may or may not see the change
> >>> > in gossip_debug settings. If it does, then a lot of output will be
> >>> > generated! Before you reboot your system, make a copy of the client
> >>> > log and send that to me, along with any information you might get
> >>> > from gdb.
> >>> >
> >>> > When you can, please try using 2.8.6 on your head node and see if you
> >>> > can reproduce the problem.
> >>> >
> >>> > Thanks,
> >>> > Becky
> >>> >
> >>> >
> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir wrote:
> >>> >>
> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened on
> >>> >> reboot (i.e., all entries are lost). Already checked. Also, I didn't
> >>> >> see anything in /var/log/messages (I looked there when the problem
> >>> >> started mounting). There appears to be no "paper trail" of this
> >>> >> incident, which is why it's been so hard to track down.
> >>> >>
> >>> >> --Jim
> >>> >>
> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon wrote:
> >>> >> > Jim:
> >>> >> >
> >>> >> > Please send the pvfs2-client.log from your head node and the
> >>> >> > /var/log/messages just before you rebooted. I'm thinking that the
> >>> >> > high CPU utilization is coming from a failed operation that wasn't
> >>> >> > cleaned up properly.
> >>> >> >
> >>> >> > As I noted in my previous email, 2.8.6 addressed some of these
> >>> >> > high CPU utilization issues. It would be worthwhile for you to
> >>> >> > apply 2.8.6 to your head node and see if this particular situation
> >>> >> > comes up again.
> >>> >> >
> >>> >> > Becky
> >>> >> >
> >>> >> >
> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir wrote:
> >>> >> >>
> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5. I
> >>> >> >> don't have a crash file, and it looks like it's still in the
> >>> >> >> process of bringing down my head node. Symptoms were:
> >>> >> >>
> >>> >> >> Someone was doing an scp from (or to, not sure which, but probably
> >>> >> >> from) the pvfs2 volume. At some point, CPU usage spikes on the
> >>> >> >> head node.
> >>> >> >> Top shows both the scp and the pvfs2-client-core using 100% of
> >>> >> >> a core. The load avg just keeps going up and up. At about 29, I
> >>> >> >> lost responsiveness from the server. CPU load shows 62.5% iowait,
> >>> >> >> 25% system, 12.5% idle, all others 0. The only processes of note
> >>> >> >> running are the one scp and the pvfs2 process.
> >>> >> >>
> >>> >> >> My machine has now gone unresponsive; I'll probably need to go hit
> >>> >> >> the front panel reset button. When it comes back up, I doubt there
> >>> >> >> will be any written logs of what happened. Hence, why I can never
> >>> >> >> catch the logs of the crash; it *thinks* it's working until the
> >>> >> >> system goes non-responsive and resets.
> >>> >> >>
> >>> >> >> --Jim
> >>> >> >
> >>> >> > --
> >>> >> > Becky Ligon
> >>> >> > OrangeFS Support and Development
> >>> >> > Omnibond Systems
> >>> >> > Anderson, South Carolina

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
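
For anyone following the thread: the ls versus pvfs2-ls comparison requested at the top of this message can be turned into a quick diff of the two listings. The following is only a minimal sketch, not something from the thread. The function name compare_listings, the file names, and the example mount path in the comments are hypothetical, and it assumes both commands were run so that they print one entry name per line.

```shell
#!/bin/sh
# Sketch: report names that appear in only one of two saved listings.
#
# Hypothetical way to produce the inputs (paths are placeholders):
#   ls -1    /mnt/pvfs2/some/dir > /tmp/ls.out
#   pvfs2-ls /mnt/pvfs2/some/dir > /tmp/pvfs2-ls.out
#   compare_listings /tmp/ls.out /tmp/pvfs2-ls.out

compare_listings() {
    # $1 = first listing file, $2 = second listing file
    cl_first=$(mktemp) || return 1
    cl_second=$(mktemp) || return 1

    # comm requires sorted input; use the same fixed locale for sort and comm.
    LC_ALL=C sort -u "$1" > "$cl_first"
    LC_ALL=C sort -u "$2" > "$cl_second"

    # comm -23 keeps lines unique to the first file, comm -13 lines unique
    # to the second file.
    LC_ALL=C comm -23 "$cl_first" "$cl_second" | sed 's/^/only in first: /'
    LC_ALL=C comm -13 "$cl_first" "$cl_second" | sed 's/^/only in second: /'

    rm -f "$cl_first" "$cl_second"
}
```

No output means the two listings contain the same names. Any "only in ..." line points at an entry that one view of the directory can see and the other cannot, which is the kind of detail worth including when sending both outputs to the list.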
