Jim:

I am running a 2.8.6 client against a 2.8.5 server on CentOS-6.2 and tried to reproduce your problem. I have over 100 files in my directory and my "ls" is working. So, I'm thinking that the creation and/or installation of the kernel module and client are hosed somehow on your site.
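A quick way to check what the affected node is actually running, as a rough sketch; the module name pvfs2, the /opt/pvfs2 install prefix, and the /mnt/pvfs2 mount point are assumptions carried over from elsewhere in this thread:

    # Sketch: confirm which kernel module and client build the node is actually using.
    uname -r                                   # kernel the module should have been built against
    lsmod | grep pvfs2                         # is the pvfs2 module loaded at all?
    modinfo pvfs2 | grep -i vermagic           # kernel version the installed module was built for
    ps -ef | grep pvfs2-client                 # are pvfs2-client / pvfs2-client-core running?
    /opt/pvfs2/bin/pvfs2-ping -m /mnt/pvfs2    # basic client/server sanity check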
There *might* be a difference in the two OrangeFS versions that you are using. I will install from the rpms that you sent me and see if I can recreate the problem.

Becky

On Thu, Aug 2, 2012 at 12:35 PM, Becky Ligon <[email protected]> wrote:
> Remind me again of your production OS? And, did you create the kernel module using this OS?
>
> Becky
>
> On Thu, Aug 2, 2012 at 12:32 PM, Becky Ligon <[email protected]> wrote:
>> Jim:
>>
>> It might be easier for me to debug if you can set up an account for me and let me look at your environment. Is this possible?
>>
>> Becky
>>
>> On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir <[email protected]> wrote:
>>> Actually a user has figured it out, at least in one directory: 66 entries work, but 67 fail.
>>>
>>> Also, in this directory, /mnt/pvfs2/gould/salmon/test2, I tried to figure out what was going on. If you are interested, you could try these steps in the directory above:
>>>
>>> 1) ls | wc -l (gives you 66)
>>> 2) emacs a.txt (creates new file a.txt)
>>> 3) Ctrl-x Ctrl-s (saves new file)
>>> 4) ls | wc -l (gives "ls: reading directory .: Invalid argument" error)
>>> 5) rm a.txt (removes the file; can't tab-complete the name, must type it in full)
>>> 6) ls | wc -l (gives you 66 - all back to normal)
>>>
>>> --Jim
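A scripted version of the test Jim describes above, as a sketch only; the test directory path is hypothetical and should be an empty, throwaway directory on the PVFS2 mount:

    #!/bin/sh
    # Create files one at a time in an empty PVFS2-mounted directory and
    # report the first entry count at which a plain "ls" starts failing.
    dir=/mnt/pvfs2/readdir-test     # hypothetical empty test directory
    cd "$dir" || exit 1
    i=0
    while [ "$i" -lt 200 ]; do
        i=$((i + 1))
        touch "file.$i" || exit 1
        if ! ls > /dev/null 2>&1; then
            echo "ls failed after $i entries"
            exit 0
        fi
    done
    echo "ls still working at $i entries"

If the failure really is tied to a fixed entry count, the loop should report the same threshold on every run, which would point at the directory-listing path in the kernel module or client core rather than at the files themselves.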
>>> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir <[email protected]> wrote:
>>> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all of them) now.
>>> >
>>> > Output on a sample directory:
>>> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
>>> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>> > output
>>> > getncfromNCAR.csh
>>> > mz4assim_conus_1h_20070101.nc
>>> > mz4assim_conus_1h_20070102.nc
>>> >
>>> > Another user indicates this is based on how many files are in the directory. If he knows the file name, the file is still accessible, but ls or tab completion or anything like that fail. If he deletes files to get it under a not-exactly-determined amount, ls works again.
>>> >
>>> > --Jim
>>> >
>>> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon <[email protected]> wrote:
>>> >> Jim:
>>> >>
>>> >> Are you running 2.8.6 on the server and the client? Or just 2.8.6 from the head node?
>>> >>
>>> >> Can you run an "ls" on their directories that appear to be missing data? Can you also run pvfs2-ls on those same directories? Please send me the output from both commands.
>>> >>
>>> >> Thanks,
>>> >> Becky
>>> >>
>>> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir <[email protected]> wrote:
>>> >>> So, since switching over to 2.8.6, I've had two users report that their larger directories are missing files / data.
>>> >>>
>>> >>> Now I'm really in for it... I'm asking for more details, but I'll need to address this pretty thoroughly and rapidly... File systems that lose user data are not useful.
>>> >>>
>>> >>> --Jim
>>> >>>
>>> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon <[email protected]> wrote:
>>> >>> > Jim:
>>> >>> >
>>> >>> > The documentation link that I sent doesn't seem to work. Instead:
>>> >>> >
>>> >>> > go to www.orangefs.org and click on the html link for the install guide, about midway down the page. The install guide has a section on setting up a client, and section 3.3 describes the pvfs2tab file.
>>> >>> >
>>> >>> > Becky
>>> >>> >
>>> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon <[email protected]> wrote:
>>> >>> >> Jim:
>>> >>> >>
>>> >>> >> To generate a new config file, issue the command:
>>> >>> >>
>>> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
>>> >>> >>
>>> >>> >> You will be asked a set of questions regarding your installation. This utility may not provide everything you need; it just depends on your setup. To help you, I will forward you a copy of our production conf file. You can compare it to your own needs and modify the new conf file as needed. After you create a new conf file, I would be happy to review it for you.
>>> >>> >>
>>> >>> >> I'm not sure how your clients have started without a proper pvfs2tab file, unless you have the appropriate info in your fstab file. The mount info could be in either file. I will send you a copy of our production pvfs2tab file as an example.
>>> >>> >>
>>> >>> >> The link below will describe how to create the entries in the pvfs2tab/fstab file.
>>> >>> >>
>>> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
>>> >>> >>
>>> >>> >> Thanks for giving 2.8.6 a try! Let me know how it goes!
>>> >>> >>
>>> >>> >> Becky
>>> >>> >>
>>> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir <[email protected]> wrote:
>>> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there and a full cluster at the moment, so I can't intentionally reboot it. If it crashes on me today, I'll take the opportunity to update everything as soon as it comes back and reboot it again. Otherwise, I'll try early tomorrow morning to load and reboot.
>>> >>> >>>
>>> >>> >>> Also, you previously mentioned my pvfs2 server configuration file format was out of date. Can you suggest a new config file format to use based on what I gave you? Also, I've never had a pvfs2tab file on my clients, and my attempts to create one so far have failed. It seems I don't know the proper syntax, and I haven't found sufficiently clear documentation on that either. It has worked for ~4 years without one, but...
>>> >>> >>>
>>> >>> >>> --Jim
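For what it's worth, the pvfs2tab entry that quickstart section describes follows the ordinary fstab layout. A sketch with placeholder values (the server host, port, and file system name below are made up and have to match whatever the servers were configured with):

    # /etc/pvfs2tab (or the equivalent line in /etc/fstab)
    # <protocol>://<server host>:<port>/<fs name>  <mount point>  pvfs2  <options>  0  0
    tcp://myserver:3334/pvfs2-fs  /mnt/pvfs2  pvfs2  defaults,noauto  0  0

As Becky notes, the same line can live in either /etc/pvfs2tab or /etc/fstab; the field layout is the same.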
>>> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon <[email protected]> wrote:
>>> >>> >>> > Jim:
>>> >>> >>> >
>>> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core process using gdb and see if you can tell in which function it seems to be spinning? Also, you can try turning on client debugging, so we can see what the client core is doing. To turn on debugging dynamically, issue the following:
>>> >>> >>> >
>>> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
>>> >>> >>> >
>>> >>> >>> > With the CPU so high, the client-core may or may not see the change in gossip_debug settings. If it does, then a lot of output will be generated! Before you reboot your system, make a copy of the client log and send that to me, along with any information you might get from gdb.
>>> >>> >>> >
>>> >>> >>> > When you can, please try using 2.8.6 on your head node and see if you can reproduce the problem.
>>> >>> >>> >
>>> >>> >>> > Thanks,
>>> >>> >>> > Becky
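The gdb step can also be done non-interactively, which helps when the machine may lock up at any moment. A sketch, assuming gdb is installed and there is a single pvfs2-client-core process:

    # Grab backtraces from the spinning pvfs2-client-core without an interactive gdb session.
    pid=$(pidof pvfs2-client-core)
    gdb -p "$pid" -batch -ex "thread apply all bt" > /tmp/client-core-bt.txt 2>&1
    # Then turn on verbose client logging as described above:
    echo "all" > /proc/sys/pvfs2/client-debug

Grabbing the backtrace two or three times a few seconds apart makes it easier to tell whether the process is stuck in one function or cycling through several.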
>>> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir <[email protected]> wrote:
>>> >>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened on reboot (i.e., all entries are lost). Already checked. Also, I didn't see anything in /var/log/messages (I looked there when the problem started mounting). There appears to be no "paper trail" of this incident, which is why it's been so hard to track down.
>>> >>> >>> >>
>>> >>> >>> >> --Jim
>>> >>> >>> >>
>>> >>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon <[email protected]> wrote:
>>> >>> >>> >> > Jim:
>>> >>> >>> >> >
>>> >>> >>> >> > Please send the pvfs2-client.log from your head node and the /var/log/messages from just before you rebooted. I'm thinking that the high CPU utilization is coming from a failed operation that wasn't cleaned up properly.
>>> >>> >>> >> >
>>> >>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of these high CPU utilization issues. It would be worthwhile for you to apply 2.8.6 to your head node and see if this particular situation comes up again.
>>> >>> >>> >> >
>>> >>> >>> >> > Becky
>>> >>> >>> >> >
>>> >>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir <[email protected]> wrote:
>>> >>> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5. I don't have a crash file, and it looks like it's still in the process of bringing down my head node. Symptoms were:
>>> >>> >>> >> >>
>>> >>> >>> >> >> Someone was doing an scp from (or to, not sure which, but probably from) the pvfs2 volume. At some point, CPU usage spikes on the head node. Top shows both the scp and the pvfs2-client-core using 100% of a core. The load avg just keeps going up and up; at about 29, I lost responsiveness from the server. CPU load shows 62.5% iowait, 25% system, 12.5% idle, all others 0. The only processes of note running are the one scp and the pvfs2 process.
>>> >>> >>> >> >>
>>> >>> >>> >> >> My machine has now gone unresponsive; I'll probably need to go hit the front panel reset button. When it comes back up, I doubt there will be any written logs of what happened. Hence why I can never catch the logs of the crash: it *thinks* it's working until the system goes non-responsive and resets.
>>> >>> >>> >> >>
>>> >>> >>> >> >> --Jim

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
