Remind me again: what is your production OS?  And did you build the kernel
module on that OS?

Becky

On Thu, Aug 2, 2012 at 12:32 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> It might be easier for me to debug if you can set up an account for me and
> let me look at your environment.  Is this possible?
>
> Becky
>
>
> On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir <[email protected]> wrote:
>
>> Actually a user has figured it out, at least in one directory: 66
>> entries work, but 67 fail:
>>
>> Also, in this directory /mnt/pvfs2/gould/salmon/test2 I tried to
>> figure out what was going on. If you are interested, you could try
>> these steps in this directory above:
>>
>> 1) ls |wc -l (gives you 66)
>>
>> 2) emacs a.txt (creates new file a.txt)
>>
>> 3) CTRL-x CTRL-s (saves new file)
>>
>> 4) ls |wc -l (gives "ls: reading directory .: Invalid argument" error)
>>
>> 5) rm a.txt (removes the file; tab completion fails, so the entire name must be typed)
>>
>> 6) ls |wc -l (gives you 66 - all back to normal)
>>
>> --Jim
>>
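The reproduction steps above could be automated to pin down the exact entry
count at which listing breaks.  What follows is only a minimal sketch, not
something taken from the thread: the function name find_ls_limit and the
lslimit_probe_ file prefix are made up for illustration, and it assumes it is
safe to create and delete empty files in the target directory.

```shell
#!/bin/sh
# find_ls_limit DIR MAX
# Hypothetical helper: adds up to MAX empty probe files to DIR, running
# `ls` after each one.  Prints the total entry count at which `ls` first
# fails, or "none" if it never fails.  Probe files are removed afterwards.
find_ls_limit() {
    dir=$1
    max=$2
    # Entry count before we start (0 if ls is already failing).
    start=$(ls "$dir" 2>/dev/null | wc -l)
    i=0
    result=none
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Create one more empty file; stop if the file system refuses.
        : > "$dir/lslimit_probe_$i" || break
        if ! ls "$dir" >/dev/null 2>&1; then
            result=$((start + i))
            break
        fi
    done
    # Remove probes by full name, since globbing and tab completion
    # also depend on a working directory listing.
    j=1
    while [ "$j" -le "$i" ]; do
        rm -f "$dir/lslimit_probe_$j"
        j=$((j + 1))
    done
    echo "$result"
}
```

For example, `find_ls_limit /mnt/pvfs2/gould/salmon/test2 200` would report
the entry count at which ls first failed, or "none" if it never failed within
200 extra files.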
>> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir <[email protected]> wrote:
>> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all
>> > of them) now.
>> >
>> > Output on a sample directory:
>> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
>> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
>> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
>> > output
>> > getncfromNCAR.csh
>> > mz4assim_conus_1h_20070101.nc
>> > mz4assim_conus_1h_20070102.nc
>> >
>> >
>> > Another user indicates this is based on how many files are in the
>> > directory.  If he knows the file name, the file is still accessible,
>> > but ls or tab completion or anything like that fail.  If he deletes
>> > files to get it under a not-exactly-determined amount, ls works again.
>> >
>> > --Jim
>> >
>> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon <[email protected]> wrote:
>> >> Jim:
>> >>
>> >> Are you running 2.8.6 on the server and the client?  Or just 2.8.6 from
>> >> the head node?
>> >>
>> >> Can you run "ls" on the directories that appear to be missing data?  Can
>> >> you also run pvfs2-ls on those same directories?  Please send me the output
>> >> from both commands.
>> >>
>> >> Thanks,
>> >> Becky
>> >>
>> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir <[email protected]> wrote:
>> >>>
>> >>> So, since switching over to 2.8.6, I've had two users report that
>> >>> their larger directories are missing files / data.
>> >>>
>> >>> Now I'm really in for it... I'm asking for more details, but I'll need
>> >>> to address this pretty thoroughly and rapidly. File systems that
>> >>> lose user data are not useful.
>> >>>
>> >>> --Jim
>> >>>
>> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon <[email protected]> wrote:
>> >>> > Jim:
>> >>> >
>> >>> > The documentation link that I sent doesn't seem to work.  Instead:
>> >>> >
>> >>> > Go to www.orangefs.org and click on the HTML link for the install guide,
>> >>> > about midway down the page.
>> >>> >
>> >>> > The install guide has a section on setting up a client, and section 3.3
>> >>> > describes the pvfs2tab file.
>> >>> >
>> >>> > Becky
>> >>> >
>> >>> >
>> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon <[email protected]> wrote:
>> >>> >>
>> >>> >> Jim:
>> >>> >>
>> >>> >> To generate a new config file, issue the command:
>> >>> >>
>> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
>> >>> >>
>> >>> >> You will be asked a set of questions regarding your installation.  This
>> >>> >> utility may not provide everything you need; it depends on your setup.
>> >>> >> To help you, I will forward you a copy of our production conf file.  You
>> >>> >> can compare it to your own needs and modify the new conf file as needed.
>> >>> >> After you create a new conf file, I would be happy to review it for you.
>> >>> >>
>> >>> >> I'm not sure how your clients have started without a proper pvfs2tab
>> >>> >> file, unless you have the appropriate info in your fstab file.  The mount
>> >>> >> info could be in either file.  I will send you a copy of our production
>> >>> >> pvfs2tab file as an example.
>> >>> >>
>> >>> >> The link below will describe how to create the entries in the
>> >>> >> pvfs2tab/fstab file.
>> >>> >>
>> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
>> >>> >>
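For reference, a pvfs2tab entry uses the same six-field layout as an fstab
line.  The sketch below just prints one such line.  The helper name
pvfs2tab_line is hypothetical, and the host name io-server1, port 3334, and
file system name pvfs2-fs in the usage note are placeholder assumptions; the
real values have to come from the server configuration file.

```shell
#!/bin/sh
# pvfs2tab_line HOST PORT FSNAME MOUNTPOINT
# Hypothetical helper: prints one fstab-style entry for a PVFS2 volume
# reached over TCP.  All four arguments are site-specific.
pvfs2tab_line() {
    host=$1
    port=$2
    fsname=$3
    mnt=$4
    # Fields: device, mount point, fs type, options, dump, fsck pass.
    printf 'tcp://%s:%s/%s %s pvfs2 defaults,noauto 0 0\n' \
        "$host" "$port" "$fsname" "$mnt"
}
```

Something like `pvfs2tab_line io-server1 3334 pvfs2-fs /mnt/pvfs2` produces a
line that could be pasted into the pvfs2tab or fstab file after checking it
against the install guide.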
>> >>> >> Thanks for giving 2.8.6 a try!  Let me know how it goes!
>> >>> >>
>> >>> >> Becky
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir <[email protected]> wrote:
>> >>> >>>
>> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there and a
>> >>> >>> full cluster at the moment, so I can't intentionally reboot it.  If it
>> >>> >>> crashes on me today, I'll take the opportunity to update everything as
>> >>> >>> soon as it comes back and reboot it again.  Otherwise, I'll try early
>> >>> >>> tomorrow morning to load and reboot.
>> >>> >>>
>> >>> >>> Also, you previously mentioned my pvfs2 server configuration file
>> >>> >>> format was out of date.  Can you suggest a new config file format to
>> >>> >>> use based on what I gave you?  Also, I've never had a pvfs2tab file on
>> >>> >>> my clients, and my attempts to create one so far have failed.  It
>> >>> >>> seems I don't know the proper syntax, and I haven't found
>> >>> >>> sufficiently clear documentation on that either.  It has worked for ~4
>> >>> >>> years without one, but...
>> >>> >>>
>> >>> >>> --Jim
>> >>> >>>
>> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon <[email protected]> wrote:
>> >>> >>> > Jim:
>> >>> >>> >
>> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core process
>> >>> >>> > using gdb and see if you can tell in which function it seems to be
>> >>> >>> > spinning?  Also, you can try turning on client debugging, so we can see
>> >>> >>> > what the client core is doing.  To turn on debugging dynamically, issue
>> >>> >>> > the following:
>> >>> >>> >
>> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
>> >>> >>> >
>> >>> >>> > With the CPU so high, the client-core may or may not see the change in
>> >>> >>> > gossip_debug settings.  If it does, then a lot of output will be
>> >>> >>> > generated!  Before you reboot your system, make a copy of the client log
>> >>> >>> > and send that to me, along with any information you might get from gdb.
>> >>> >>> >
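The capture steps described above could be wrapped in one function so they
can be run quickly while the node is still responsive.  This is a rough
sketch under stated assumptions: the function name capture_client_state is
invented, and the default log path /tmp/pvfs2-client.log is an assumption
that should be checked against how pvfs2-client is started on the node.  The
/proc/sys/pvfs2/client-debug path is the one given in the message.

```shell
#!/bin/sh
# capture_client_state OUTDIR [LOGFILE] [DEBUG_CTL]
# Hypothetical helper: enables client debugging, saves a gdb backtrace of
# pvfs2-client-core if it is running, and copies the client log to OUTDIR.
capture_client_state() {
    outdir=$1
    logfile=${2:-/tmp/pvfs2-client.log}          # assumed default location
    debug_ctl=${3:-/proc/sys/pvfs2/client-debug}
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1

    # Turn on all client debugging, as suggested in the message.
    if [ -w "$debug_ctl" ]; then
        echo "all" > "$debug_ctl"
    fi

    # Attach gdb non-interactively and dump a backtrace of every thread.
    pid=$(pidof pvfs2-client-core 2>/dev/null || true)
    pid=${pid%% *}                               # keep the first PID only
    if [ -n "$pid" ] && command -v gdb >/dev/null 2>&1; then
        gdb -p "$pid" -batch -ex "thread apply all bt" \
            > "$outdir/gdb-bt-$stamp.txt" 2>&1 || true
    fi

    # Copy the client log before a reboot truncates it.
    if [ -r "$logfile" ]; then
        cp -p "$logfile" "$outdir/pvfs2-client-$stamp.log"
    fi
}
```

OUTDIR should be on a local disk rather than on the pvfs2 volume, since the
volume may be the part that is hanging.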
>> >>> >>> > When you can, please try using 2.8.6 on your head node and see if you
>> >>> >>> > can reproduce the problem.
>> >>> >>> >
>> >>> >>> > Thanks,
>> >>> >>> > Becky
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir <[email protected]> wrote:
>> >>> >>> >>
>> >>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened on
>> >>> >>> >> reboot (i.e., all entries are lost).  Already checked.  Also, I didn't
>> >>> >>> >> see anything in /var/log/messages (I looked there when the problem
>> >>> >>> >> started mounting).  There appears to be no "paper trail" of this
>> >>> >>> >> incident, which is why it's been so hard to track down.
>> >>> >>> >>
>> >>> >>> >> --Jim
>> >>> >>> >>
>> >>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon <[email protected]> wrote:
>> >>> >>> >> > Jim:
>> >>> >>> >> >
>> >>> >>> >> > Please send the pvfs2-client.log from your head node and the
>> >>> >>> >> > /var/log/messages just before you rebooted.  I'm thinking that the
>> >>> >>> >> > high CPU utilization is coming from a failed operation that wasn't
>> >>> >>> >> > cleaned up properly.
>> >>> >>> >> >
>> >>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of these high
>> >>> >>> >> > CPU utilization issues.  It would be worthwhile for you to apply 2.8.6
>> >>> >>> >> > to your head node and see if this particular situation comes up again.
>> >>> >>> >> >
>> >>> >>> >> > Becky
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir <[email protected]> wrote:
>> >>> >>> >> >>
>> >>> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5.  I don't
>> >>> >>> >> >> have a crash file, and it looks like it's still in the process of
>> >>> >>> >> >> bringing down my head node.  Symptoms were:
>> >>> >>> >> >>
>> >>> >>> >> >> Someone was doing an scp from (or to, not sure which, but probably
>> >>> >>> >> >> from) the pvfs2 volume.  At some point, CPU usage spikes on the head
>> >>> >>> >> >> node.  Top shows both the scp and the pvfs2-client-core using 100% of
>> >>> >>> >> >> a core.  The load avg just keeps going up and up.  At about 29, I lost
>> >>> >>> >> >> responsiveness from the server.  CPU load shows 62.5% iowait, 25%
>> >>> >>> >> >> system, 12.5% idle, all others 0.  The only processes of note running
>> >>> >>> >> >> are the one scp and the pvfs2 process.
>> >>> >>> >> >>
>> >>> >>> >> >>
>> >>> >>> >> >> My machine has now gone unresponsive; I'll probably need to go hit the
>> >>> >>> >> >> front panel reset button.  When it comes back up, I doubt there will
>> >>> >>> >> >> be any written logs of what happened.  Hence why I can never catch
>> >>> >>> >> >> the logs of the crash; it *thinks* it's working until the system goes
>> >>> >>> >> >> non-responsive and resets.
>> >>> >>> >> >>
>> >>> >>> >> >> --Jim
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> > --
>> >>> >>> >> > Becky Ligon
>> >>> >>> >> > OrangeFS Support and Development
>> >>> >>> >> > Omnibond Systems
>> >>> >>> >> > Anderson, South Carolina


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
