Jim:

I am running a 2.8.6 client against a 2.8.5 server on CentOS-6.2 and tried
to reproduce your problem.  I have over 100 files in my directory and my
"ls" is working.  So, I'm thinking that the creation and/or installation of
the kernel module and client are hosed somehow on your site.

There *might* be a difference in the two OrangeFS versions that you are
using.  I will install from the rpms that you sent me and see if I can
recreate the problem.

Becky

On Thu, Aug 2, 2012 at 12:35 PM, Becky Ligon <[email protected]> wrote:

> Remind me again of your production OS?  And, did you create the kernel
> module using this OS?
>
> Becky
>
>
> On Thu, Aug 2, 2012 at 12:32 PM, Becky Ligon <[email protected]> wrote:
>
>> Jim:
>>
>> It might be easier for me to debug if you can set up an account for me
>> and let me look at your environment.  Is this possible?
>>
>> Becky
>>
>>
>> On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir <[email protected]> wrote:
>>
>>> Actually, a user has figured it out, at least in one directory: 66
>>> entries work, but 67 fail:
>>>
>>> Also, in this directory /mnt/pvfs2/gould/salmon/test2 I tried to
>>> figure out what was going on.  If you are interested, you could try
>>> these steps in that directory:
>>>
>>> 1) ls |wc -l (gives you 66)
>>>
>>> 2) emacs a.txt (creates new file a.txt)
>>>
>>> 3) CTRL-x CTRL-s (saves new file)
>>>
>>> 4) ls |wc -l (gives "ls: reading directory .: Invalid argument" error)
>>>
>>> 5) rm a.txt (removes file - can't tab-complete the name, must type it in full)
>>>
>>> 6) ls |wc -l (gives you 66 - all back to normal)
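The six steps above can be sketched as a small script. A scratch directory stands in for the PVFS2 mount here; on a healthy filesystem every step succeeds, while on the affected mount step 4's ls reportedly fails with "Invalid argument":

```shell
#!/bin/sh
# Recreate the 66-vs-67-entry test from the steps above.
# A temporary directory stands in for /mnt/pvfs2/gould/salmon/test2.
dir=$(mktemp -d)
i=1
while [ "$i" -le 66 ]; do touch "$dir/file$i"; i=$((i+1)); done

ls "$dir" | wc -l            # step 1: expect 66
touch "$dir/a.txt"           # steps 2-3: create and save a.txt
ls "$dir" | wc -l            # step 4: expect 67 (EINVAL on the broken mount)
rm "$dir/a.txt"              # step 5: remove a.txt
ls "$dir" | wc -l            # step 6: expect 66 again
rm -r "$dir"
```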
>>>
>>> --Jim
>>>
>>> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir <[email protected]> wrote:
>>> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all
>>> > of them) now.
>>> >
>>> > Output on a sample directory:
>>> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
>>> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>> > output
>>> > getncfromNCAR.csh
>>> > mz4assim_conus_1h_20070101.nc
>>> > mz4assim_conus_1h_20070102.nc
>>> >
>>> >
>>> > Another user indicates this is based on how many files are in the
>>> > directory.  If he knows the file name, the file is still accessible,
>>> > but ls or tab completion or anything like that fail.  If he deletes
>>> > files to get it under a not-exactly-determined amount, ls works again.
>>> >
>>> > --Jim
>>> >
>>> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon <[email protected]>
>>> > wrote:
>>> >> Jim:
>>> >>
>>> >> Are you running 2.8.6 on the server and the client?  Or just 2.8.6
>>> >> from the head node?
>>> >>
>>> >> Can you run a "ls" on their directories that appear to be missing
>>> >> data?  Can you also run pvfs2-ls on those same directories?  Please
>>> >> send me the output from both commands.
>>> >>
>>> >> Thanks,
>>> >> Becky
>>> >>
>>> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> So, since switching over to 2.8.6, I've had two users report that
>>> >>> their larger directories are missing files / data.
>>> >>>
>>> >>> Now I'm really in for it....I'm asking for more details, but I'll
>>> >>> need to address this pretty thoroughly and rapidly...File systems
>>> >>> that lose user data are not useful.
>>> >>>
>>> >>> --Jim
>>> >>>
>>> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon <[email protected]>
>>> >>> wrote:
>>> >>> > Jim:
>>> >>> >
>>> >>> > The documentation link that I sent doesn't seem to work.  Instead:
>>> >>> >
>>> >>> > go to www.orangefs.org and click on the html link for the install
>>> >>> > guide, about midway down the page.
>>> >>> >
>>> >>> > The install guide has a section on setting up a client, and section
>>> >>> > 3.3 describes the pvfs2tab file.
>>> >>> >
>>> >>> > Becky
>>> >>> >
>>> >>> >
>>> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon <[email protected]>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Jim:
>>> >>> >>
>>> >>> >> To generate a new config file, issue the command:
>>> >>> >>
>>> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
>>> >>> >>
>>> >>> >> You will be asked a set of questions regarding your installation.
>>> >>> >> This utility may not provide everything you need; it just depends on
>>> >>> >> your setup.  To help you, I will forward you a copy of our production
>>> >>> >> conf file.  You can compare it to your own needs and modify the new
>>> >>> >> conf file as needed.  After you create a new conf file, I would be
>>> >>> >> happy to review it for you.
>>> >>> >>
>>> >>> >> I'm not sure how your clients have started without a proper
>>> >>> >> pvfs2tab file, unless you have the appropriate info in your fstab
>>> >>> >> file.  The mount info could be in either file.  I will send you a
>>> >>> >> copy of our production pvfs2tab file as an example.
>>> >>> >>
>>> >>> >> The link below will describe how to create the entries in the
>>> >>> >> pvfs2tab/fstab file.
>>> >>> >>
>>> >>> >>
>>> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
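[For reference, a pvfs2tab/fstab entry generally follows the one-line format shown in the quickstart guide; the hostname, port, and mount point below are placeholders to adapt to your site:]

```
tcp://myserver:3334/pvfs2-fs  /mnt/pvfs2  pvfs2  defaults,noauto  0  0
```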
>>> >>> >>
>>> >>> >> Thanks for giving 2.8.6 a try!  Let me know how it goes!
>>> >>> >>
>>> >>> >> Becky
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir <[email protected]>
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there
>>> >>> >>> and a full cluster at the moment, so I can't intentionally reboot
>>> >>> >>> it.  If it crashes on me today, I'll take the opportunity to update
>>> >>> >>> everything as soon as it comes back and reboot it again.  Otherwise,
>>> >>> >>> I'll try early tomorrow morning to load and reboot.
>>> >>> >>>
>>> >>> >>> Also, you previously mentioned my pvfs2 server configuration file
>>> >>> >>> format was out of date.  Can you suggest a new config file format
>>> >>> >>> to use based on what I gave you?  Also, I've never had a pvfs2tab
>>> >>> >>> file on my clients, and my attempts to create one so far have
>>> >>> >>> failed.  It seems I don't know the proper syntax, and I haven't
>>> >>> >>> found sufficiently clear documentation on that either.  It has
>>> >>> >>> worked for ~4 years without one, but...
>>> >>> >>>
>>> >>> >>> --Jim
>>> >>> >>>
>>> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon <[email protected]>
>>> >>> >>> wrote:
>>> >>> >>> > Jim:
>>> >>> >>> >
>>> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core
>>> >>> >>> > process using gdb and see if you can tell in which function it
>>> >>> >>> > seems to be spinning?  Also, you can try turning on client
>>> >>> >>> > debugging, so we can see what the client core is doing.  To turn
>>> >>> >>> > on debugging dynamically, issue the following:
>>> >>> >>> >
>>> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
>>> >>> >>> >
>>> >>> >>> > With the CPU so high, the client-core may or may not see the
>>> >>> >>> > change in gossip_debug settings.  If it does, then a lot of output
>>> >>> >>> > will be generated!  Before you reboot your system, make a copy of
>>> >>> >>> > the client log and send that to me, along with any information you
>>> >>> >>> > might get from gdb.
>>> >>> >>> >
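[A sketch of the diagnosis described above, assuming gdb and pidof are available on the head node; the client log path below is an assumption, so adjust it to wherever your pvfs2-client writes its log:]

```shell
# Turn on verbose client-core logging at runtime (command from above)
echo "all" > /proc/sys/pvfs2/client-debug

# Grab backtraces from the spinning pvfs2-client-core without killing it
gdb -p "$(pidof pvfs2-client-core)" -batch -ex "thread apply all bt" \
    > /tmp/client-core-bt.txt

# Preserve the client log before rebooting (log path is an assumption)
cp /var/log/pvfs2-client.log /tmp/pvfs2-client.log.saved
```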
>>> >>> >>> > When you can, please try using 2.8.6 on your head node and see
>>> >>> >>> > if you can reproduce the problem.
>>> >>> >>> >
>>> >>> >>> > Thanks,
>>> >>> >>> > Becky
>>> >>> >>> >
>>> >>> >>> >
>>> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir <[email protected]>
>>> >>> >>> > wrote:
>>> >>> >>> >>
>>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened on
>>> >>> >> reboot (e.g., all entries are lost).  Already checked.  Also, I
>>> >>> >> didn't see anything in /var/log/messages (I looked there when the
>>> >>> >> problem started mounting).  There appears to be no "paper trail" of
>>> >>> >> this incident, which is why it's been so hard to track down.
>>> >>> >>> >>
>>> >>> >>> >> --Jim
>>> >>> >>> >>
>>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon <[email protected]>
>>> >>> >> wrote:
>>> >>> >>> >> > Jim:
>>> >>> >>> >> >
>>> >>> >> > Please send the pvfs2-client.log from your head node and the
>>> >>> >> > /var/log/messages from just before you rebooted.  I'm thinking that
>>> >>> >> > the high CPU utilization is coming from a failed operation that
>>> >>> >> > wasn't cleaned up properly.
>>> >>> >> >
>>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of these high
>>> >>> >> > CPU utilization issues.  It would be worthwhile for you to apply
>>> >>> >> > 2.8.6 to your head node and see if this particular situation comes
>>> >>> >> > up again.
>>> >>> >>> >> >
>>> >>> >>> >> > Becky
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir <[email protected]>
>>> >>> >> > wrote:
>>> >>> >>> >> >>
>>> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5.  I
>>> >>> >> >> don't have a crash file, and it looks like it's still in the
>>> >>> >> >> process of bringing down my head node.  Symptoms were:
>>> >>> >> >>
>>> >>> >> >> Someone was doing an scp from (or to, not sure which, but probably
>>> >>> >> >> from) the pvfs2 volume.  At some point, CPU usage spikes on the
>>> >>> >> >> head node.  Top shows both the scp and the pvfs2-client-core using
>>> >>> >> >> 100% of a core.  The load avg just keeps going up and up.  At
>>> >>> >> >> about 29, I lost responsiveness from the server.  CPU load shows
>>> >>> >> >> 62.5% iowait, 25% system, 12.5% idle, all others 0.  The only
>>> >>> >> >> processes of note running are the one scp and the pvfs2 process.
>>> >>> >> >>
>>> >>> >> >> My machine has now gone unresponsive; I'll probably need to go hit
>>> >>> >> >> the front panel reset button.  When it comes back up, I doubt
>>> >>> >> >> there will be any written logs of what happened.  Hence why I can
>>> >>> >> >> never catch the logs of the crash; it *thinks* it's working until
>>> >>> >> >> the system goes non-responsive and resets.
>>> >>> >>> >> >>
>>> >>> >>> >> >> --Jim
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >>> >> > --
>>> >>> >>> >> > Becky Ligon
>>> >>> >>> >> > OrangeFS Support and Development
>>> >>> >>> >> > Omnibond Systems
>>> >>> >>> >> > Anderson, South Carolina
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >>> >
>>> >>> >>> >
>>> >>> >>> >
>>> >>> >>> >
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>>
>>
>>
>>
>
>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
