[lustre-discuss] lustre OSC and system cache

2016-12-12 Thread John Bauer
I'm observing some undesirable caching of OSC data in the system buffers. This is a single-node, single-process application. There are 2 files of interest, SCRATCH and SCR300, both scratch files with stripeCount=4. The system has 128GB of memory. Lustre maxes out at about 59GB of
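For readers hitting the same behaviour: the amount of file data a Lustre client caches is normally bounded by the llite max_cached_mb tunable, and cached pages can be released by shrinking the client's DLM lock LRU. The commands below are a generic illustration, not taken from this thread, and the 16384 value is only an example:
client# lctl get_param llite.*.max_cached_mb
client# lctl set_param llite.*.max_cached_mb=16384
client# lctl set_param ldlm.namespaces.*.lru_size=clear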

Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Brett Lee
Proper? Please expand on that. :) To get started, you could run Lustre on just one node. Brett On Dec 12, 2016 7:08 PM, "Markham Benjamin" wrote: > Hello, > > I just

[lustre-discuss] Simple Lustre question

2016-12-12 Thread Markham Benjamin
Hello, I have just been reading about Lustre and getting into it. I have a simple question: about how many computers would I need in order to set up a proper, simple LustreFS? Thanks, Ben.

Re: [lustre-discuss] lustre OSC and system cache

2016-12-12 Thread John Bauer
Andreas, The file system has lru_max_age=900. I have been googling around to find out what this controls, but haven't found much. Is there documentation on how the memory management works with Lustre? I wonder what the LRU actually means. How is it that 2 files on the same node are
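For context, lru_max_age and lru_size are per-namespace client DLM lock LRU tunables; roughly speaking, lru_max_age bounds how long an unused lock (and with it the cached pages it covers) may sit in the LRU before being cancelled. A generic way to inspect them, shown here only as an illustration:
client# lctl get_param ldlm.namespaces.*.lru_max_age
client# lctl get_param ldlm.namespaces.*.lru_size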

Re: [lustre-discuss] lustre OSC and system cache

2016-12-12 Thread John Bauer
Andreas, Realized I forgot the version after I sent the email. From /proc/fs/lustre/version: build: 2.5.2-trunk-1.0502.20758.2.7-abuild-RB-5.2UP04_2.5.2@20758-2015-09-01-23:29 I'll get in touch with the admin on the lru_max_age. Thanks, John On 12/12/2016 6:59 PM, Dilger, Andreas wrote:
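As an aside, the same build string can be read through lctl; a generic check, not specific to this system:
client# lctl get_param version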

Re: [lustre-discuss] lustre OSC and system cache

2016-12-12 Thread Dilger, Andreas
On Dec 12, 2016, at 15:50, John Bauer wrote: > > I'm observing some undesirable caching of OSC data in the system buffers. > This is a single node, single process application. There are 2 files of > interest, SCRATCH and SCR300, both are scratch files with

Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Markham Benjamin
Ah sorry. What I mean is something I could experiment on without any hardware limitations. I hope that makes sense. But it seems I could just run Lustre on one node. -Ben > On Dec 13, 2016, at 11:26 AM, Brett Lee wrote: > > Proper? Please expand on that. :) > >

Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Brett Lee
Hi Markham, Maybe think of Lustre as a bunch of different components... While you can combine the components (server services and client services) on one node, you can also put each service on a different node, and connect them via a network. To begin, I suggest using just one node. Start all
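To make the one-node suggestion concrete, a minimal sketch of a single-node setup; the device names, fsname and NID are placeholders, and loop devices or VM disks are fine for experimenting:
node# mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/sdb
node# mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=<this-node-nid> /dev/sdc
node# mkdir -p /mnt/mdt /mnt/ost0 /mnt/lustre
node# mount -t lustre /dev/sdb /mnt/mdt
node# mount -t lustre /dev/sdc /mnt/ost0
node# mount -t lustre <this-node-nid>:/testfs /mnt/lustre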

Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Brett Lee
Good to hear, Ben. This list is pretty helpful, so if you get stuck, check back in. Brett On Dec 12, 2016 7:49 PM, "Markham Benjamin" wrote: > Hi Brett, > > Thanks for

Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Markham Benjamin
Hi Brett, Thanks for this. This all seems reasonable and understandable. Thanks for your help -Ben > On Dec 13, 2016, at 11:39 AM, Brett Lee wrote: > > Hi Markham, > > Maybe think of Lustre as a bunch of different components... While you can > combine the

Re: [lustre-discuss] LustreError on ZFS volumes

2016-12-12 Thread Crowe, Tom
Hi Jesse, For clarification, it sounds like you are using hardware-based RAID-6, and not ZFS RAID? Is this correct? Or was the faulty card simply an HBA? At the bottom of the ‘zpool status -v pool_name’ output, you may see paths and/or ZFS object IDs of the damaged/impacted files. This would
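For anyone following along, the check being described looks roughly like this, with the pool name a placeholder; on an OST the permanent-error list may show dataset:object-ID pairs rather than pathnames, since the damaged items are Lustre objects rather than regular files:
oss# zpool status -v <poolname>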

Re: [lustre-discuss] LustreError on ZFS volumes

2016-12-12 Thread Jesse Stroik
Thanks for taking the time to respond, Tom. For clarification, it sounds like you are using hardware-based RAID-6, and not ZFS RAID? Is this correct? Or was the faulty card simply an HBA? You are correct. This particular file system is still using hardware RAID6. At the bottom of the

Re: [lustre-discuss] LustreError on ZFS volumes

2016-12-12 Thread Crowe, Tom
Hi Jesse, In regard to your seeing 370 objects with errors from ‘zpool status’, but having over 400 files with “access issues”, I would suggest running ‘zpool scrub’ to identify all the ZFS objects in the pool that are reporting permanent errors. It would be very important to have a
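For reference, the suggested scrub and the follow-up check would look roughly like this (pool name is a placeholder); a scrub reads every allocated block in the pool, so it can take a long time on a large OST:
oss# zpool scrub <poolname>
oss# zpool status -v <poolname>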

[lustre-discuss] problems accessing files as non-root user.

2016-12-12 Thread Phill Harvey-Smith
Hi All, I'm in the final step of upgrading our storage servers to Lustre 2.8. The MDS/OSS are running on CentOS 7.2; the clients are Ubuntu 12.04, though I also have a virtual machine running on CentOS 7.2 as a client. Both seem to exhibit the same problem. Our old environment was a 2.4

Re: [lustre-discuss] problems accessing files as non-root user.

2016-12-12 Thread Carlson, Timothy S
Does your new MDS server have all the UIDs of these people in /etc/passwd? Tim -Original Message- From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Phill Harvey-Smith Sent: Monday, December 12, 2016 9:16 AM To: lustre-discuss@lists.lustre.org Subject:
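A quick generic way to check this on the MDS; the UID and username below are only examples, not taken from the thread:
mds# getent passwd 1000
mds# id someuser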

Re: [lustre-discuss] problems accessing files as non-root user.

2016-12-12 Thread Patrick Farrell
Perhaps more expansively: Is the new MDS configured to be able to authenticate these users? Using /etc/passwd synchronization to do network auth is nasty. It's just asking for weird troubles if you don't get it exactly right. LDAP or similar is the way to go. - Patrick

Re: [lustre-discuss] problems accessing files as non-root user.

2016-12-12 Thread Dilger, Andreas
It may be that your identity upcall on the MDS needs to be redirected? Run the following commands on the MDS to verify, e.g. for UID=1000, assuming your identity upcall is at /usr/sbin/l_getidentity: mds# ls -l $(lctl get_param -n mdt.*.identity_upcall) mds# L_GETIDENTITY_TEST=true
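Two related generic checks: the configured upcall can be read back, and the MDS identity cache can be flushed once the user database is fixed. These are illustrative and not a transcript of the truncated commands above:
mds# lctl get_param mdt.*.identity_upcall
mds# lctl set_param mdt.*.identity_flush=-1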

[lustre-discuss] LustreError on ZFS volumes

2016-12-12 Thread Jesse Stroik
One of our Lustre file systems, still running Lustre 2.5.3 and ZFS 0.6.3, experienced corruption due to a bad RAID controller. The OST in question was a RAID6 volume which we've marked inactive. Most of our Lustre clients are 2.8.0. zpool status reports corruption and checksum errors. I have not
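For reference, an OST is usually deactivated with something like the following; the fsname and OST index are placeholders, not taken from this post:
mgs# lctl conf_param <fsname>-OST00NN.osc.active=0
client# lctl dl | grep OST (an inactive device should show IN rather than UP)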