Re: [lustre-discuss] Lustre 2.9 performance issues

2017-04-27 Thread Jeff Johnson
While tuning can alleviate some pain, it shouldn't go unmentioned that some
operations are simply suboptimal on a parallel file system. I'd bet a cold one
that copying the file to local /tmp, doing the vim/paste there, and copying the
result back to the LFS would have been quicker. Many single-threaded, small-I/O
operations can be handled more efficiently in the same way.
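For the archives, the staging pattern described above looks roughly like this
(paths are made up for illustration; a real run would stage from the Lustre
mount rather than from a file under /tmp):

```shell
#!/bin/sh
# Sketch of the local-staging workaround: do the chatty small-I/O edits on
# local disk, then move the result in one streaming copy. LFS_FILE stands in
# for a file on the Lustre mount; here both paths live under /tmp so the
# sketch runs anywhere.
LFS_FILE=/tmp/fake_lustre_file.txt
TMP_COPY=/tmp/staged_copy.txt

echo "original contents" > "$LFS_FILE"
cp "$LFS_FILE" "$TMP_COPY"            # 1. stage to fast local storage
echo "edited locally" >> "$TMP_COPY"  # 2. edit locally (vim, paste, etc.)
cp "$TMP_COPY" "$LFS_FILE"            # 3. copy back in one streaming write
cat "$LFS_FILE"
```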

Lustre is a fantastic tool, and like most tools it doesn't do everything
well... *yet*

--Jeff

On Thu, Apr 27, 2017 at 4:21 PM, Dilger, Andreas 
wrote:

> On Apr 25, 2017, at 13:11, Bass, Ned  wrote:
> >
> > Hi Darby,
> >
> >> -----Original Message-----
> >>
> >> for i in $(seq 0 99) ; do
> >>   dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1
> >> done
> >>
> >> The timing of this ranges from 0.1 to 1 sec on our old LFS but ranges
> from 20
> >> to 60 sec on our newer 2.9 LFS.
> >
> > Because Lustre does not yet use the ZFS Intent Log (ZIL), it implements
> fsync() by
> > waiting for an entire transaction group to get written out. This can
> incur long
> > delays on a busy filesystem as the transaction groups become quite
> large. Work
> > on implementing ZIL support is being tracked in LU-4009 but this feature
> is not
> > expected to make it into the upcoming 2.10 release.
>
> There is also a patch that was developed in the past to test this:
> https://review.whamcloud.com/7761 "LU-4009 osd-zfs: Add tunables to
> disable sync", which lets the servers skip waiting for ZFS TXG commit
> on each sync.
>
> That may be an acceptable workaround in the meantime.  Essentially, a
> client would _start_ a sync on the server but would not wait for it to
> complete before returning to the application.  Both the client and the
> OSS would need to crash within a few seconds of the sync for data to
> be lost.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage


Re: [lustre-discuss] Lustre 2.9 performance issues

2017-04-27 Thread Dilger, Andreas
On Apr 25, 2017, at 13:11, Bass, Ned  wrote:
> 
> Hi Darby,
> 
>> -----Original Message-----
>> 
>> for i in $(seq 0 99) ; do
>>   dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1
>> done
>> 
>> The timing of this ranges from 0.1 to 1 sec on our old LFS but ranges from 20
>> to 60 sec on our newer 2.9 LFS.  
> 
> Because Lustre does not yet use the ZFS Intent Log (ZIL), it implements 
> fsync() by
> waiting for an entire transaction group to get written out. This can incur 
> long
> delays on a busy filesystem as the transaction groups become quite large. Work
> on implementing ZIL support is being tracked in LU-4009 but this feature is 
> not
> expected to make it into the upcoming 2.10 release.
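
For anyone wanting to reproduce the comparison without dd, the same
measurement can be sketched in a few lines of Python (the directory and sizes
here are assumptions for illustration; point target_dir at the Lustre mount to
test it for real):

```python
import os
import tempfile
import time

# Minimal sketch (not Lustre-specific): time per-file fsync() cost, mirroring
# the dd loop quoted above. target_dir defaults to a local temp directory;
# set it to a path on the Lustre mount to compare against local disk.
target_dir = tempfile.mkdtemp()

start = time.time()
for i in range(100):
    path = os.path.join(target_dir, f"dd.dat.{i}")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    os.write(fd, b"\0" * 1024)  # 1 KiB, like bs=1k count=1
    os.fsync(fd)                # conv=fsync: wait for data on stable storage
    os.close(fd)
elapsed = time.time() - start
print(f"100 files with fsync: {elapsed:.2f}s")
```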

There is also a patch that was developed in the past to test this:
https://review.whamcloud.com/7761 "LU-4009 osd-zfs: Add tunables to disable
sync", which lets the servers skip waiting for ZFS TXG commit on each sync.

That may be an acceptable workaround in the meantime.  Essentially, a client
would _start_ a sync on the server but would not wait for it to complete
before returning to the application.  Both the client and the OSS would need
to crash within a few seconds of the sync for data to be lost.
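
Enabling such a tunable on the OSS nodes would presumably look something like
the following; note that the parameter name sync_disabled is only an assumed
placeholder for illustration, and the patch above defines the actual tunable:

```shell
# Hypothetical sketch -- check patch 7761 for the tunable it actually adds;
# 'osd-zfs.*.sync_disabled' is an assumed placeholder name, not a real
# parameter confirmed here.
lctl set_param osd-zfs.*.sync_disabled=1
```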

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









[lustre-discuss] undelete

2017-04-27 Thread E.S. Rosenberg
A user just rm'd a large archive of theirs on Lustre. Is there any way to
recover it before the data is destroyed by other writes?


[lustre-discuss] kerberised lustre performance

2017-04-27 Thread E.S. Rosenberg
Hi everyone,
I just saw Sebastian's talk at LUG 2016 (yes, I know I'm a bit behind the
times), and I was wondering whether, and how much of, a performance impact
there is from the need to obtain Kerberos tickets before file operations (or
are they only needed for mounting?)...

https://www.youtube.com/watch?v=zo6b03zxrIs

Thanks,
Eli