Re: [lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread John Bauer
David, you note that you write a 6GB file. I suspect that your Linux systems have significantly more memory than 6GB, meaning your file will end up being cached in the system buffers. It won't matter how many OSTs you use, as you are probably not measuring the speed to the OSTs, but rather, you
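
A minimal sketch of the kind of measurement this implies: include fsync() in the timed region so the number reflects data reaching the OSTs rather than the page cache. The path and sizes here are illustrative, not from the original post.

    /* Time 6 GiB of writes including fsync(), so cached-but-unwritten
     * data is not counted as I/O bandwidth. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t bs = 1 << 20;            /* 1 MiB per write */
        const size_t count = 6UL << 10;       /* 6 GiB total */
        char *buf = calloc(1, bs);
        int fd = open("/lustre/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < count; i++)
            write(fd, buf, bs);
        fsync(fd);                            /* flush the page cache to the OSTs */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f MB/s\n", (double)count * bs / sec / 1e6);
        close(fd);
        free(buf);
        return 0;
    }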

Re: [lustre-discuss] more on lustre striping

2016-06-10 Thread John Bauer
`fopen' [GLIBC_2.2.5] $ On 6/10/2016 7:29 AM, Ashley Pittman wrote: On 22/05/16 02:56, John Bauer wrote: Oleg, I can intercept the fopen(), but that does me no good as I can't set the O_LOV_DELAY_CREATE bit. What I cannot intercept is the open() downstream of fopen(). If one exami

Re: [lustre-discuss] llapi_file_open() messages to stderr

2016-04-30 Thread John Bauer
John On 4/30/2016 12:27 PM, Dennis Nelson wrote: Check for existing files before making the call? And don't issue the call if the file exists. You cannot change stripe attributes on an existing file. Sent from my iPhone On Apr 30, 2016, at 11:51 AM, John Bauer <bau...@iodoctors.com

[lustre-discuss] llapi_file_get_stripe() and lmm_stripe_offset

2016-04-30 Thread John Bauer
I have noticed some inconsistencies in the lfs setstripe/getstripe commands and the llapi_file_get_stripe() function. Notice in the lfs setstripe/getstripe example below that specifying 7 for the offset with lfs setstripe -i 7 results in lfs getstripe reporting lmm_stripe_offset=7. That all
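
For reference, a minimal sketch of the llapi_file_get_stripe() side of that comparison (buffer sizing follows the usual lov_user_md-plus-objects pattern; requires the lustreapi development headers):

    #include <stdio.h>
    #include <stdlib.h>
    #include <lustre/lustreapi.h>

    int main(int argc, char **argv)
    {
        /* room for the header plus the per-OST object entries */
        size_t len = sizeof(struct lov_user_md_v3) +
                     LOV_MAX_STRIPE_COUNT * sizeof(struct lov_user_ost_data_v1);
        struct lov_user_md *lum = calloc(1, len);
        int rc = llapi_file_get_stripe(argv[1], lum);

        if (rc == 0)
            printf("lmm_stripe_offset=%u lmm_stripe_count=%u\n",
                   (unsigned)lum->lmm_stripe_offset,
                   (unsigned)lum->lmm_stripe_count);
        free(lum);
        return rc;
    }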

[lustre-discuss] more on lustre striping

2016-05-18 Thread John Bauer
Since today's topic seems to be Lustre striping, I will revisit a previous line of questions I had. Andreas had put me on to O_LOV_DELAY_CREATE, which I have been experimenting with. My question is: Is there a way to flag a directory with O_LOV_DELAY_CREATE so that a file created in that

Re: [lustre-discuss] more on lustre striping

2016-05-21 Thread John Bauer
On the other hand, if you open with O_LOV_DELAY_CREATE and then try to write into that fd, you will get a failure. On May 21, 2016, at 4:01 PM, John Bauer wrote: Andreas, Thanks for the reply. For what it's worth, extending a file that does not have a layout set does work. % rm -f file.dat

Re: [lustre-discuss] more on lustre striping

2016-05-21 Thread John Bauer
syscalls. I wonder if you can intercept something deeper like sys_open or something like that? Perhaps check out the Lustre 1.8 sources (or even 2.1) and see how we did it back there? On May 21, 2016, at 4:25 PM, John Bauer wrote: Oleg, So in my simple test, the second open of the file caused

[lustre-discuss] Delaying Lustre striping until first extent

2016-05-03 Thread John Bauer
Has there been any discussion of allowing a user to modify the striping of a file up until the first extent is made? There are a lot of opens that cannot be easily replaced with llapi_file_open(), such as the openat() family, mkstemp() family, and fopen() family. It seems that it
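
A minimal sketch of the pattern under discussion, for opens that do go through a path you can intercept: create with O_LOV_DELAY_CREATE, then set the layout with the LL_IOC_LOV_SETSTRIPE ioctl before the first write (constants come from the Lustre user headers; stripe values are illustrative):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <lustre/lustreapi.h>

    int open_delayed(const char *path, int stripe_count, unsigned stripe_size)
    {
        struct lov_user_md_v1 lum = {
            .lmm_magic         = LOV_USER_MAGIC_V1,
            .lmm_pattern       = LOV_PATTERN_RAID0,
            .lmm_stripe_size   = stripe_size,
            .lmm_stripe_count  = stripe_count,
            .lmm_stripe_offset = (__u16)-1,   /* let the MDS choose */
        };
        int fd = open(path, O_CREAT | O_WRONLY | O_LOV_DELAY_CREATE, 0644);

        if (fd >= 0 && ioctl(fd, LL_IOC_LOV_SETSTRIPE, &lum) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }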

[lustre-discuss] fiemap problems

2016-07-22 Thread John Bauer
I am experiencing an intermittent problem with fiemap on Lustre. This is running on Pleiades at NASA Ames. lustre: 2.7.1 kernel: 3.0.101-68.1.20160209-nasa build: 2.7.1-3nasC_mofed31v5 I create a file with dd if=/dev/zero of=${FILE} count=100 bs=1M and then run my program to do the fcntl call
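
For context, the usual two-pass FIEMAP sequence looks roughly like this (Lustre reports the OST device in fe_reserved[0], which its user headers alias as fe_device):

    #include <fcntl.h>
    #include <linux/fiemap.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    int dump_extents(int fd)
    {
        /* pass 1: fm_extent_count == 0 just counts the extents */
        struct fiemap probe = { .fm_length = ~0ULL };
        if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
            return -1;

        unsigned n = probe.fm_mapped_extents;
        struct fiemap *fm = calloc(1, sizeof(*fm) +
                                   n * sizeof(struct fiemap_extent));
        fm->fm_length = ~0ULL;
        fm->fm_extent_count = n;
        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {   /* pass 2: fetch them */
            free(fm);
            return -1;
        }
        for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
            printf("extent %u: dev=%u phys=%llu len=%llu\n", i,
                   fm->fm_extents[i].fe_reserved[0], /* fe_device on Lustre */
                   (unsigned long long)fm->fm_extents[i].fe_physical,
                   (unsigned long long)fm->fm_extents[i].fe_length);
        free(fm);
        return 0;
    }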

[lustre-discuss] llapi_file_get_stripe() and /proc/fs/lustre/osc restated

2016-07-18 Thread John Bauer
I will restate the problem I am having with Lustre. With my I/O instrumentation library, I want to use llapi_file_get_stripe() to find the OSTs that a file of interest is striped on and then monitor only those OSTs using files in the directory /proc/fs/lustre/osc. This needs to be done

Re: [lustre-discuss] llapi_file_get_stripe() and /proc/fs/lustre/osc restated

2016-07-18 Thread John Bauer
used for that file on that particular mountpoint. Note, like the client ID, the OSC name is specific to that mount point. The actual OST name is just FSNAME-OST. robert On Jul 18, 2016, at 15:22, John Bauer <bau...@iodoctors.com> wrote: I will rest
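
A small sketch of the mapping Robert describes: combine the fsname with the OST index from the layout and glob for this mount's OSC directory (the proc path layout is as discussed in this thread):

    #include <glob.h>
    #include <stdio.h>

    int find_osc_dir(const char *fsname, unsigned ost_idx,
                     char *out, size_t outlen)
    {
        char pat[256];
        glob_t g;

        /* e.g. /proc/fs/lustre/osc/nbp9-OST0124-osc-* */
        snprintf(pat, sizeof(pat), "/proc/fs/lustre/osc/%s-OST%04x-osc-*",
                 fsname, ost_idx);
        if (glob(pat, 0, NULL, &g) != 0 || g.gl_pathc == 0)
            return -1;
        /* more than one match means the fs is mounted more than once */
        snprintf(out, outlen, "%s", g.gl_pathv[0]);
        globfree(&g);
        return 0;
    }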

[lustre-discuss] llapi_file_get_stripe() and /proc/fs/lustre/osc/ entries

2016-07-16 Thread John Bauer
I am using llapi_file_get_stripe() to get the OST indexes that a file is striped on. That part is working fine. But there are multiple Lustre file systems on the node, resulting in multiple *OST* entries in the directory /proc/fs/lustre/osc. Is there something in the struct
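
The lov_user_md itself carries no filesystem name, but llapi_search_fsname() can map the file's path to the fsname, which then selects among the per-filesystem *OST* entries; a minimal sketch:

    #include <stdio.h>
    #include <lustre/lustreapi.h>

    int main(int argc, char **argv)
    {
        char fsname[64] = "";

        /* maps a path on a mounted Lustre fs to its fsname, e.g. "nbp9" */
        if (llapi_search_fsname(argv[1], fsname) == 0)
            printf("%s is on fsname %s\n", argv[1], fsname);
        return 0;
    }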

Re: [lustre-discuss] Lustre striping and MPI

2016-10-28 Thread John Bauer
Lustre Principal Architect Intel High Performance Data Division On 2016/10/26, 06:51, "John Bauer" <bau...@iodoctors.com> wrote: All, I am running a 4-rank MPI job where all the ranks do an open of the file, attempt to set the striping with io

Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread John Bauer
Patrick, I thought at one time there was an inode lock held for the duration of a direct I/O read or write, so that even if one had multiple application threads writing direct, only one was "in flight" at a time. Has that changed? John Sent from my iPhone > On Oct 14, 2016, at 3:16 PM,

[lustre-discuss] Lustre striping and MPI

2016-10-26 Thread John Bauer
All, I am running a 4-rank MPI job where all the ranks do an open of the file, attempt to set the striping with ioctl(), and then do a small write. Intermittently, I get errors on the write() and ioctl(). This is a synthetic test case, boiled down from a much larger real-world job. Note that
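
One common way to remove that race, sketched below: have rank 0 create the file and set the layout, then let the other ranks open it after a barrier (stripe values illustrative):

    #include <fcntl.h>
    #include <mpi.h>
    #include <unistd.h>
    #include <lustre/lustreapi.h>

    int open_striped(const char *path, int rank)
    {
        if (rank == 0) {
            int fd = llapi_file_open(path, O_CREAT | O_WRONLY, 0644,
                                     4 << 20,  /* stripe_size */
                                     -1,       /* stripe_offset: any */
                                     4,        /* stripe_count */
                                     0);       /* stripe_pattern: default */
            if (fd >= 0)
                close(fd);
        }
        /* layout exists on the MDS before anyone else opens */
        MPI_Barrier(MPI_COMM_WORLD);
        return open(path, O_WRONLY);
    }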

[lustre-discuss] lustre OSC and system cache

2016-12-12 Thread John Bauer
I'm observing some undesirable caching of OSC data in the system buffers. This is a single node, single process application. There are 2 files of interest, SCRATCH and SCR300, both are scratch files with stripeCount=4. The system has 128GB of memory. Lustre maxes out at about 59GB of
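
If the goal is to keep SCR300's pages from being pushed out by SCRATCH, one workaround to experiment with is dropping SCRATCH's cached pages as each section is consumed; whether the Lustre client cache honors the hint is worth verifying:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* flush, then release cached pages for [off, off+len) */
    static void drop_range(int fd, off_t off, off_t len)
    {
        /* write out dirty pages first so DONTNEED can actually evict them */
        sync_file_range(fd, off, len,
                        SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
        posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
    }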

Re: [lustre-discuss] lustre OSC and system cache

2016-12-12 Thread John Bauer
are not controlled by the same lru mechanism, as SCR300's pages are being lru'ed out when they are clearly used more recently than any in SCRATCH? Thanks John On 12/12/2016 6:59 PM, Dilger, Andreas wrote: On Dec 12, 2016, at 15:50, John Bauer <bau...@iodoctors.com> wrote: I'm observin

Re: [lustre-discuss] lustre OSC and system cache

2016-12-12 Thread John Bauer
On Dec 12, 2016, at 15:50, John Bauer <bau...@iodoctors.com> wrote: I'm observing some undesirable caching of OSC data in the system buffers. This is a single node, single process application. There are 2 files of interest, SCRATCH and SCR300, both are scratch files with stripeC

Re: [lustre-discuss] question about /proc/fs/lustre/osc/ and llapi functions.

2017-04-12 Thread John Bauer
Andreas, First, thanks for the response. Second, I looked at that at one time, it seeming the logical answer, but I must have misread/mistyped something. My apologies. Thanks again, John On 4/12/2017 3:34 AM, Dilger, Andreas wrote: On Apr 7, 2017, at 18:06, John Bauer <bau...@iodoctors.

[lustre-discuss] question about /proc/fs/lustre/osc/ and llapi functions.

2017-04-07 Thread John Bauer
In /proc/fs/lustre/osc/ is an entry for every OSC of all the Lustre file systems on a client node, of the form nbp9-OST0124-osc-880ffaa4bc00. My I/O instrumentation library tracks OSCs associated with an application file by reading the files in the directory for each OSC the

[lustre-discuss] sudden read performance drop on sequential forward read.

2017-08-31 Thread John Bauer
All, I have an application that writes a 100GB file forwards, and then begins a sequence of reading a 70 GB section of the file forwards and backwards. At some point in the run, not always at the same point, the read performance degrades significantly.  The initial forward reads are about 1.3

Re: [lustre-discuss] varying sequential read performance.

2018-04-05 Thread John Bauer
when others increase. -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu On Apr 2, 2018, at 8:06 PM, John Bauer <bau...@iodoctors.com> wrote: I am running dd 10 times consecutively to read a 64GB file ( stripeC

[lustre-discuss] varying sequential read performance.

2018-04-02 Thread John Bauer
I am running dd 10 times consecutively to read a 64GB file (stripeCount=4 stripeSize=4M) on a Lustre client (version 2.10.3) that has 64GB of memory. The client node was dedicated. for pass in 1 2 3 4 5 6 7 8 9 10 ; do dd of=/dev/null if=${file} count=128000 bs=512K ; done Instrumentation of

Re: [lustre-discuss] varying sequential read performance.

2018-04-03 Thread John Bauer
with the significant advantage that runs 4-10 have, you could never tell in the dd results.  Run 5 is slightly faster than run 2, and run 7 is as slow as run 0. John On 4/3/2018 12:20 AM, Colin Faber wrote: Are you flushing cache between test runs? On Mon, Apr 2, 2018, 6:06 PM John Bauer <

Re: [lustre-discuss] lustre-discuss Digest, Vol 191, Issue 2

2022-02-03 Thread John Bauer
 The following loop in wdfile.f90 is pointless as the write happens only once for each rank. Each rank is writing out the array once and then closing the file.  If the size of array 'data' is not a multiple of the Lustre stripe size there is going to be a lot of read-modify-write going on.  

[lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-19 Thread John Bauer
When using PFL, and using an SSD as the first extent, it seems it would be advantageous to not have that extent's file data consume memory in the client's system buffers.  It would be similar to using O_DIRECT, but on a per-extent basis.  Is there a mechanism for that already? Thanks, John

Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-19 Thread John Bauer
-Patrick From: lustre-discuss on behalf of John Bauer Sent: Thursday, May 19, 2022 12:48 PM To: lustre-discuss@lists.lustre.org Subject: [lustre-discuss] Avoiding system cache when using ssd pfl extent When using PFL

Re: [lustre-discuss] llapi documentation

2022-06-15 Thread John Bauer
John On 6/15/22 02:08, Andreas Dilger wrote: On Jun 14, 2022, at 05:32, John Bauer wrote: I have had little success in my search for documentation on pool functions in llapi. I've looked in: https://wiki.lustre.org/PFL2_High_Level_Design https://doc.lustre.org/lustre_manual.xhtml

Re: [lustre-discuss] llapi documentation

2022-06-15 Thread John Bauer
specific than "Re: Contents of lustre-discuss digest..." Today's Topics: 1. Re: llapi documentation (Andreas Dilger) 2. Re: llapi documentation (John Bauer) -- Message: 1 Date: Wed, 15 Jun 2022 07:08:58 + Fro

[lustre-discuss] overstriping

2022-06-15 Thread John Bauer
Just a note that the man page for lfs setstripe seems a bit misleading for the -C --overstripe-count option. The man page states "creating > 1 stripe per OST if count exceeds the number of OSTs in the file system". It appears that Lustre creates more than one stripe on an OST if the number

[lustre-discuss] llapi documentation

2022-06-14 Thread John Bauer
I have had little success in my search for documentation on pool functions in llapi. I've looked in: https://wiki.lustre.org/PFL2_High_Level_Design https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace I'm looking for info on llapi_get_poollist() and llapi_get_poolmembers(). 
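
From reading lustreapi.h (since the manual doesn't cover them), both calls appear to fill a caller-supplied array of string pointers backed by a caller-supplied buffer; the "fsname" and "fsname.pool" argument forms below are my reading of the header, not documented behavior:

    #include <stdio.h>
    #include <lustre/lustreapi.h>

    int main(void)
    {
        char buf[8192], mbuf[8192];
        char *pools[256], *members[256];

        /* list the pools of filesystem "nbp9" (name illustrative) */
        int np = llapi_get_poollist("nbp9", pools, 256, buf, sizeof(buf));
        for (int i = 0; i < np; i++)
            printf("pool: %s\n", pools[i]);

        /* list the OSTs in one pool, named fsname.pool ("flash" is made up) */
        int nm = llapi_get_poolmembers("nbp9.flash", members, 256,
                                       mbuf, sizeof(mbuf));
        for (int i = 0; i < nm; i++)
            printf("member: %s\n", members[i]);
        return 0;
    }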

[lustre-discuss] llapi_layout_alloc

2022-06-06 Thread John Bauer
Is there any reason that a call to llapi_layout_alloc() would result in the following error message? /nobackupp17/jbauer2/dd.out has no stripe info In the code snippet below, I get the first print "on entry", followed by the above error message, but I don't get the print after the call to

[lustre-discuss] lfs getstripe -d

2022-07-18 Thread John Bauer
Hi all, Looking in the documentation at https://doc.lustre.org/lustre_manual.xhtml it appears that lfs getstripe should/does echo the directory that is being queried. $ lfs getstripe -d /mnt/testfs/pfldir /mnt/testfs/pfldir stripe_count:  1 stripe_size:   1048576 stripe_offset: -1

[lustre-discuss] llapi_layout_file_comp_del

2022-07-26 Thread John Bauer
Hi all, I would like to use the llapi_layout_file_comp_del() function.  I have found 2 prototypes in different places.  One has the 3rd argument, uint32_t flags, and the other doesn't.  I suspect the High Level Design document is incorrect.  The one line of documentation in lustreapi.h

Re: [lustre-discuss] llapi_layout_file_comp_del

2022-07-28 Thread John Bauer
components that match the argument (e.g. LCME_FL_NEG|LCME_FL_INIT), but I haven't tested that. On Jul 26, 2022, at 14:35, John Bauer wrote: Hi all, I would like to use the llapi_layout_file_comp_del() function.  I have found 2 prototypes in different places.  One has the 3rd argument, uint32_t
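
Going with the three-argument prototype from the installed lustreapi.h, here is a sketch of the flags-based deletion suggested in the reply (treating id == 0 as "select by flags" is my assumption, untested as the reply notes):

    #include <lustre/lustreapi.h>

    /* delete every component that is NOT yet instantiated */
    int del_uninit_comps(const char *path)
    {
        return llapi_layout_file_comp_del(path, 0,
                                          LCME_FL_NEG | LCME_FL_INIT);
    }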

[lustre-discuss] darshan-discuss

2022-04-28 Thread John Bauer
Since there seems to be considerable overlap between lustre and darshan users I thought I would ask here: Is there an email list for darshan discussion analogous to lustre-discuss? Thanks, John

Re: [lustre-discuss] llapi_layout_file_comp_del

2022-08-01 Thread John Bauer
tly less efficient than the new API (two opens and closes), but would allow you to migrate to the new (more efficient) implementation easily in the future. Cheers, Andreas On Jul 28, 2022, at 14:03, John Bauer wrote: Andreas, Thanks for the info.  A related question: I am using the O_LOV_DELAY_CRE

[lustre-discuss] fiemap

2022-08-18 Thread John Bauer
Hi all, I am trying to get my llfie program (which uses fiemap) going again, but now the struct fiemap_extent structures I get back from the ioctl call all have fe_device=0. The output from lfs getstripe indicates that the devices are not all 0. The sum of the fe_length members adds up to

Re: [lustre-discuss] fiemap, final chapter.

2022-08-19 Thread John Bauer
headers. You could try adding "__attribute__((packed))" at the end of the struct definition to see if that fixes the problem. Cheers, Andreas On Aug 18, 2022, at 21:54, John Bauer wrote: Andreas, This is no longer Lustre related, but I hope you can shed some light on this. It appea

Re: [lustre-discuss] fiemap

2022-08-18 Thread John Bauer
other "less compatible" changes that would have been needed to implement PFL/FLR handling. That said, I would have expected this change to result in your tool reporting very large values for fe_device (e.g. OST index + N * 65536), so returning all-zero values is somewhat unexpected.
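
Under that expected encoding, decoding fe_device would be a one-liner pair:

    /* fe_device = OST index + N * 65536, per the description above */
    static inline unsigned fe_ost_index(unsigned fe_device)
    {
        return fe_device % 65536;
    }

    static inline unsigned fe_copy_number(unsigned fe_device)
    {
        return fe_device / 65536;
    }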

Re: [lustre-discuss] fiemap

2022-08-18 Thread John Bauer
"listExtents() fe_physical=493006"...,72) = 72 write(2,"listExtents() fe_physical=172833"...,73) = 73 On 8/18/22 16:11, Andreas Dilger wrote: On Aug 18, 2022, at 14:28, John Bauer wrote: Andreas, Thanks for the reply.  I don't think I'm accessing the Lustre filefrag ( see b

[lustre-discuss] fio and lustre performance

2022-08-25 Thread John Bauer
Hi all, I'm trying to figure out an odd behavior when running an fio ( https://git.kernel.dk/cgit/fio/ ) benchmark on a Lustre file system. fio --randrepeat=1 \ --ioengine=posixaio \ --buffered=1 \ --gtod_reduce=1 \ --name=test \

Re: [lustre-discuss] liblustreapi.so llapi_layout_get_by_fd() taking a long time to complete

2022-11-25 Thread John Bauer
rf_1.png?dl=0 https://www.dropbox.com/s/tebm1iy0jipnqgx/h5perf_2.png?dl=0 https://www.dropbox.com/s/ydgzstkg9qrk6z4/h5perf_3.png?dl=0 On 11/24/22 20:47, Andreas Dilger wrote: On Nov 22, 2022, at 13:57, John Bauer wrote: Hi all, I am making a call to *llapi_layout_get_by_fd()*  from each rank of a 1

[lustre-discuss] liblustreapi.so llapi_layout_get_by_fd() taking a long time to complete

2022-11-22 Thread John Bauer
Hi all, I am making a call to *llapi_layout_get_by_fd()*  from each rank of a 16 rank MPI job.  One rank per node. About 75% of the time, one of the ranks, typically rank 0, takes a very long time to complete this call.  I have placed fprintf() calls with wall clock timers around the call. 
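
The timing described amounts to something like this (a sketch; the flags argument of llapi_layout_get_by_fd() is left 0):

    #include <mpi.h>
    #include <stdio.h>
    #include <lustre/lustreapi.h>

    struct llapi_layout *timed_layout_get(int fd, int rank)
    {
        double t0 = MPI_Wtime();
        struct llapi_layout *layout = llapi_layout_get_by_fd(fd, 0);
        double t1 = MPI_Wtime();

        fprintf(stderr, "rank %d: llapi_layout_get_by_fd took %.3f s\n",
                rank, t1 - t0);
        return layout;
    }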

[lustre-discuss] LUG23 speakers/presentations

2023-03-24 Thread John Bauer
Is the speaker/presentation itinerary for LUG23 going to be posted before the early registration date passes?

[lustre-discuss] lfs setstripe with stripe_count=0

2023-02-21 Thread John Bauer
Something doesn't make sense to me when using lfs setstripe when specifying 0 for the stripe_count. This first command works as expected. The pool is the one specified, 2_hdd, and the -c 0 results in a stripe_count of 1, which I believe is the default for the file-system default (per the

[lustre-discuss] Lustre client caching question

2023-08-14 Thread John Bauer
I have an application that reads a 70GB section of a file forwards and backwards multiple ( 14 ) times.  This is on a 64 GB system.  Monitoring /proc/meminfo  shows that the memory consumed by file cache bounces around the 32GB value.  The forward reads go at about 3.3GB/s.  What is
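
A ~32GB ceiling on a 64GB client suggests a client-side cache cap; one parameter worth checking is llite.*.max_cached_mb, readable via lctl get_param or directly from proc (path assumed):

    #include <glob.h>
    #include <stdio.h>

    int main(void)
    {
        glob_t g;

        if (glob("/proc/fs/lustre/llite/*/max_cached_mb", 0, NULL, &g) == 0) {
            for (size_t i = 0; i < g.gl_pathc; i++) {
                char line[256];
                FILE *f = fopen(g.gl_pathv[i], "r");
                if (!f)
                    continue;
                printf("%s:\n", g.gl_pathv[i]);
                while (fgets(line, sizeof(line), f))
                    fputs(line, stdout);
                fclose(f);
            }
            globfree(&g);
        }
        return 0;
    }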

Re: [lustre-discuss] Lustre caching and NUMA nodes

2023-12-05 Thread John Bauer
and also a link to the image on DropBox. Thanks again, John https://www.dropbox.com/scl/fi/fgmz4wazr6it9q2aeo0mb/write_RPCs_in_flight.png?rlkey=d3ri2w2n7isggvn05se4j3a6b=0 On 12/5/23 22:33, Andreas Dilger wrote: On Dec 4, 2023, at 15:06, John Bauer wrote: I have an OSC caching question. I

Re: [lustre-discuss] lustre-discuss Digest, Vol 213, Issue 7

2023-12-06 Thread John Bauer
Peter, I've been reading the Baeldung pages, among others, to gain some insight into Linux buffer cache behavior. https://www.baeldung.com/linux/file-system-caching and https://docs.kernel.org/admin-guide/sysctl/vm.html As can be seen in the first image below, Lustre is having no trouble

[lustre-discuss] Lustre caching and NUMA nodes

2023-12-07 Thread John Bauer
Peter, A delayed reply to one more of your questions, "What makes you think 'lustre' is doing that?", as I had to make another run and gather OSC stats on all the Lustre file systems mounted on the host that I run dd on. This host has 12 Lustre file systems comprising 507 OSTs. While dd

Re: [lustre-discuss] lustre-discuss Digest, Vol 213, Issue 10

2023-12-07 Thread John Bauer
reach the person managing the list at lustre-discuss-ow...@lists.lustre.org When replying, please edit your Subject line so it is more specific than "Re: Contents of lustre-discuss digest..." Today's Topics: 1. Lustre caching and NUMA nodes (