Hi,

On 2020-05-01 16:32:15 -0400, Robert Haas wrote:
> On Thu, Apr 30, 2020 at 6:06 PM Robert Haas <robertmh...@gmail.com> wrote:
> > On Thu, Apr 30, 2020 at 3:52 PM Andres Freund <and...@anarazel.de> wrote:
> > > Why 8kb? That's smaller than what we currently do in pg_basebackup,
> > > afaict, and you're actually going to be bottlenecked by syscall
> > > overhead at that point (unless you disable / don't have the whole intel
> > > security mitigation stuff).
> >
> > I just picked something. Could easily try other things.
> 
> I tried changing the write size to 64kB, keeping the rest the same.
> Here are the results:
> 
> filesystem media 1@64GB 2@32GB 4@16GB 8@8GB 16@4GB
> xfs mag 65 53 64 74 79
> ext4 mag 96 68 75 303 437
> xfs ssd 75 43 29 33 38
> ext4 ssd 96 68 63 214 254
> spread spread n/a n/a 43 38 40
> 
> And here again are the previous results with an 8kB write size:
> 
> xfs mag 97 53 60 67 71
> ext4 mag 94 68 66 335 549
> xfs ssd 97 55 33 27 25
> ext4 ssd 116 70 66 227 450
> spread spread n/a n/a 48 42 44
> 
> Generally, those numbers look better than the previous numbers, but
> parallelism still looks fairly appealing on the SSD storage - less so
> on magnetic disks, at least in this test.

I spent a fair bit of time analyzing this, and my conclusion is that you
might largely be seeing numa effects. Yay.

I don't have a NUMA machine that large at hand, but here's what I'm
seeing on my local machine during a run writing out 400GiB (this run had
other noise on the machine; the benchmarks below are without it). The
machine has 192GiB of RAM, evenly distributed across two sockets / NUMA
domains.


At the start I see
numastat -m|grep -E 'MemFree|MemUsed|Dirty|Writeback|Active\(file\)|Inactive\(file\)'
MemFree                 91908.20        92209.85       184118.05
MemUsed                  3463.05         4553.33         8016.38
Active(file)              105.46          328.52          433.98
Inactive(file)             68.29          190.14          258.43
Dirty                       0.86            0.90            1.76
Writeback                   0.00            0.00            0.00
WritebackTmp                0.00            0.00            0.00

For a while there's pretty decent IO throughput (all 10s samples):
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1955.67   2299.32     0.00   0.00   42.48  1203.94    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   82.10  89.33

Then it starts to be slower on a sustained basis:
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1593.33   1987.85     0.00   0.00   42.90  1277.55    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   67.55  76.53

And then performance tanks completely:
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00  646.33    781.85     0.00   0.00  132.68  1238.70    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   85.43  58.63


That amount of degradation confused me for a while, especially because
the more controlled I made the setup, the less I could reproduce it. In
particular, I stopped seeing issues of the same magnitude after pinning
the processes to one NUMA socket (both execution and memory).

After a few seconds:
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1882.00   2320.07     0.00   0.00   42.50  1262.35    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   79.05  88.07

MemFree                 35356.50        80986.46       116342.96
MemUsed                 60014.75        15776.72        75791.47
Active(file)              179.44          163.28          342.72
Inactive(file)          58293.18        13385.15        71678.33
Dirty                   18407.50          882.00        19289.50
Writeback                 235.78          335.43          571.21
WritebackTmp                0.00            0.00            0.00

A bit later io starts to get slower:

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1556.30   1898.70     0.00   0.00   40.92  1249.29    0.00      0.00     0.00   0.00    0.00     0.00    0.20   24.00   62.90  72.01

MemFree                   519.56        36086.14        36605.70
MemUsed                 94851.69        60677.04       155528.73
Active(file)              303.84          212.96          516.80
Inactive(file)          92776.70        58133.28       150909.97
Dirty                   10913.20         5374.07        16287.27
Writeback                 812.94          331.96         1144.90
WritebackTmp                0.00            0.00            0.00


And then later it gets worse:
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1384.70   1671.25     0.00   0.00   40.87  1235.91    0.00      0.00     0.00   0.00    0.00     0.00    0.20    7.00   55.89  63.45

MemFree                   519.54          242.98          762.52
MemUsed                 94851.71        96520.20       191371.91
Active(file)              175.82          246.03          421.85
Inactive(file)          92820.19        93985.79       186805.98
Dirty                   10482.75         4140.72        14623.47
Writeback                   0.00            0.00            0.00
WritebackTmp                0.00            0.00            0.00

When using a 1s iostat interval instead of 10s, it's noticeable that
performance swings widely between very slow (<100MB/s) and very high
throughput (>2500MB/s).

It's clearly visible that performance degrades substantially first when
all of a numa node's free memory is exhausted, then when the second numa
node's is.
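
To watch the per-node numbers without numastat, the same counters can be
read from sysfs. A minimal sketch, assuming the Linux per-node files
/sys/devices/system/node/nodeN/meminfo with lines of the form
"Node 0 MemFree:  12345 kB"; the function names are made up for
illustration and are not part of the test program:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a per-node meminfo file, e.g.
 * "Node 0 MemFree:        94114436 kB". Returns true and fills
 * *node / *kib only if the line carries the requested field.
 */
bool
parse_node_meminfo_line(const char *line, const char *field,
						int *node, unsigned long long *kib)
{
	char		name[64];
	int			n;
	unsigned long long v;

	assert(line != NULL && field != NULL);

	if (sscanf(line, "Node %d %63[^:]: %llu", &n, name, &v) != 3)
		return false;
	if (strcmp(name, field) != 0)
		return false;
	*node = n;
	*kib = v;
	return true;
}

/* Return the field's value in kB for the given node, or -1 if unavailable. */
long long
node_meminfo_kib(int node, const char *field)
{
	char		path[128];
	char		line[256];
	FILE	   *f;
	long long	result = -1;

	snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/meminfo", node);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	while (fgets(line, sizeof(line), f) != NULL)
	{
		int			n;
		unsigned long long kib;

		if (parse_node_meminfo_line(line, field, &n, &kib))
		{
			result = (long long) kib;
			break;
		}
	}
	fclose(f);
	return result;
}
```

Polling node_meminfo_kib(0, "MemFree") and node_meminfo_kib(1, "MemFree")
once a second next to iostat should make it easy to line up the throughput
drops with the point where a node runs out of free memory. Note that sysfs
reports kB, whereas numastat -m prints MB.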

Looking at a profile, I see a lot of cacheline bouncing between the kernel
threads that "reclaim" pages (i.e. make them available for reuse), the
kernel threads that write out dirty pages, the kernel threads where the
IO completes (i.e. where the dirty bit can be flipped / locks get
released), and the writing process.

I think there's a lot that can be improved on the kernel side - but it's
not too surprising that letting the kernel cache, and forcing it to make
caching decisions for, a large streaming write has substantial costs.


I changed Robert's test program to optionally use fallocate,
sync_file_range(WRITE) and posix_fadvise(DONTNEED), to avoid a large
footprint in the page cache. The performance differences are quite
substantial:

gcc -Wall -ggdb ~/tmp/write_and_fsync.c -o /tmp/write_and_fsync && \
    rm -f /srv/dev/bench/test* && echo 3 |sudo tee /proc/sys/vm/drop_caches && \
    /tmp/write_and_fsync --sync_file_range=0 --fallocate=0 --fadvise=0 --filesize=$((400*1024*1024*1024)) /srv/dev/bench/test1

running test with: numprocs=1 filesize=429496729600 blocksize=8192 fallocate=0 sfr=0 fadvise=0
[/srv/dev/bench/test1][11450] open: 0, fallocate: 0 write: 214, fsync: 6, close: 0, total: 220

Comparing that with --sync_file_range=1 --fallocate=1 --fadvise=1:
running test with: numprocs=1 filesize=429496729600 blocksize=8192 fallocate=1 sfr=1 fadvise=1
[/srv/dev/bench/test1][14098] open: 0, fallocate: 0 write: 161, fsync: 0, close: 0, total: 161

Below are the results of running the program with a variety of
parameters (both the program and the results are attached).

I used perf stat in this run to measure the difference in CPU
usage.

ref_cycles is the number of CPU cycles, summed across all 20 cores / 40
threads, during which the CPUs were doing *something*. It is not affected
by CPU frequency scaling, only by the time CPUs were not "halted",
whereas cycles is affected by frequency scaling.

A high ref_cycles_sec, combined with a decent number of total
instructions/cycles is *good*, because it indicates fewer CPUs
used. Whereas a very high ref_cycles_tot means that more CPUs were
running doing something for the duration of the benchmark.
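
As a reading aid for the table: the per-second columns look consistent
with the totals divided by elapsed time times the 40 hardware threads.
That is an inference from the numbers, not something perf stat was
checked for here, and per_cpu_rate() is just an illustrative helper:

```c
#include <assert.h>

/*
 * Average event rate per hardware thread: total events divided by the
 * CPU-seconds available during the run (elapsed seconds * ncpus).
 */
double
per_cpu_rate(double total_events, double elapsed_sec, int ncpus)
{
	assert(elapsed_sec > 0.0 && ncpus > 0);

	return total_events / (elapsed_sec * ncpus);
}
```

For the first row, per_cpu_rate(1497048950014.0, 248.430736196, 40) comes
out at roughly 150.65 million ref cycles per second per hardware thread,
which matches the reported 150.653M/sec to within rounding.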

The run-to-run variation between the runs without cache control is
pretty large, so these are probably not the end-all-be-all numbers. But I
think the trends are pretty clear.

test                                                                           time           ref_cycles_tot     ref_cycles_sec  cycles_tot         cycles_sec  instructions_tot   ipc
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=0 sfr=1 fadvise=0    248.430736196  1,497,048,950,014  150.653M/sec    1,226,822,167,960  0.123GHz    705,950,461,166    0.54
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=0 sfr=0 fadvise=1    310.275952938  1,921,817,571,226  154.849M/sec    1,499,581,687,133  0.121GHz    944,243,167,053    0.59
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=0 sfr=1 fadvise=1    164.175492485  913,991,290,231    139.183M/sec    762,359,320,428    0.116GHz    678,451,556,273    0.84
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=1 sfr=0 fadvise=0    243.609959554  1,802,385,405,203  184.970M/sec    1,449,560,513,247  0.149GHz    855,426,288,031    0.56
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=1 sfr=1 fadvise=0    230.880100449  1,328,417,418,799  143.846M/sec    1,148,924,667,393  0.124GHz    723,158,246,628    0.63
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=1 sfr=0 fadvise=1    253.591234992  1,548,485,571,798  152.658M/sec    1,229,926,994,613  0.121GHz    1,117,352,436,324  0.95
numprocs=1 filesize=429496729600 blocksize=8192 fallocate=1 sfr=1 fadvise=1    164.488835158  911,974,902,254    138.611M/sec    760,756,011,483    0.116GHz    672,105,046,261    0.84
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=0 sfr=0 fadvise=0    164.052510134  1,561,521,537,336  237.972M/sec    1,404,761,167,120  0.214GHz    715,274,337,015    0.51
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=0 sfr=1 fadvise=0    192.151682414  1,526,440,715,456  198.603M/sec    1,037,135,756,007  0.135GHz    802,754,964,096    0.76
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=0 sfr=0 fadvise=1    242.648245159  1,782,637,416,163  183.629M/sec    1,463,696,313,881  0.151GHz    1,000,100,694,932  0.69
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=0 sfr=1 fadvise=1    188.772193248  1,418,274,870,697  187.803M/sec    923,133,958,500    0.122GHz    799,212,291,243    0.92
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=1 sfr=0 fadvise=0    421.580487642  2,756,486,952,728  163.449M/sec    1,387,708,033,752  0.082GHz    990,478,650,874    0.72
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=1 sfr=1 fadvise=0    169.854206542  1,333,619,626,854  196.282M/sec    1,036,261,531,134  0.153GHz    666,052,333,591    0.64
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=1 sfr=0 fadvise=1    305.078100578  1,970,042,289,192  161.445M/sec    1,505,706,462,812  0.123GHz    954,963,240,648    0.62
numprocs=2 filesize=214748364800 blocksize=8192 fallocate=1 sfr=1 fadvise=1    166.295223626  1,290,699,256,763  194.044M/sec    857,873,391,283    0.129GHz    761,338,026,415    0.89
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=0 sfr=0 fadvise=0    455.096916715  2,808,715,616,077  154.293M/sec    1,366,660,063,053  0.075GHz    888,512,073,477    0.66
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=0 sfr=1 fadvise=0    256.156100686  2,407,922,637,215  235.003M/sec    1,133,311,037,956  0.111GHz    748,666,206,805    0.65
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=0 sfr=0 fadvise=1    215.255015340  1,977,578,120,924  229.676M/sec    1,461,504,758,029  0.170GHz    1,005,270,838,642  0.68
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=0 sfr=1 fadvise=1    158.262790654  1,720,443,307,097  271.769M/sec    1,004,079,045,479  0.159GHz    826,905,592,751    0.84
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=1 sfr=0 fadvise=0    334.932246893  2,366,388,662,460  176.628M/sec    1,216,049,589,993  0.091GHz    796,698,831,717    0.68
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=1 sfr=1 fadvise=0    161.697270285  1,866,036,713,483  288.576M/sec    1,068,181,502,433  0.165GHz    739,559,279,008    0.70
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=1 sfr=0 fadvise=1    231.440889430  1,965,389,749,057  212.391M/sec    1,407,927,406,358  0.152GHz    997,199,361,968    0.72
numprocs=4 filesize=107374182400 blocksize=8192 fallocate=1 sfr=1 fadvise=1    214.433248700  2,232,198,239,769  260.300M/sec    1,073,334,918,389  0.125GHz    861,540,079,120    0.80
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=0 sfr=0 fadvise=0  644.521613661  3,688,449,404,537  143.079M/sec    2,020,128,131,309  0.078GHz    961,486,630,359    0.48
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=0 sfr=1 fadvise=0  243.830464632  1,499,608,983,445  153.756M/sec    1,227,468,439,403  0.126GHz    691,534,661,654    0.59
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=0 sfr=0 fadvise=1  292.866419420  1,753,376,415,877  149.677M/sec    1,483,169,463,392  0.127GHz    860,035,914,148    0.56
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=0 sfr=1 fadvise=1  162.152397194  925,643,754,128    142.719M/sec    743,208,501,601    0.115GHz    554,462,585,110    0.70
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=1 sfr=0 fadvise=0  211.369510165  1,558,996,898,599  184.401M/sec    1,359,343,408,200  0.161GHz    766,769,036,524    0.57
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=1 sfr=1 fadvise=0  233.315094908  1,427,133,080,540  152.927M/sec    1,166,000,868,597  0.125GHz    743,027,329,074    0.64
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=1 sfr=0 fadvise=1  290.698155820  1,732,849,079,701  149.032M/sec    1,441,508,612,326  0.124GHz    835,039,426,282    0.57
numprocs=1 filesize=429496729600 blocksize=131072 fallocate=1 sfr=1 fadvise=1  159.945462440  850,162,390,626    132.892M/sec    724,286,281,548    0.113GHz    670,069,573,150    0.90
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=0 sfr=0 fadvise=0  163.244592275  1,524,807,507,173  233.531M/sec    1,398,319,581,978  0.214GHz    689,514,058,243    0.46
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=0 sfr=1 fadvise=0  231.795934322  1,731,030,267,153  186.686M/sec    1,124,935,745,020  0.121GHz    736,084,922,669    0.70
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=0 sfr=0 fadvise=1  315.564163702  1,958,199,733,216  155.128M/sec    1,405,115,546,716  0.111GHz    1,000,595,890,394  0.73
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=0 sfr=1 fadvise=1  210.945487961  1,527,169,148,899  180.990M/sec    906,023,518,692    0.107GHz    700,166,552,207    0.80
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=1 sfr=0 fadvise=0  161.759094088  1,468,321,054,671  226.934M/sec    1,221,167,105,510  0.189GHz    735,855,415,612    0.59
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=1 sfr=1 fadvise=0  158.578248952  1,354,770,825,277  213.586M/sec    936,436,363,752    0.148GHz    654,823,079,884    0.68
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=1 sfr=0 fadvise=1  274.628500801  1,792,841,068,080  163.209M/sec    1,343,398,055,199  0.122GHz    996,073,874,051    0.73
numprocs=2 filesize=214748364800 blocksize=131072 fallocate=1 sfr=1 fadvise=1  179.140070123  1,383,595,004,328  193.095M/sec    850,299,722,091    0.119GHz    706,959,617,654    0.83
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=0 sfr=0 fadvise=0  445.496787199  2,663,914,572,687  149.495M/sec    1,267,340,496,930  0.071GHz    787,469,552,454    0.62
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=0 sfr=1 fadvise=0  261.866083604  2,325,884,820,091  222.043M/sec    1,094,814,208,219  0.105GHz    649,479,233,453    0.57
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=0 sfr=0 fadvise=1  172.963505544  1,717,387,683,260  248.228M/sec    1,356,381,335,831  0.196GHz    822,256,638,370    0.58
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=0 sfr=1 fadvise=1  157.934678897  1,650,503,807,778  261.266M/sec    970,705,561,971    0.154GHz    637,953,927,131    0.66
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=1 sfr=0 fadvise=0  225.623143601  1,804,402,820,599  199.938M/sec    1,086,394,788,362  0.120GHz    656,392,112,807    0.62
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=1 sfr=1 fadvise=0  157.930900998  1,797,506,082,342  284.548M/sec    1,001,509,813,741  0.159GHz    644,107,150,289    0.66
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=1 sfr=0 fadvise=1  165.772265335  1,805,895,001,689  272.353M/sec    1,514,173,918,970  0.228GHz    823,435,044,810    0.54
numprocs=4 filesize=107374182400 blocksize=131072 fallocate=1 sfr=1 fadvise=1  187.664764448  1,964,118,348,429  261.660M/sec    978,060,510,880    0.130GHz    668,316,194,988    0.67


Greetings,

Andres Freund
#define _GNU_SOURCE         /* See feature_test_macros(7) */

#include <fcntl.h>
#include <getopt.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#include <stdbool.h>

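/*
 * sync_file_range() pacing. Writeback of a SFR_START_SIZE chunk is started
 * once the writer is more than SFR_START_WRITE_DELAY bytes past its end;
 * a SFR_WAIT_SIZE chunk is waited for once the writer is more than
 * SFR_WAIT_WRITE_DELAY bytes past its end.
 */
#define SFR_START_WRITE_DELAY (1 * 1024 * 1024)
#define SFR_WAIT_WRITE_DELAY (8 * 1024 * 1024)
#define SFR_START_SIZE (512 * 1024)
#define SFR_WAIT_SIZE (8 * 1024 * 1024)

/* posix_fadvise(DONTNEED) is issued in chunks, trailing the writer likewise */
#define FADVISE_DONTNEED_SIZE (512 * 1024)
#define FADVISE_DONTNEED_DELAY (SFR_WAIT_WRITE_DELAY + FADVISE_DONTNEED_SIZE)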

typedef struct runparams
{
	uint32_t blocksize;
	uint64_t filesize;
	int numprocs;
	int numfiles;
	char **filenames;
	bool fallocate;
	bool fadvise;
	bool sync_file_range;
	bool sequential;
} runparams;

extern void runtest(const runparams *params, char *filename);

const struct option getopt_options[] = {
	{.name = "filesize", .has_arg = required_argument,  .val = 's'},
	{.name = "blocksize", .has_arg = required_argument, .val = 'b'},
	{.name = "fallocate", .has_arg = required_argument, .val = 'a'},
	{.name = "fadvise", .has_arg = required_argument, .val = 'f'},
	{.name = "sync_file_range",  .has_arg = required_argument, .val = 'r'},
	{.name = "sequential",  .has_arg = required_argument, .val = 'q'},
	{}};

static void
helpdie(void)
{
	fprintf(stderr, "\n"
			"Usage: write_and_fsync [OPTIONS] [FILES]\n"
			"--filesize=...\n"
			"--blocksize=...\n"
			"--fallocate=yes/no/0/1\n"
			"--fadvise=yes/no/0/1\n"
			"--sync_file_range=yes/no/0/1\n"
			"--sequential=yes/no/0/1\n");
	exit(1);
}

int
main(int argc, char **argv)
{
	runparams params = {
		.blocksize = 8192,
	};
	int	status;

	while (1)
	{
		int o;

		o = getopt_long(argc, argv, "", getopt_options, NULL);

		if (o == -1)
			break;

		switch (o)
		{
			case 0:
				break;
			case 's':
				params.filesize = strtoull(optarg, NULL, 0);
				break;
			case 'b':
				params.blocksize = strtoul(optarg, NULL, 0);
				break;
			case 'a':
				params.fallocate = strcmp(optarg, "yes") == 0 || strcmp(optarg, "1") == 0;
				break;
			case 'f':
				params.fadvise = strcmp(optarg, "yes") == 0 || strcmp(optarg, "1") == 0;
				break;
			case 'r':
				params.sync_file_range = strcmp(optarg, "yes") == 0 || strcmp(optarg, "1") == 0;
				break;
			case 'q':
				params.sequential = strcmp(optarg, "yes") == 0 || strcmp(optarg, "1") == 0;
				break;
			case '?':
				helpdie();
				break;
			default:
				fprintf(stderr, "huh: %d\n", o);
				helpdie();
		}
	}

	params.filenames = &argv[optind];
	params.numprocs = argc - optind;

	if (params.numprocs <= 0 || params.filesize <= 0)
		helpdie();

	printf("running test with: numprocs=%d filesize=%llu blocksize=%u fallocate=%d sfr=%d fadvise=%d sequential=%d\n",
		   params.numprocs,
		   (unsigned long long) params.filesize, params.blocksize,
		   params.fallocate, params.sync_file_range, params.fadvise, params.sequential);
	fflush(stdout);

	for (int fileno = 0; fileno < params.numprocs; fileno++)
	{
		pid_t	pid = fork();

		if (pid == 0)
		{
			runtest(&params, params.filenames[fileno]);
			exit(0);
		}
		else if (pid < 0)
		{
			perror("fork");
			exit(1);
		}
	}

	while (wait(&status) >= 0)
		;
	sleep(1);

	return 0;
}

void
runtest(const runparams* params, char *filename)
{
	const int bs = params->blocksize;
	const uint64_t filesize = params->filesize;
	const bool sfr = params->sync_file_range;
	const bool fadv = params->fadvise;
	char *junk;

	junk = malloc(bs);
	if (!junk)
	{
		perror("malloc");
		exit(1);
	}

	memset(junk, 'J', params->blocksize);

	time_t	t0 = time(NULL);
	int fd = open(filename, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
	{
		perror("open");
		exit(1);
	}

	time_t	t1 = time(NULL);

	if (params->fallocate)
	{
		if (posix_fallocate(fd, 0, filesize) != 0)
		{
			perror("posix_fallocate");
			exit(1);
		}
	}


	if (params->sequential)
	{
		if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
		{
			perror("posix_fadvise");
			exit(1);
		}
	}

	time_t	t2 = time(NULL);
	uint64_t bytes_written = 0;
	uint64_t last_wait_write = 0;
	uint64_t last_start_write = 0;
	uint64_t last_dontneed = 0;

	while (bytes_written + bs <= filesize)
	{
		int wc = write(fd, junk, bs);

		if (wc != bs)
		{
			fprintf(stderr, "wc = %d\n", wc);
			perror("write");
			exit(1);
		}
		bytes_written += bs;


		/* wait for last write */
		if (sfr)
		{
			if (last_wait_write + SFR_WAIT_WRITE_DELAY + SFR_WAIT_SIZE < bytes_written)
			{
				if (sync_file_range(fd, last_wait_write, SFR_WAIT_SIZE, SYNC_FILE_RANGE_WAIT_BEFORE) != 0)
				{
					perror("sfr(wait_before)");
					exit(1);
				}
				last_wait_write += SFR_WAIT_SIZE;
			}

			if (last_start_write + SFR_START_WRITE_DELAY + SFR_START_SIZE < bytes_written)
			{

				if (sync_file_range(fd, last_start_write, SFR_START_SIZE, SYNC_FILE_RANGE_WRITE) != 0)
				{
					perror("sfr(write)");
					exit(1);
				}
				last_start_write += SFR_START_SIZE;
			}
		}

		if (fadv)
		{
			if (last_dontneed + FADVISE_DONTNEED_DELAY + FADVISE_DONTNEED_SIZE < bytes_written)
			{
				if (posix_fadvise(fd, last_dontneed, FADVISE_DONTNEED_SIZE, POSIX_FADV_DONTNEED) != 0)
				{
					perror("fadvise(dontneed)");
					exit(1);
				}
				last_dontneed += FADVISE_DONTNEED_SIZE;
			}
		}
	}

	time_t	t3 = time(NULL);
	if (fsync(fd) != 0)
	{
		perror("fsync");
		exit(1);
	}

	time_t	t4 = time(NULL);
	if (close(fd) != 0)
	{
		perror("close");
		exit(1);
	}

	time_t	t5 = time(NULL);
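	/* time_t is not guaranteed to match %lu; cast the differences explicitly */
	printf("[%s][%d] open: %ld, fallocate: %ld write: %ld, fsync: %ld, close: %ld, total: %ld\n",
	       filename, (int) getpid(), (long) (t1 - t0), (long) (t2 - t1), (long) (t3 - t2),
	       (long) (t4 - t3), (long) (t5 - t4), (long) (t5 - t0));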
}
