Re: Curiosity about file access performance

2021-10-29 Thread bzs


I/O to/from /dev/zero or /dev/null could be special-cased.

Benchmarking file system performance can be fraught.

-- 
-Barry Shein, co-author of nfsstones benchmark

Software Tool & Die| b...@theworld.com | http://www.TheWorld.com
Purveyors to the Trade | Voice: +1 617-STD-WRLD   | 800-THE-WRLD
The World: Since 1989  | A Public Information Utility | *oo*

-- 
Problem reports:  https://cygwin.com/problems.html
FAQ:  https://cygwin.com/faq/
Documentation:  https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple


Re: Curiosity about file access performance

2021-10-29 Thread Eliot Moss

On 10/29/2021 11:44 AM, Adam Dinwoodie wrote:


> AIUI it's a fundamental part of the trade-offs that NTFS makes:
> compared to common Linux file systems like ext4, NTFS is much slower
> at things like parsing directory structures (which is a necessary part
> of opening any given file). In the same way that native Windows
> programs tend to use threading implementations that work differently
> to fork(), native Windows applications will also often much prefer
> large monolithic data files, where native *nix applications are much
> more likely to have lots of small files. As a result, for things that
> require opening lots of files, WSL (at least if you're using the
> native WSL disk, which will be a *nix disk image stored in a file,
> rather than files under /mnt/c or similar) will likely be quicker than
> a similar operation through Cygwin, as Cygwin will always be affected
> by those NTFS overheads.


Ah, that's interesting.  The files in question, that seem to be opened
(and *maybe* read) faster are in the *nix hierarchy, while my book files
are all in Windows (/mnt/c on WSL1).  So the huge speedup reading those
makes sense.  The speedup processing the rest still doesn't quite make
sense, unless maybe WSL1's parsed-directory caching is more effective
than Cygwin's or something.  (I assume something like that is going on,
to reduce conversions of directories to *nix format.)

Regards - Eliot



Re: Curiosity about file access performance

2021-10-29 Thread Noel Grandin via Cygwin



There are a bunch of different possibilities:

(*) temporary files - there was an improvement here in recent Cygwin versions, which means that if your machine has lots
of memory and your program creates lots of temporary files, then it will now be significantly faster
(*) file name lookup - Linux has a path name cache, which makes it quite a bit faster than Windows for heavy use (git is
the poster child here)
(*) file information lookup - some of the "default" Unix APIs will look up a bunch of information which is cheap on
Unix, but expensive on Windows. Normally there are alternative APIs which will only load the minimal set of information,
which will then be cheaper on Windows.
(*) spawning - it is quite possible that LaTeX is making heavy use of spawning child processes to do various things,
which is unfortunately more expensive on Windows.
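
To illustrate the "file information lookup" point above, here is a minimal Python sketch (not taken from Cygwin or LaTeX) that contrasts asking for full stat information on every entry with asking only for what the directory scan itself returns. The helper names, the ".sty" file names and the count of 2000 are made up for the example. Whether os.scandir() actually avoids a per-file lookup depends on the platform and the Python build, so treat the timings as indicative only.

```python
import os
import tempfile
import time


def sizes_via_stat(directory):
    """List names, then issue one os.stat() call per entry."""
    sizes = {}
    for name in os.listdir(directory):
        st = os.stat(os.path.join(directory, name))
        sizes[name] = st.st_size
    return sizes


def names_via_scandir(directory):
    """Collect regular-file names, asking only for the entry type.

    Where the platform reports the file type in the directory entry,
    is_file() needs no extra per-file system call.
    """
    with os.scandir(directory) as entries:
        return sorted(e.name for e in entries if e.is_file(follow_symlinks=False))


def make_files(directory, count):
    """Create `count` small files (hypothetical names) for the comparison."""
    for i in range(count):
        with open(os.path.join(directory, f"f{i:05d}.sty"), "w") as fh:
            fh.write("x" * i)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_files(tmp, 2000)
        t0 = time.perf_counter()
        sizes_via_stat(tmp)
        t1 = time.perf_counter()
        names_via_scandir(tmp)
        t2 = time.perf_counter()
        print(f"listdir + stat: {t1 - t0:.4f} s")
        print(f"scandir only:   {t2 - t1:.4f} s")
```

The two functions deliberately do not return the same thing: the second one asks for less, which is the whole point of the "minimal set of information" remark.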




Re: Curiosity about file access performance

2021-10-29 Thread Adam Dinwoodie
On Fri, 29 Oct 2021 at 10:36, Eliot Moss wrote:
> I think a lot of us know that fork() under Cygwin is slower than on Linux and
> have some grasp of why.  But I have noticed that file access is rather slower
> under Cygwin as well.  My "poster child" for this is running latex.  I am
> working on writing a book, which includes a huge number of LaTeX style files
> and such.  Under WSL1 (which has the same fork cost issues as Cygwin for
> similar reasons), reading the style files goes by in little more than the
> blink of an eye (about 1 sec), while on Cygwin it takes a little over 17 
> seconds.
>
> The time to process the body of the book is 23 seconds under WSL1 and 35 under
> Cygwin.  So the total times are 53 seconds under Cygwin and 24 under WSL1.  I
> believe the LaTeX installations are the same versions, and I get the same
> outputs.  Both LaTeX's are 64 bit programs.  There is not much forking here
> (at least I don't believe there is, but maybe there is under the cover for
> doing things with pdf figures or something), but a fair amount of file I/O.
>
> For many / most things, the Cygwin overhead is tolerable; for running this
> book, since I will be doing it over and over, it was worth investing in
> getting everything set up on WSL1.
>
> But it got me wondering as to why?

AIUI it's a fundamental part of the trade-offs that NTFS makes:
compared to common Linux file systems like ext4, NTFS is much slower
at things like parsing directory structures (which is a necessary part
of opening any given file). In the same way that native Windows
programs tend to use threading implementations that work differently
to fork(), native Windows applications will also often much prefer
large monolithic data files, where native *nix applications are much
more likely to have lots of small files. As a result, for things that
require opening lots of files, WSL (at least if you're using the
native WSL disk, which will be a *nix disk image stored in a file,
rather than files under /mnt/c or similar) will likely be quicker than
a similar operation through Cygwin, as Cygwin will always be affected
by those NTFS overheads.
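
One way to probe the "lots of small files versus one monolithic file" trade-off is to read the same number of bytes in both layouts and compare the elapsed times. The following is a rough Python sketch under stated assumptions: the file names, the count (5000) and the size (4096 bytes) are arbitrary, and because the files have just been written they will mostly be served from cache, so the numbers reflect open/lookup overhead rather than disk speed.

```python
import os
import tempfile
import time


def write_small_files(directory, count, size):
    """Create `count` files of `size` bytes each; return their paths."""
    paths = []
    payload = b"a" * size
    for i in range(count):
        path = os.path.join(directory, f"part{i:05d}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
        paths.append(path)
    return paths


def write_one_file(directory, count, size):
    """Create a single file holding count * size bytes; return its path."""
    path = os.path.join(directory, "monolith.dat")
    with open(path, "wb") as fh:
        fh.write(b"a" * (count * size))
    return path


def read_all(paths):
    """Open and fully read each path; return (total_bytes, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as fh:
            total += len(fh.read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    COUNT, SIZE = 5000, 4096  # arbitrary values chosen for illustration
    with tempfile.TemporaryDirectory() as tmp:
        small = write_small_files(tmp, COUNT, SIZE)
        big = write_one_file(tmp, COUNT, SIZE)
        n_small, t_small = read_all(small)
        n_big, t_big = read_all([big])
        print(f"{COUNT} small files: {n_small} bytes in {t_small:.4f} s")
        print(f"1 large file:     {n_big} bytes in {t_big:.4f} s")
```

Running the same script under Cygwin and under WSL (on its native disk and on /mnt/c) would show how much of any difference comes from per-file overhead rather than from raw throughput.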



Re: Curiosity about file access performance

2021-10-29 Thread Eliot Moss



Sorry, it could depend on what we mean by "file access", so allow me to try to
clarify.  I am grateful for your data since they show that raw data handling
speed is good.  But to read a file you have to open it.  I suspect that file
lookup and opening may be an issue.  Which reminds me, I should check and see
if any of the TeX lookup paths are significantly different between the two
cases!
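
To separate "lookup and open" cost from "read the bytes" cost on an existing tree (a TeX style-file directory, say), something like the following Python sketch could be run in both environments. It is only an illustration: the function name is made up, it assumes the tree contains ordinary readable files, and the time attributed to reading also includes closing each file.

```python
import os
import sys
import time


def time_open_vs_read(root):
    """Walk `root`; return (files, open_seconds, read_seconds, bytes_read)."""
    files = 0
    open_s = 0.0
    read_s = 0.0
    nbytes = 0
    # O_BINARY exists only on some platforms; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            t0 = time.perf_counter()
            try:
                fd = os.open(path, flags)
            except OSError:
                continue  # unreadable or vanished file: skip it
            t1 = time.perf_counter()
            try:
                while True:
                    chunk = os.read(fd, 1 << 20)
                    if not chunk:
                        break
                    nbytes += len(chunk)
            finally:
                os.close(fd)
            t2 = time.perf_counter()
            files += 1
            open_s += t1 - t0
            read_s += t2 - t1
    return files, open_s, read_s, nbytes


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    files, open_s, read_s, nbytes = time_open_vs_read(root)
    print(f"{files} files, {nbytes} bytes")
    print(f"open: {open_s:.4f} s   read+close: {read_s:.4f} s")
```

If the open column dominates under one environment and not the other, that would point at lookup and opening rather than at data transfer.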

Best wishes - Eliot



Re: Curiosity about file access performance

2021-10-29 Thread Takashi Yano via Cygwin
On Fri, 29 Oct 2021 10:35:08 +0100
Eliot Moss wrote:
> I think a lot of us know that fork() under Cygwin is slower than on Linux and
> have some grasp of why.  But I have noticed that file access is rather slower
> under Cygwin as well.  My "poster child" for this is running latex.  I am
> working on writing a book, which includes a huge number of LaTeX style files
> and such.  Under WSL1 (which has the same fork cost issues as Cygwin for
> similar reasons), reading the style files goes by in little more than the
> blink of an eye (about 1 sec), while on Cygwin it takes a little over 17 
> seconds.
> 
> The time to process the body of the book is 23 seconds under WSL1 and 35 under
> Cygwin.  So the total times are 53 seconds under Cygwin and 24 under WSL1.  I
> believe the LaTeX installations are the same versions, and I get the same
> outputs.  Both LaTeX's are 64 bit programs.  There is not much forking here
> (at least I don't believe there is, but maybe there is under the cover for
> doing things with pdf figures or something), but a fair amount of file I/O.
> 
> For many / most things, the Cygwin overhead is tolerable; for running this
> book, since I will be doing it over and over, it was worth investing in
> getting everything set up on WSL1.
> 
> But it got me wondering as to why?

Why do you think the cause is the file access performance?
I tested the file access speed using dd as follows.

In cygwin:
[yano@Express5800-S70 ~]$ dd if=/dev/zero of=test.dat bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.186714 s, 2.8 GB/s
[yano@Express5800-S70 ~]$ dd if=test.dat of=/dev/null bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.125709 s, 4.2 GB/s

In WSL1:
Express5800-S70:~> dd if=/dev/zero of=test.dat bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.301657 s, 1.7 GB/s
Express5800-S70:~> dd if=test.dat of=/dev/null bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.229617 s, 2.3 GB/s

The results show that file access performance under Cygwin is
better than under WSL1.

I think the cause of your problem is something other than
file access performance.

-- 
Takashi Yano 



Curiosity about file access performance

2021-10-29 Thread Eliot Moss

Dear Cygwiners -

I think a lot of us know that fork() under Cygwin is slower than on Linux and
have some grasp of why.  But I have noticed that file access is rather slower
under Cygwin as well.  My "poster child" for this is running latex.  I am
working on writing a book, which includes a huge number of LaTeX style files
and such.  Under WSL1 (which has the same fork cost issues as Cygwin for
similar reasons), reading the style files goes by in little more than the
blink of an eye (about 1 sec), while on Cygwin it takes a little over 17 
seconds.

The time to process the body of the book is 23 seconds under WSL1 and 35 under
Cygwin.  So the total times are 53 seconds under Cygwin and 24 under WSL1.  I
believe the LaTeX installations are the same versions, and I get the same
outputs.  Both LaTeX's are 64 bit programs.  There is not much forking here
(at least I don't believe there is, but maybe there is under the cover for
doing things with pdf figures or something), but a fair amount of file I/O.

For many / most things, the Cygwin overhead is tolerable; for running this
book, since I will be doing it over and over, it was worth investing in
getting everything set up on WSL1.

But it got me wondering as to why?

Best wishes - Eliot
