Re: surprising glob() result on Windows
On 04/29/2023 3:37 PM, Stan Brown wrote:
> Stan Brown
> Tehachapi, CA, USA
> https://BrownMath.com
>
> On 2023-04-29 08:28, Mike wrote:
>> On 04/29/2023 10:51 AM, Mike wrote:
>>> On 04/28/2023 9:32 PM, Mike wrote:
>>>> Briefly, I have a case where glob("*.ext") returns more files than I
>>>> expect.
>>>>
>>>> To give an example, in a directory of your choice create two files
>>>> named "test.any" and "zest.anyother". The important detail is that
>>>> the second filename's extension be prefixed by the first filename's
>>>> extension.
>>>>
>>>> Then launch Vim in that directory and run the command
>>>>     :echo glob("*.any")
>>>> Both files are returned, not just "test.any".
>>>>
>>>> I see this on Windows running Vim 9.0.1240 with normal features built
>>>> with Visual C. On the other hand, Vim on my Linux box returns only
>>>> "test.any", as I would expect, so I don't think this is a feature. :)
>>>
>>> I've since rebuilt Vim to include patches up to 1494 and still see the
>>> same results on my Windows 10 system. I thought that patches 1400 and
>>> 1458 might help but they did not.
>>
>> More potatoes for the stew.
>>
>> Create 5 files: test.a, test.ab, test.abc, test.abcd and test.abcde.
>> Then, using gvim -u NONE -U NONE --noplugin or gvim --clean:
>>     glob("*.a") returns test.a
>>     glob("*.ab") returns test.ab
>>     glob("*.abc") returns test.abc, test.abcd and test.abcde
>>     glob("*.abcd") returns test.abcd
>>
>> So the problem occurs when the glob pattern has a 3-character extension.
>
> Mike, I saw someone answered this, but maybe their answer didn't reach
> you?

If you're referring to Bram's answer, it did. However, his link
primarily referenced FAT-based systems, not NTFS, and so the light bulb
remained off.

> Short version: Windows is doing what it's supposed to, and so is Vim.
>
> The original MS-DOS and MS-Windows file system, in the 1980s, allowed
> up to 8 characters, and then optionally a dot (period, full stop) plus
> up to 3 characters. Even if the file was created with lower-case
> characters in its name, Windows would change those characters to upper
> case. We can call these "8.3 filenames" for short.
>
> In the mid-1990s (with Windows 95), Windows added so-called long
> filenames (LFNs), which could be longer than 8.3 and could contain
> lower-case. Rather than start with a completely new file system (which
> would then make floppy disks and other interchangeable media unreadable
> on the previous generation of computers), Microsoft gave any filename
> that exceeded 8.3 _two_ entries in the directory: one for the actual
> filename, and one for an 8.3 "short filename" (SFN). If the new file's
> name fit within 8.3, then it would get only that one entry, an SFN, in
> the directory. Thus _every_ file had an SFN, but not every file had an
> LFN. The graphical interface (called File Explorer, Windows Explorer,
> or Explorer) would show an LFN if one existed, otherwise the SFN.
>
> Some time after that (I'm not sure when, but certainly by the release
> of Windows 10), it became possible to disable SFNs for any particular
> disk partition. And sometime after that, "LFNs only" became the
> default. But your disk is obviously set to create SFNs from longer
> filenames.

Thank you, now I understand. Motivated by your answer, I looked up the
NTFS article on Wikipedia (https://en.wikipedia.org/wiki/NTFS), and it
says that short filenames are implemented as "hard links". I hadn't
realized this.

> Your test.a, test.ab, and test.abc all fit in the 8.3 paradigm, and
> therefore they have only SFNs. Your test.abcd exceeds 8.3, so when you
> created it Windows set up an SFN for it.
>
> How is the SFN formed? Windows keeps at most the first six characters
> before the dot and the first three after it (6.3, not 8.3), and then
> appends ~1 to the part before the dot, so the result still fits in
> 8.3. Therefore your test.abcd has two names, test.abcd and test~1.abc
> (probably ~1, but it might be ~ and some other number), and
> test.abcde is probably test~2.abc.
>
> When you glob *.abc, the SFNs test~1.abc and test~2.abc are caught in
> that net. But since Windows prefers to show an LFN when one exists,
> you see them as test.abcd and test.abcde.
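A minimal Python sketch of the short-name rule quoted above, for illustration only: keep at most six characters before the dot and three after, upper-case them, and append ~N. This is a simplification and an assumption, not Windows' exact algorithm: it skips the replacement of characters that are invalid in 8.3 names and the hash-based names Windows falls back to after several collisions. `fits_8_3` and `short_name` are hypothetical helper names.

```python
def fits_8_3(name: str) -> bool:
    """Rough check: 1-8 chars before a single dot, at most 3 after, no spaces."""
    base, dot, ext = name.rpartition(".")
    if not dot:  # no dot at all: the whole name is the base
        base, ext = name, ""
    return 0 < len(base) <= 8 and len(ext) <= 3 and "." not in base and " " not in name


def short_name(long_name: str, seq: int = 1) -> str:
    """Simplified 8.3 short name, e.g. 'test.abcd' -> 'TEST~1.ABC'.

    A name that already fits 8.3 gets no separate short name, so it is
    returned unchanged. seq is the ~N collision counter (an assumption:
    real Windows picks it based on names already in the directory).
    """
    if fits_8_3(long_name):
        return long_name
    base, dot, ext = long_name.rpartition(".")
    if not dot:
        base, ext = long_name, ""
    base = base.replace(".", "").replace(" ", "")  # drop extra dots and spaces
    ext = ext.replace(" ", "")
    tail = f"~{seq}"
    # Keep only as much of the base as leaves room for ~N within 8 characters.
    stem = base[: 8 - len(tail)].upper() + tail
    return stem + ("." + ext[:3].upper() if ext else "")


if __name__ == "__main__":
    for n, name in enumerate(["test.abcd", "test.abcde"], start=1):
        print(name, "->", short_name(name, n))
```

Under this rule, zest.anyother from Mike's first example becomes ZEST~1.ANY, which `*.any` matches, and pack.vim9 becomes PACK~1.VIM, which `*.vim` matches.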
>
> None of the SFN/LFN business exists on Linux, and since glob() is a
> Unix thing in origin, it doesn't seem unreasonable to me that it
> doesn't handle this.
>
> If you really need to have more than three characters after the dot in
> filenames, then the simplest thing would be for you to create a wrapper
> function that calls glob() and then filters its result, dropping
> anything that doesn't match the input pattern.

Actually, I discovered this not because of glob() but because ":packadd"
was sourcing two files, one named pack.vim and the other named
pack.vim9, and both defined global-scope functions with the same name.
When I looked at the Vim source code, it appeared to rely on glob, so I
chose to post the issue using glob() as it seemed more fundamental.

Again, thanks for taking the time to provide a detailed answer.

-mike

-- 
-- 
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are
Re: surprising glob() result on Windows
Stan Brown
Tehachapi, CA, USA
https://BrownMath.com

On 2023-04-29 08:28, Mike wrote:
> On 04/29/2023 10:51 AM, Mike wrote:
>> On 04/28/2023 9:32 PM, Mike wrote:
>>> Briefly, I have a case where glob("*.ext") returns more files than I
>>> expect.
>>>
>>> To give an example, in a directory of your choice create two files
>>> named "test.any" and "zest.anyother". The important detail is that
>>> the second filename's extension be prefixed by the first filename's
>>> extension.
>>>
>>> Then launch Vim in that directory and run the command
>>>     :echo glob("*.any")
>>> Both files are returned, not just "test.any".
>>>
>>> I see this on Windows running Vim 9.0.1240 with normal features built
>>> with Visual C. On the other hand, Vim on my Linux box returns only
>>> "test.any", as I would expect, so I don't think this is a feature. :)
>>
>> I've since rebuilt Vim to include patches up to 1494 and still see the
>> same results on my Windows 10 system. I thought that patches 1400 and
>> 1458 might help but they did not.
>
> More potatoes for the stew.
>
> Create 5 files: test.a, test.ab, test.abc, test.abcd and test.abcde.
> Then, using gvim -u NONE -U NONE --noplugin or gvim --clean:
>     glob("*.a") returns test.a
>     glob("*.ab") returns test.ab
>     glob("*.abc") returns test.abc, test.abcd and test.abcde
>     glob("*.abcd") returns test.abcd
>
> So the problem occurs when the glob pattern has a 3-character extension.

Mike, I saw someone answered this, but maybe their answer didn't reach
you?

Short version: Windows is doing what it's supposed to, and so is Vim.

The original MS-DOS and MS-Windows file system, in the 1980s, allowed up
to 8 characters, and then optionally a dot (period, full stop) plus up
to 3 characters. Even if the file was created with lower-case characters
in its name, Windows would change those characters to upper case. We can
call these "8.3 filenames" for short.

In the mid-1990s (with Windows 95), Windows added so-called long
filenames (LFNs), which could be longer than 8.3 and could contain
lower-case. Rather than start with a completely new file system (which
would then make floppy disks and other interchangeable media unreadable
on the previous generation of computers), Microsoft gave any filename
that exceeded 8.3 _two_ entries in the directory: one for the actual
filename, and one for an 8.3 "short filename" (SFN). If the new file's
name fit within 8.3, then it would get only that one entry, an SFN, in
the directory. Thus _every_ file had an SFN, but not every file had an
LFN. The graphical interface (called File Explorer, Windows Explorer, or
Explorer) would show an LFN if one existed, otherwise the SFN.

Some time after that (I'm not sure when, but certainly by the release of
Windows 10), it became possible to disable SFNs for any particular disk
partition. And sometime after that, "LFNs only" became the default. But
your disk is obviously set to create SFNs from longer filenames.

Your test.a, test.ab, and test.abc all fit in the 8.3 paradigm, and
therefore they have only SFNs. Your test.abcd exceeds 8.3, so when you
created it Windows set up an SFN for it.

How is the SFN formed? Windows keeps at most the first six characters
before the dot and the first three after it (6.3, not 8.3), and then
appends ~1 to the part before the dot, so the result still fits in 8.3.
Therefore your test.abcd has two names, test.abcd and test~1.abc
(probably ~1, but it might be ~ and some other number), and test.abcde
is probably test~2.abc.

When you glob *.abc, the SFNs test~1.abc and test~2.abc are caught in
that net. But since Windows prefers to show an LFN when one exists, you
see them as test.abcd and test.abcde.

None of the SFN/LFN business exists on Linux, and since glob() is a Unix
thing in origin, it doesn't seem unreasonable to me that it doesn't
handle this.
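The "probably ~1, but it might be some other number" question can be checked on the Windows machine itself: `dir /x` in a cmd window lists short names, and the Win32 function GetShortPathNameW returns them programmatically. The Python sketch below calls it through ctypes; `get_short_path` is a hypothetical helper name. It returns None on non-Windows systems or when the call fails (for example, if the file does not exist). If short-name creation is disabled on the volume, expect it to hand back the long name unchanged.

```python
import ctypes
import sys


def get_short_path(path: str):
    """Return the 8.3 short form Windows reports for an existing path.

    Returns None on non-Windows platforms or if GetShortPathNameW fails
    (for example, because the path does not exist).
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # First call with no buffer: the return value is the required size,
    # including the terminating null character (0 means failure).
    size = kernel32.GetShortPathNameW(path, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    # Second call fills the buffer; 0 again signals failure.
    if kernel32.GetShortPathNameW(path, buf, size) == 0:
        return None
    return buf.value


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "->", get_short_path(name))
```

Run from the test directory with the file names as arguments (for example `test.abcd test.abcde`), it should print the actual ~N suffixes assigned.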

If you really need to have more than three characters after the dot in
filenames, then the simplest thing would be for you to create a wrapper
function that calls glob() and then filters its result, dropping
anything that doesn't match the input pattern.

To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/2b0898b6-90f3-0fe8-077e-8761a263c0ca%40fastmail.fm.
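Stan's wrapper suggestion, sketched in Python: the hypothetical helper `keep_exact_matches` takes whatever list a glob call returned and re-checks each file's own (long) name against the pattern, case-insensitively, so entries that matched only through their short name are dropped. It uses `ntpath` so Windows-style paths are handled on any OS, and Python's `fnmatch` syntax, which covers simple patterns like `*.abc` but is not identical to Vim's wildcards (for example `**`). In Vim script the same idea could presumably be built from filter() and glob2regpat().

```python
import ntpath
from fnmatch import fnmatch


def keep_exact_matches(pattern: str, paths: list) -> list:
    """Drop glob results whose own file name does not match pattern.

    Only the last path component is compared, case-insensitively, as
    Windows compares names. ntpath.basename accepts both '\\' and '/'.
    """
    name_pattern = ntpath.basename(pattern).lower()
    return [p for p in paths if fnmatch(ntpath.basename(p).lower(), name_pattern)]


if __name__ == "__main__":
    # The result Mike saw from glob("*.abc") on Windows.
    print(keep_exact_matches("*.abc", ["test.abc", "test.abcd", "test.abcde"]))
```

Applied to Mike's `*.abc` result, only test.abc survives; the same filter would drop pack.vim9 from a `*.vim` match.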
Re: surprising glob() result on Windows
On 04/29/2023 10:51 AM, Mike wrote:
> On 04/28/2023 9:32 PM, Mike wrote:
>> Briefly, I have a case where glob("*.ext") returns more files than I
>> expect.
>>
>> To give an example, in a directory of your choice create two files
>> named "test.any" and "zest.anyother". The important detail is that
>> the second filename's extension be prefixed by the first filename's
>> extension.
>>
>> Then launch Vim in that directory and run the command
>>     :echo glob("*.any")
>> Both files are returned, not just "test.any".
>>
>> I see this on Windows running Vim 9.0.1240 with normal features built
>> with Visual C. On the other hand, Vim on my Linux box returns only
>> "test.any", as I would expect, so I don't think this is a feature. :)
>
> I've since rebuilt Vim to include patches up to 1494 and still see the
> same results on my Windows 10 system. I thought that patches 1400 and
> 1458 might help but they did not.

More potatoes for the stew.

Create 5 files: test.a, test.ab, test.abc, test.abcd and test.abcde.
Then, using gvim -u NONE -U NONE --noplugin or gvim --clean:
    glob("*.a") returns test.a
    glob("*.ab") returns test.ab
    glob("*.abc") returns test.abc, test.abcd and test.abcde
    glob("*.abcd") returns test.abcd

So the problem occurs when the glob pattern has a 3-character extension.

Any comments?

-mike

To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/u2jd28%2417r6%241%40ciao.gmane.io.
Re: surprising glob() result on Windows
On 04/28/2023 9:32 PM, Mike wrote:
> Briefly, I have a case where glob("*.ext") returns more files than I
> expect.
>
> To give an example, in a directory of your choice create two files
> named "test.any" and "zest.anyother". The important detail is that the
> second filename's extension be prefixed by the first filename's
> extension.
>
> Then launch Vim in that directory and run the command
>     :echo glob("*.any")
> Both files are returned, not just "test.any".
>
> I see this on Windows running Vim 9.0.1240 with normal features built
> with Visual C. On the other hand, Vim on my Linux box returns only
> "test.any", as I would expect, so I don't think this is a feature. :)

I've since rebuilt Vim to include patches up to 1494 and still see the
same results on my Windows 10 system. I thought that patches 1400 and
1458 might help but they did not.

Any comments?

-mike

To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/u2jata%24t38%241%40ciao.gmane.io.
Re: surprising glob() result on Windows
On 04/29/2023 9:26 AM, Bram Moolenaar wrote:
>> Briefly, I have a case where glob("*.ext") returns more files than I
>> expect.
>>
>> To give an example, in a directory of your choice create two files
>> named "test.any" and "zest.anyother". The important detail is that the
>> second filename's extension be prefixed by the first filename's
>> extension.
>>
>> Then launch Vim in that directory and run the command
>>     :echo glob("*.any")
>> Both files are returned, not just "test.any".
>>
>> I see this on Windows running Vim 9.0.1240 with normal features built
>> with Visual C. On the other hand, Vim on my Linux box returns only
>> "test.any", as I would expect, so I don't think this is a feature. :)
>>
>> Any comments?
>
> What file system is being used?

Windows 10 with NTFS.

> Some older filesystems use a trick to make long file names possible.
> The file then appears twice in the directory, once with the short name
> and once with the long name. Vim may find a match with the short name,
> include it in the list of matches and then expand it to the long name.
>
> See https://en.wikipedia.org/wiki/Long_filename

To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/u2j8je%2419c%241%40ciao.gmane.io.
Re: surprising glob() result on Windows
> Briefly, I have a case where glob("*.ext") returns more files than I
> expect.
>
> To give an example, in a directory of your choice create two files named
> "test.any" and "zest.anyother". The important detail is that the second
> filename's extension be prefixed by the first filename's extension.
>
> Then launch Vim in that directory and run the command
>     :echo glob("*.any")
> Both files are returned, not just "test.any".
>
> I see this on Windows running Vim 9.0.1240 with normal features built
> with Visual C. On the other hand, Vim on my Linux box returns only
> "test.any", as I would expect, so I don't think this is a feature. :)
>
> Any comments?

What file system is being used?

Some older filesystems use a trick to make long file names possible.
The file then appears twice in the directory, once with the short name
and once with the long name. Vim may find a match with the short name,
include it in the list of matches and then expand it to the long name.

See https://en.wikipedia.org/wiki/Long_filename

-- 
ARTHUR: Now stand aside worthy adversary.
BLACK KNIGHT: (Glancing at his shoulder) 'Tis but a scratch.
ARTHUR: A scratch? Your arm's off.
                "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///                                                                      \\\
\\\        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///

To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/20230429132628.E64621C09E0%40moolenaar.net.
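Bram's explanation can be modelled in a few lines. This Python sketch is only a simulation: instead of reading a real directory it uses a hypothetical list of (long name, short name) pairs for Mike's five test files, with short names following the ~1/~2 pattern discussed in the thread, and it assumes case-insensitive matching as on Windows. A pattern is tested against both names, but only the long name is reported.

```python
from fnmatch import fnmatch

# Hypothetical directory entries for Mike's test: (long name, short name).
# None means the name already fits 8.3, so there is no separate short entry.
ENTRIES = [
    ("test.a", None),
    ("test.ab", None),
    ("test.abc", None),
    ("test.abcd", "TEST~1.ABC"),
    ("test.abcde", "TEST~2.ABC"),
]


def name_matches(name: str, pattern: str) -> bool:
    # Lower-case both sides to mimic Windows' case-insensitive comparison.
    return fnmatch(name.lower(), pattern.lower())


def glob_like_windows(pattern: str, entries=ENTRIES) -> list:
    """Match against the long OR the short name, but report the long name."""
    return [
        long_name
        for long_name, short in entries
        if name_matches(long_name, pattern)
        or (short is not None and name_matches(short, pattern))
    ]


if __name__ == "__main__":
    for pat in ("*.a", "*.ab", "*.abc", "*.abcd"):
        print(pat, glob_like_windows(pat))
```

Running it reproduces Mike's four results: among these patterns only `*.abc` picks up test.abcd and test.abcde, because only a three-character extension can equal the truncated extension of a short name.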