Re: Question about fold
Pádraig Brady wrote:
> Moritz Poldrack wrote:
> > Pádraig Brady wrote:
> > > Both fold and fmt have overlapping functionality.
> > > fold is more lower level, using less context,
> > > and is operating as expected in this case.
> >
> > Thank you for your help. That's exactly what I wanted :) There are so
> > many coreutils one can lose track of them.
>
> It's a fair point that the relation is not obvious.
> I just pushed a change to fold(1) to SEE ALSO fmt(1).

As with many things the difference is in the history of how they were
created.  I remember fmt being specifically used with vi to perform
paragraph rewrapping.  Documentation on using vi would describe how to
use fmt with text to perform paragraph wrapping.  It was not built into
vi.  The typical example would be this.

    !}fmt

Also I remember (my way of saying I haven't researched it again now for
verification) that on HP-UX the fmt command was missing from the
operating system.  Instead HP-UX shipped an "adjust" command which was
similar to fmt but different.  At that time it was annoying that it was
different from fmt and so most of us in the field compiled fmt from
source and installed it as a local command so that we could use fmt the
same way on all of our systems.  No idea why adjust was provided instead
of fmt on HP-UX.  Neither fmt nor adjust appears in standards.

Meanwhile the fold command was more related to pr types of actions
where one did not want the right side of text to be lost off the right
side of the printer.

Here Wikipedia has a good example of each which illustrates the
difference between them rather nicely when one compares the output
examples on the two pages.

    https://en.wikipedia.org/wiki/Fold_(Unix)
    https://en.wikipedia.org/wiki/Fmt_(Unix)

Bob
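[Editor's note: the difference is easy to see side by side with any short text; a minimal sketch, not from the original message:]

```shell
# fold breaks lines at an exact column count, even mid-word;
# fmt rewraps at word boundaries so that words stay intact.
printf 'aaa bbb ccc ddd\n' | fold -w 7   # first output line is "aaa bbb"
printf 'aaa bbb ccc ddd\n' | fmt -w 7    # every line fits in 7 columns, words unbroken
```

fold's second line here begins mid-word (" ccc dd" then "d"), exactly the low-level, context-free behavior described above; fmt never splits a word.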
Re: date command is great but could use improvements
David Chmelik wrote:
> I noticed two types for weeks in year: 1 to 53, 0 to 53, numbering
> 54 weeks, but there can't be 54, right?
> Only two options exist to number weekdays, when should be
> more/user-configurable.

The %Y and %U or %W options work in combination.  Use %U for weeks
starting with Sunday or %W for weeks starting with Monday.

The ISO %G and %V options work in combination.  Use them together.

Mixing these different sets creates confusion.  Use %Y and %U/%W
together or use %G and %V together.

Here is an example of a date with some odd corner cases.

    $ cal 01 2026
        January 2026
    Su Mo Tu We Th Fr Sa
                 1  2  3
     4  5  6  7  8  9 10
    11 12 13 14 15 16 17
    18 19 20 21 22 23 24
    25 26 27 28 29 30 31

    $ cal 01 2027
        January 2027
    Su Mo Tu We Th Fr Sa
                    1  2
     3  4  5  6  7  8  9
    10 11 12 13 14 15 16
    17 18 19 20 21 22 23
    24 25 26 27 28 29 30
    31

    $ date -d "2026-12-31 12:00" +%G%V
    202653
    $ date -d "2027-01-01 12:00" +%G%V
    202653
    $ date -d "2026-12-31 12:00" +%Y%U
    202652
    $ date -d "2027-01-01 12:00" +%Y%U
    202700

Use of ISO week numbers tends to create confusion.  The ISO week
numbering scheme is somewhat different from calendar week numbering.
ISO week numbers start on Monday of the week with the year's first
Thursday in it.  See Wikipedia's ISO 8601 page
http://en.wikipedia.org/wiki/ISO_8601 or Wikipedia's ISO week date page
http://en.wikipedia.org/wiki/ISO_week_date for a good summary.

ISO week dates can be created using the following format.

    $ date -d "2022-01-01 12:00:00 +0000" "+%G-W%V-%u"
    2021-W52-6
    $ date -d "2022-01-03 12:00:00 +0000" "+%G-W%V-%u"
    2022-W01-1

> Glad to see one can display weekday 1 to 7 (how everyone thinks
> about it) except said 1 is Monday (as in Europe.)  I was shocked to
> see one can only display Sunday starting week at 0.  Everyone thinks
> of days 1 to 7, even programmers/sysadmins having to talk so users
> understand.  Majority are PC users now with Linux kernel being most
> popular (normally GNU also available on Replicant/Android/etc.)
> but people where Sunday is first weekday (The United States of
> America, USA, and I've never heard a programmer/sysadmin/scientist/
> professor who uses UNIX/GNU/Linux say 'zeroth weekday') will want PC
> (so date command) to number days from 1, as date numbers month days 1
> to up to 31, weeks 1 to 53, quarters/seasons 1 to 4, months & civil
> time 1 to 12.

This all originates with Unix in the 1970's.  Please see the ctime(3)
man page.  In it one will find something like this (abbreviated)
documentation.

    man ctime

    The ctime(), gmtime(), and localtime() functions all take an
    argument of data type time_t, which represents calendar time.
    When interpreted as an absolute time value, it represents the
    number of seconds elapsed since the Epoch, 1970-01-01 00:00:00
    +0000 (UTC).

    The asctime() and mktime() functions both take an argument
    representing broken-down time, which is a representation
    separated into year, month, day, and so on.

    Broken-down time is stored in the structure tm, which is defined
    in <time.h> as follows:

        struct tm {
            int tm_sec;    /* Seconds (0-60) */
            int tm_min;    /* Minutes (0-59) */
            int tm_hour;   /* Hours (0-23) */
            int tm_mday;   /* Day of the month (1-31) */
            int tm_mon;    /* Month (0-11) */
            int tm_year;   /* Year - 1900 */
            int tm_wday;   /* Day of the week (0-6, Sunday = 0) */
            int tm_yday;   /* Day in the year (0-365, 1 Jan = 0) */
            int tm_isdst;  /* Daylight saving time */
        };

    The members of the tm structure are:

    tm_sec   The number of seconds after the minute, normally in the
             range 0 to 59, but can be up to 60 to allow for leap
             seconds.
    tm_min   The number of minutes after the hour, in the range 0 to 59.
    tm_hour  The number of hours past midnight, in the range 0 to 23.
    tm_mday  The day of the month, in the range 1 to 31.
    tm_mon   The number of months since January, in the range 0 to 11.
    tm_year  The number of years since 1900.
    tm_wday  The number of days since Sunday, in the range 0 to 6.
    tm_yday  The number of days since January 1, in the range 0 to 365.
    tm_isdst A flag that indicates whether daylight saving time is in
             effect at the time described.  The value is positive if
             daylight saving time is in effect, zero if it is not, and
             negative if the information is not available.

This is the origination of the range values.  The command line utility
is at the source a wrapper around these library routines and these
data structures.  The call ctime(t) is equivalent to
asctime(localtime(t)).
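[Editor's note: the tm_wday convention shows through directly in date(1): %w prints tm_wday as-is (0-6, Sunday = 0) while %u prints the ISO day number (1-7, Monday = 1).  A small sketch with GNU date:]

```shell
# 2024-01-07 was a Sunday; -u pins the timezone so the result is stable.
# %w exposes tm_wday (Sunday = 0), %u is the ISO numbering (Sunday = 7).
date -u -d '2024-01-07' '+%A %w %u'   # → Sunday 0 7
```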
bug#54586: dd conv options doc
Karl Berry wrote:
>      'fdatasync'
>           Synchronize output data just before finishing.  This forces a
>           physical write of output data.
>
>      'fsync'
>           Synchronize output data and metadata just before finishing.
>           This forces a physical write of output data and metadata.
>
> Weirdly, these descriptions are inducing quite a bit of FUD in me.
>
> Why would I ever want the writes to be incomplete after running dd?
> Seems like that is dd's whole purpose.

Yes.  FUD.  The writes are not incomplete.  It is no different than any
other write.

    echo "Hello, World!" > file1

Is that write complete?  It's no different.  If one is incomplete then
so is the other.  Note that the documentation does not say "incomplete"
but says "physical write".  As in, chiseled into stone.

The dd utility exists with a plethora of low level options not
typically available in other utilities.  Other utilities such as cp for
example.  That is one of the distinguishing features making dd useful
in a very large number of cases when otherwise we would use cp, rsync,
or one of the others.  Very low level control of option flags.  But
just because options exist does not mean they should always be used.
Most of the time they should not be used.

> Well, I suppose it is too late to make such a radical change as forcing
> a final sync.

Please, no.  Opposing this is the motivation for me writing this
response.  Things are wastefully slow already due to the number of
fsync() calls now coded into everywhere all over the place.  Other
programs.  Not referring to the coreutils here.  Let's not make the
problem worse by adding them where they are not desired.

And that is why it is an option to dd and not on by default.  In those
specific cases where it is useful then it can be specified as an
option.  dd is exposing the interface for when it is useful.

As a practical matter I think with GNU dd's extensions that I never
ever use conv=fsync or conv=fdatasync but instead would always in those
same cases use oflag=direct,sync.
Such as when writing a removable storage device like a USB drive, that
I subsequently will want to remove.  There is no benefit to caching the
data since it will be invalidated immediately.  Not using buffer cache
avoids flushing some other data that would be useful to keep in file
system buffer cache.  When the write is done then the removable media
can be removed.  This avoids needing to run sync explicitly.  Which
sync's *everything*.

> In which case I suggest adding another sentence along the lines of
> "If these options are not specified, the data will be physically
> written when the system schedules the syncs, ordinarily every few
> seconds" (correct?).

Yes.  However the behavior might vary slightly between the different
kernels such as Linux kernel, BSD kernel, or even HP-UX kernel.
Therefore the documentation of it is kernel specific.  Even if all of
the kernels operated similarly.

> "You can also manually sync the output filesystem yourself
> afterwards (xref sync)."  Otherwise it feels uncertain when or
> whether the data will be physically written, or how to look into it
> further.

Generally this is a task that the operating system should be handling.
The programmer taking explicit control defeating the cache is almost
always going to be less efficient at it than the operating system.
However as you later mention writing an image to a removable storage
device like a USB thumbdrive needs to have the data flushed through
before removing the device.  GNU dd is good for this as I will describe
below but otherwise yes a "sync" (either the standalone or the oflag)
would be needed to ensure that the data has been flushed through.

> As for "metadata", what does dd have to do with metadata?  My wild guess
> is that this is referring to filesystem metadata, not anything about dd
> specifically.  Whatever the case, I suggest adding a word or two to the
> doc to give a clue.

It's not dd's fault.  The OS created it first!  It's a property given
meaning by the OS.
The OS defines the option flags.  The dd utility is simply a thin layer
giving access to the OS file option flags.

> Further, why would I want data to be synced and not metadata?  Seems like
> fdatasync and fsync should both do both; or at least document that
> normally they'd be used together.  Or, if there is a real-life case where
> a user would want one and not the other, how about documenting that?  My
> imagination is failing me, but presumably these seemingly-undesirable
> options were invented for a reason.

The fdatasync() man page provides the information.

    The aim of fdatasync() is to reduce disk activity for applications
    that do not require all metadata to be synchronized with the disk.

In short fdatasync() is less heavy than fsync().

> BTW, I came across these options on a random page discussing dumping a
> .iso to a USB drive; the example was
>     dd if=foo.iso of=/dev/sde conv=fdatasync
> .. seems
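[Editor's note: conv=fdatasync can be tried harmlessly against a regular scratch file rather than a device; a minimal sketch:]

```shell
# Write 4 KiB and ask dd to fdatasync() the output before exiting, so
# the data has been pushed to stable storage when dd returns.
out=$(mktemp)
dd if=/dev/zero of="$out" bs=1K count=4 conv=fdatasync 2>/dev/null
size=$(wc -c < "$out")
rm -f "$out"
echo "$size"   # → 4096
```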
Re: thoughts on NO_COLOR
Pádraig Brady wrote:
> I just noticed some de facto treatment of the NO_COLOR env var.
> https://no-color.org/

I happened to run across this site myself a few weeks ago.  When I saw
it I had this immediate feeling of community.  Here was someone else
who also felt the oppression of endless flashing lights, ringing of
bells, and awful color choices being pushed upon us by default!  When
working on the computer I really don't want to feel like I am walking
through a casino.

I forwarded the site to another friend who was also happy to see a site
with documentation as a resource for how to stop the noisy color
choices.  We agreed it was a useful resource because every program has
been set up to do this completely independently and differently.

> I was considering having ls --color=auto honor this, but then thought
> it is not actually needed in ls, since we give fine grained
> control over the colors / styles used.

Mostly I can always unset any alias that sets ls with --color.  This
one is so well known that it's an easy routine.  It's the other odd
corners that one doesn't run into very often that are more problematic.
When "dmesg" started spraying me with colors then I had to stop and
spend time to figure out how to disable the noise.  Utilities such as
that are not run into immediately.  And then when they do get hit then
it is a distraction at that time.  Others such as a default vim are
almost unusable due to the onslaught.

It's worse when one is not in their home environment.  Such as if I am
debugging some server, standing on a freezing datacenter floor, in a
limited environment, without all of my home customization.  That's when
some of these defaults are truly annoying.

> For example one might very well always want at least some distinguishing
> of files and directories, with bold / bright etc.
> which can be achieved now with LS_COLORS.

Actually ls -F is pretty good.  That's all I use.
> Or looking at it another way, ls is ubiquitous enough
> that it's probably already color configured as the user desires,
> and having ls honor the less fine grained NO_COLOR flag,
> would result in less flexibility.

I think ls is ubiquitous enough that everyone has already learned how
to deal with it right up front.  For me usually \ls is enough.
Therefore it isn't as much of a concern for ls.  And so I am not going
to lobby for coreutils picking up NO_COLOR.  But if things were to get
worse then it could be a proposal I could get behind.

Bob
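[Editor's note: for script authors who want to honor the convention, the check is tiny; a sketch, with a made-up function name for illustration:]

```shell
# Emit ANSI color only when NO_COLOR is absent or empty, per the
# convention described at https://no-color.org/
say_red() {
  if [ -n "${NO_COLOR-}" ]; then
    printf '%s\n' "$1"
  else
    printf '\033[31m%s\033[0m\n' "$1"
  fi
}

NO_COLOR=1 say_red "plain text"   # prints without escape sequences
```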
Re: df command reports incorrect usage
Fariya F wrote:
> Yes it is an embedded system.  The package we have used is coreutils
> 8.25 and not 1.35.  Very sorry for the confusion.

Ah!  No worries. :-)

Then I will ask if my assumption about /dev/mmcblk2 being an SD card is
incorrect?  Is it actually soldered-on flash NAND?  In which case it
won't be so easy to deal with things.

I do think a recipe of backup, wipe, mkfs, restore is still the easiest
and best way to recover.  However it should also be possible to reach
into there and repair the current file system too, but that would take
a lot more effort and skill.

Bob
Re: df command reports incorrect usage
Rob Landley wrote:
> Fariya F wrote:
> > The version of df command is 1.85.

A current version of Coreutils df by most distros would be 8.x and the
upstream stable version is version 9.  This leads me to believe that
the df you are using is not from Coreutils but from somewhere else.
But not Busybox because that would only be up to version 1.35 so far.
In any case it seems that you are not using the GNU Coreutils df
program at all here.

I am so very curious as to what system this is on.  And wondering if it
is an embedded system.

> > Kernel version 4.9.31
>
> A 2016 release, plus bugfixes.

Yes.  But I think unlikely to be the problem here.  If we were in 2016
ourselves right now then we would be looking at that and seeing that it
was as up to date as it was possible to be at that time.  The kernel
had been dealing with file systems, ext4, and SD cards for decades
already by 2016.

> > The fsck check on that partition succeeds.  I am able to use the
> > partition, create files etc.  However df reports incorrectly the
> > size and used percentage.
> >
> > Output from strace command is as follows;
> >
> > statfs64("/data", 88, {f_type="EXT2_SUPER_MAGIC",
> > f_bsize=1024, f_blocks=18446744073659310077, f_bfree=87628,
> > f_bavail=80460, f_files=25688, f_ffree=25189,
> > f_fsid={-1446355608, 1063639410}, f_namelen=255, f_frsize=1024,
> > f_flags=4128}) = 0
> >
> > As can be seen, value of f_blocks is a huge one.
...
> Probably fixed in one of the thousand-plus commits to fs/ext4 since then.

I think it more likely that the SD card image is corrupted.  Which
would mean that even the newest kernel today would report the same
information.
> Another thing you could do is copy your image to file (cat /dev/sdx >
> file.img) and use a VM image to read it (I use
> https://landley.net/toybox/faq.html#mkroot for this sort of thing)
> and then see if current kernels can read it properly (if not, report
> it to lkml), and if it WAS fixed git bisect between v4.9 and master
> to find the commit that fixed it.  (Remember the bad-is-good swap in
> that case.)
>
> Hmmm, I'm making tutorial videos now...  How big is this storage
> device, and does it have anything private on it?

It was reported to be 100MB ext4 in size with only 5.8MB used.  Really
very small.  And the root cause of the problem I think will be isolated
to the file system superblocks.  So actually smaller.  You could
probably completely debug the problem top to bottom.

> > How can we get this corrected and where is the corruption because
> > of which this value goes wrong?

If it's only 5.8MB of used data according to du then here is what I
would do to recover from this problem.

I would copy all of the data off of this ext4 file system to a backup
location.  That's very small these days.  I would make a complete
backup copy.  And since I suspect there is other data on other
partitions of this SD card I would also backup ALL of the data on this
card and not just this partition.

And if that backup copy works then count yourself lucky.  Because it
might not work.  Actually trying to copy the files off might uncover
where one of the file chains is corrupted.  However you had also
reported that fsck had been successfully run on it.  Therefore I think
it likely that you will be able to copy all of the files off of this SD
card and make a successful backup.

Then I myself would throw the SD card away!  Since it is likely bad.
And in a 32GB size card right now we are talking about USD$14 or so.
Replace it with a new one since that is inexpensive.

However you say this partition is only 100MB.  Which implies to me that
this is one part of the larger SD card.
Is that right?

If it were me then after making a complete backup of ALL files I would
run badblocks with the destructive -w option (read the documentation as
it is a DESTRUCTIVE TEST) on the partition and check to see if the card
hardware is functional or not.  If that fails then obviously the
hardware has failed.  If that succeeds then I would think the image had
just gotten corrupted.  In which case the file system could be rebuilt.
And having been rebuilt the new bits would be correctly constructed and
the problem solved.

If it is not a hardware failure and you really want to use this SD card
then after ensuring that I had a FULL BACKUP I would then wipe the file
system signatures off of the partition with "wipefs -a" and then
rebuild the file system on it with "mkfs -t ext4".  Then mount the file
system and copy back the data from the backup copy that was made.  That
recipe would create a new file system fresh on it and that would fix
the bad data in the file system superblocks.

So far there has been no mention if this is a bootable image or not.
If so then it may be necessary to restore an MBR boot record.

Bob
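[Editor's note: the wipe-and-rebuild step can be rehearsed on a scratch image file first, without touching any real device; a sketch, assuming util-linux wipefs and an ext4-capable mkfs are installed:]

```shell
# Practice run on a plain file standing in for the partition.
PATH="$PATH:/sbin:/usr/sbin"   # wipefs/mkfs often live in sbin
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=8 2>/dev/null   # 8 MB scratch "partition"
wipefs -a "$img" >/dev/null       # clear any old filesystem signatures
mkfs -t ext4 -q -F "$img"         # -F: target is a file, not a block device
# The ext4 superblock magic 0xEF53 now sits at byte offset 1080.
od -An -tx1 -j1080 -N2 "$img" | tr -d ' '   # → 53ef
```

The same wipefs/mkfs pair applied to the real partition (after the full backup!) is the recipe described above.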
Re: df command reports incorrect usage
Fariya F wrote:
> My eMMC device has a partition which reports the below output from
> df -h command:
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/mmcblk2p3   16Z   16Z   84M 100% /data

Since the df command is simply reporting back on what the Linux kernel
reports, and the kernel is reporting data from the file system, this
looks to be a problem with the file system.  Since this looks like an
SD card it seems likely that the SD card is not happy.  It would be a
good idea to test the SD card to see if it has failed.

It would be useful to know at least some information in addition to
this output.  What is the specific version of df?

    df --version

What kernel version are you using?

    uname -a

Since df is asking for the file system information from the kernel it
would be useful to know what answer the kernel provided.

    strace -v -e statfs /bin/df -h /data

On my system I see the following from a couple of different examples.
Just to provide something to show what would be useful.

    rwp@angst:/tmp$ strace -v -e statfs /bin/df -hT /data
    statfs("/data", {f_type=MSDOS_SUPER_MAGIC, f_bsize=16384,
    f_blocks=2044, f_bfree=857, f_bavail=857, f_files=0, f_ffree=0,
    f_fsid={val=[45826, 0]}, f_namelen=1530, f_frsize=16384,
    f_flags=ST_VALID|ST_RDONLY|ST_RELATIME}) = 0
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/mmcblk0p2 vfat   32M   19M   14M  59% /data
    +++ exited with 0 +++

    root@angst:~# strace -v -e statfs /bin/df -hT /data
    statfs("/data", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096,
    f_blocks=1924651, f_bfree=673176, f_bavail=579409, f_files=496784,
    f_ffree=417167, f_fsid={val=[961623697, 1875516586]},
    f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
    Filesystem         Type  Size  Used Avail Use% Mounted on
    /dev/mapper/v1-var ext3  7.4G  4.8G  2.3G  69% /data
    +++ exited with 0 +++

I looked specifically at the statfs(2) system call here as it provides
the majority of the information.  This is for a Linux kernel system.
However if your system is much different then different information might be displayed or might be needed. Bob
bug#53631: coreutils id(1) incorrect behavior
Vladimir D. Seleznev wrote:
> Expected behavior is:
>   # id user1
>   uid=1027(user1) gid=1027(user1) groups=1027(user1)
>   # id user2
>   uid=1027(user1) gid=1027(user1) groups=1027(user1),1028(somegroup)

I just tried a test on both FreeBSD and NetBSD and both behave as you
expect.  That would give weight for GNU Coreutils matching that
behavior.

> Example:
>   # useradd user1
>   # groupadd somegroup
>   # useradd -o -u "$(id -u user1)" -g "$(id -G user1) -G somegroup user2

I'll just note that there is a missing ending quote character.  It's
also missing the -m option to create a home directory.  For those who
wish to recreate the test case.

    root@turmoil:~# tail -n2 /etc/passwd /etc/group /etc/shadow /etc/gshadow
    ==> /etc/passwd <==
    user1:x:1001:1001::/home/user1:/bin/sh
    user2:x:1001:1001::/home/user2:/bin/sh

    ==> /etc/group <==
    user1:x:1001:
    somegroup:x:1002:user2

    ==> /etc/shadow <==
    user1:!:19022:0:99999:7:::
    user2:!:19022:0:99999:7:::

    ==> /etc/gshadow <==
    user1:!::
    somegroup:!::user2

With the above things are not really a valid configuration.  Therefore
I don't think it is surprising that the utilities don't "figure it out"
completely correctly.  I have never seen user2 used with a different
set of groups than the primary uid specifies.  I think in practice that
will be problematic.  Since the system will use the uid for such things
and the uid would map to a different set of auxiliary groups.  I think
in practice this case is a problematic case at the least.

Note that it is perfectly valid and long standing practice to allow
multiple passwd entries with the same uid number.  That's a technique
to allow multiple different passwords and login shells for the same
account.

[[ I'll further note that use of nscd completely breaks this useful
ability by hashing all duplicate uid entries together.  Like in The
Highlander, with nscd there can be only one.  It's why I never use nscd
anywhere as this makes it not suitable for purpose.  But that's rather
off this topic.
I'll bracket it as an aside. ]] Bob
bug#53033: date has multiple "first saturday"s?
Darryl Okahata wrote:
> Bob Proulx wrote:
> > Inconsistencies like this are why I wish it had never been implemented.
> > Best to avoid the syntax completely.
>
> Thanks.  I'll avoid date and use either python or ruby to get this info.

To be clear what I meant was that I would avoid the ordinal words such
as first, second, and third because as documented the use of second is
already taken by the time unit.  I meant that instead it would be
better to use the actual numbers 1, 2, and 3, to avoid that problem.

However reading your report again I now question whether I understand
what you were trying to report specifically.  Initially you wrote:

    $ date -d "first saturday"
    Sat Jan  8 00:00:00 PST 2022

Running it again today I get.

    $ date -d "first saturday"
    Sat Jan 15 12:00:00 AM MST 2022
    $ date -d "next saturday"
    Sat Jan 15 12:00:00 AM MST 2022

That's the first Saturday after now.  The debug output is valuable
information.

    $ date --debug -d 'first saturday'
    date: parsed day part: next/first Sat (day ordinal=1 number=6)
    date: input timezone: system default
    date: warning: using midnight as starting time: 00:00:00
    date: new start date: 'next/first Sat' is '(Y-M-D) 2022-01-15 00:00:00'
    date: starting date/time: '(Y-M-D) 2022-01-15 00:00:00'
    date: '(Y-M-D) 2022-01-15 00:00:00' = 1642230000 epoch-seconds
    date: timezone: system default
    date: final: 1642230000.0 (epoch-seconds)
    date: final: (Y-M-D) 2022-01-15 07:00:00 (UTC)
    date: final: (Y-M-D) 2022-01-15 00:00:00 (UTC-07)
    Sat Jan 15 12:00:00 AM MST 2022

Is it useful to know the date, say..., three Saturdays from now?  I am
sure there is a good case for it.  But it always leaves me scratching
my head wondering.  Because it is basically working with the date of
today, at midnight, then the next Saturday.
    $ date --debug -d 'third saturday'
    date: parsed day part: third Sat (day ordinal=3 number=6)
    date: input timezone: system default
    date: warning: using midnight as starting time: 00:00:00
    date: new start date: 'third Sat' is '(Y-M-D) 2022-01-29 00:00:00'
    date: starting date/time: '(Y-M-D) 2022-01-29 00:00:00'
    date: '(Y-M-D) 2022-01-29 00:00:00' = 1643439600 epoch-seconds
    date: timezone: system default
    date: final: 1643439600.0 (epoch-seconds)
    date: final: (Y-M-D) 2022-01-29 07:00:00 (UTC)
    date: final: (Y-M-D) 2022-01-29 00:00:00 (UTC-07)
    Sat Jan 29 12:00:00 AM MST 2022

It seems to me that it would be just as clear to use numbers in that
position so as to avoid ambiguity.

    $ date --debug -d '2 saturday'
    date: parsed day part: (SECOND) Sat (day ordinal=2 number=6)
    date: input timezone: system default
    date: warning: using midnight as starting time: 00:00:00
    date: new start date: '(SECOND) Sat' is '(Y-M-D) 2022-01-22 00:00:00'
    date: starting date/time: '(Y-M-D) 2022-01-22 00:00:00'
    date: '(Y-M-D) 2022-01-22 00:00:00' = 1642834800 epoch-seconds
    date: timezone: system default
    date: final: 1642834800.0 (epoch-seconds)
    date: final: (Y-M-D) 2022-01-22 07:00:00 (UTC)
    date: final: (Y-M-D) 2022-01-22 00:00:00 (UTC-07)
    Sat Jan 22 12:00:00 AM MST 2022

There is no need for "second" in "second saturday" when using the
relative time "2 saturday" produces the desired answer.

My wondering now is if "2 saturday" was actually what was desired at
all.  Perhaps it was really wanted to know the date of the first
Saturday of the month?  That's entirely a different problem.

Also, when working with dates I strongly encourage working with UTC.  I
went along with the original example.  But I feel I should have been
producing examples like this instead with -uR.
    $ date -uR --debug -d '2 saturday'
    date: parsed day part: (SECOND) Sat (day ordinal=2 number=6)
    date: input timezone: TZ="UTC0" environment value or -u
    date: warning: using midnight as starting time: 00:00:00
    date: new start date: '(SECOND) Sat' is '(Y-M-D) 2022-01-22 00:00:00'
    date: starting date/time: '(Y-M-D) 2022-01-22 00:00:00'
    date: '(Y-M-D) 2022-01-22 00:00:00' = 1642809600 epoch-seconds
    date: timezone: Universal Time
    date: final: 1642809600.0 (epoch-seconds)
    date: final: (Y-M-D) 2022-01-22 00:00:00 (UTC)
    date: final: (Y-M-D) 2022-01-22 00:00:00 (UTC+00)
    Sat, 22 Jan 2022 00:00:00 +0000

Bob
bug#53145: "cut" can't segment Chinese characters correctly?
zendas wrote:
> Hello, I need to get Chinese characters from the string.  I googled a
> lot of documents, it seems that the -c parameter of cut should be
> able to meet my needs, but I even directly execute the instructions
> on the web page, and the result is different from the
> demonstration.  I have searched dozens of pages but the results are
> not the same as the demo, maybe this is a bug?

Unfortunately the example was attached as images instead of as plain
text.  Please in the future copy and paste the example as text rather
than as an image.  As an image it is impossible to reproduce by trying
to copy and paste the image.  As an image it is impossible to search
for the strings.

The images were also lost somehow in the various steps of the mailing
list pipelines with this message.  First it was classified as spam by
the anti-spam robot (SpamAssassin-Bogofilter-CRM114).  I caught it in
review and re-sent the message.  That may have been the problem
specifically with images.

> For example:
> https://blog.csdn.net/xuzhangze/article/details/80930714
> [20180705173450701.png]
> the result of my attempt:
> [螢幕快照 2022-01-10 02:49:46.png]

One of the two images:

    https://debbugs.gnu.org/cgi/bugreport.cgi?msg=5;bug=53145;att=3;filename=20180705173450701.png

Second problem is that the first image shows as being corrupted.  I can
view the original however.  To my eye they are similar enough that the
one above is sufficient and I do not need to re-send the corrupted
image.

As to the problem you have reported, it is due to lack of
internationalization support for multi-byte characters.  -c is the
same as -b at this moment.

    https://www.gnu.org/software/coreutils/manual/html_node/cut-invocation.html#cut-invocation

    '-c CHARACTER-LIST'
    '--characters=CHARACTER-LIST'
         Select for printing only the characters in positions listed
         in CHARACTER-LIST.  The same as '-b' for now, but
         internationalization will change that.  Tabs and backspaces
         are treated like any other character; they take up 1
         character.
         If an output delimiter is specified, (see the description of
         '--output-delimiter'), then output that string between ranges
         of selected bytes.

For multi-byte UTF-8 characters the -c option operates the same as the
-b option as of the current version and is not suitable for dealing
with multi-byte characters.

    $ echo '螢幕快照'
    螢幕快照
    $ echo '螢幕快照' | cut -c 1
    ?
    $ echo '螢幕快照' | cut -c 1-3
    螢
    $ echo '螢幕快照' | cut -b 1-3
    螢

If the characters are known to be 3-byte multi-byte characters then I
might suggest using -b to work around the problem, assuming 3-byte
characters.  Eventually when -c is coded to handle multi-byte
characters the handling as bytes will change.  Using -b would avoid
that change.

Some operating systems have patched that specific version of utilities
locally to add multi-byte character handling.  But the patches have not
been found acceptable for inclusion.  That is why there are differences
between different operating systems.

Bob
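[Editor's note: since common CJK characters are 3 bytes each in UTF-8, byte ranges can address whole characters; a minimal sketch of the -b workaround:]

```shell
# cut -b counts raw bytes, independent of locale.  With 3-byte UTF-8
# characters, bytes 1-3 hold the 1st character, 4-6 the 2nd, and so on.
printf '螢幕快照\n' | cut -b 1-3   # → 螢
printf '螢幕快照\n' | cut -b 4-6   # → 幕
printf '螢幕快照\n' | cut -b 4-9   # → 幕快
```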
bug#53033: date has multiple "first saturday"s?
Darryl Okahata via GNU coreutils Bug Reports wrote:
> From coreutils 9.0 (note the difference between the "second" and "third"
> saturdays):
...
> $ src/date --debug -d "second saturday"
> date: parsed relative part: +1 seconds

Caution!  The date utility can't parse "second" as an ordinal because
second is also a unit of time.  The documentation says:

    A few ordinal numbers may be written out in words in some
    contexts.  This is most useful for specifying day of the week
    items or relative items (see below).  Among the most commonly used
    ordinal numbers, the word 'last' stands for -1, 'this' stands for
    0, and 'first' and 'next' both stand for 1.  Because the word
    'second' stands for the unit of time there is no way to write the
    ordinal number 2, but for convenience 'third' stands for 3,
    'fourth' for 4, 'fifth' for 5, 'sixth' for 6, 'seventh' for 7,
    'eighth' for 8, 'ninth' for 9, 'tenth' for 10, 'eleventh' for 11
    and 'twelfth' for 12.

Inconsistencies like this are why I wish it had never been
implemented.  Best to avoid the syntax completely.

Bob
bug#52481: chown of coreutils may delete the suid of file
21625039 wrote:
> [root@fedora ~]# ll test.txt
> -rwsr-x---. 1 root root 0 Dec 13 21:13 test.txt
>
> [root@fedora ~]# chown root:root test.txt
> [root@fedora ~]# ll test.txt
> -rwxr-x---. 1 root root 0 Dec 13 21:13 test.txt

That is a feature of the Linux kernel, OpenBSD kernel, and NetBSD
kernel, and I presume of other kernels too.  I know that traditional
Unix systems did not do this.  But this is done by the kernel as a
security mitigation against some types of attack.

For example a user might have a file which is in their own directory
tree.  It might be executable and setuid.  Then through a social
engineering attack they coerce root into copying the file or otherwise
taking ownership of the directory tree because they are hoping to make
use of the now newly chowned root file that is executable.

Therefore as a security mitigation implemented by the OS kernel the
setuid bit is removed when chown'ing files.  If this is truly desired
then the file can be chmod'd explicitly after chown'ing the file.

This is entirely a kernel behavior and not of chown(1).  This isn't
specific to the chown(1) command line utility at all.  For example you
can test that the same behavior from the kernel exists when using any
programming language.  It will have the same behavior.  Without
Coreutils involved at all.

    # ll test.txt
    -rwsr-xr-x 1 rwp rwp 0 Dec 17 17:13 test.txt
    # perl -e 'chown 0, 0, "test.txt" or die;'
    # ll test.txt
    -rwxr-xr-x 1 root root 0 Dec 17 17:13 test.txt

Bob
bug#52206: Bug: rm -rf /*/*
Bob Proulx wrote: > Paul Eggert wrote: > > Robert Swinford wrote: > > > BTW, zsh globbing doesn’t exhibit this behavior! It seems it is only a > > > problem in bash. > > > > In that case, the bug (whatever it is) wouldn't be a coreutils bug. > > I don't understand the comment that zsh doesn't expand the glob /*/* > and I tried it and verified that it does indeed expand that glob > sequence. Lawrence Velazquez made sense of this on the bug-bash list. https://lists.gnu.org/archive/html/bug-bash/2021-11/msg00193.html Bob
bug#52115: Suggestion: LN command should swap TARGET and LINK_NAME if LINK_NAME already exists
Paul Eggert wrote:
> Bob Proulx wrote:
> > mv calls it SOURCE and DEST. cp calls it SOURCE and DEST. Perhaps ln
> > should also call it SOURCE and DEST too for consistency?
>
> That's what ln did long ago, but that wording was deemed too confusing.
> Here's where we changed it to use something more like the current wording:
>
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=519365bb089cf90bdc780c37292938f42019c7ea

This just proves that there is no perfect solution. It's a flip-flop with either state having imperfections. My first thought was how humorous this situation is: due to complaints about the documentation we would be led in a circle back to the beginning, when this was changed previously due to complaints about the documentation.

Bob
bug#52115: Suggestion: LN command should swap TARGET and LINK_NAME if LINK_NAME already exists
Andreas Schwab wrote: > Bob Proulx wrote: > > The more I think about it the more I think it should say CONTENT > > rather than either TARGET or SOURCE. Because it is actually setting > > the content of the symbolic link. > > A hard link doesn't have content. But we are talking about symbolic links which do have content. Bob
bug#52206: Bug: rm -rf /*/*
Paul Eggert wrote:
> Robert Swinford wrote:
> > This seems like a bug:
> > https://twitter.com/nixcraft/status/1465599844299411458
>
> I don't see a coreutils bug there: rm operated as specified.

Agreed. It's not an rm bug. It's definitely unfortunate. But it's as unfortunate as riding a bicycle into a lake. It isn't a defect in the bicycle that it could not prevent someone from riding it into a lake.

> > Interestingly, however, rm -rf // only does the following:
>
> Yes, that's a special feature of GNU rm.

And apparently Bryan Cantrill reports that Solaris has the same feature as GNU rm does for "rm -rf /" protection.

> > I believe illumos has already solved this problem in a POSIX compliant
> > fashion
>
> Not sure what you're talking about here. Could you be specific? Don't have
> time to watch videos.

I watched the cited video. It features an interview with Bryan Cantrill who very dynamically and entertainingly tells a story about a scripted "rm -rf $1/$2" run without checking whether $1 and $2 were set or unset, resulting in "rm -rf /" being run by accident. He reports that Solaris therefore implemented the prevention of running "rm -rf /". This is said at time 1:27:00 in the video. I note that is the same protection as GNU rm provides. So there isn't anything for GNU rm to implement in order to match Solaris, as it appears to be the same by this report.

However $var1/$var2 expanding to / when those variables are not set is a different case from /*/* expansion, which has no variables and is simply an error of usage.

> > BTW, zsh globbing doesn’t exhibit this behavior! It seems it is only a
> > problem in bash.
>
> In that case, the bug (whatever it is) wouldn't be a coreutils bug.

I don't understand the comment that zsh doesn't expand the glob /*/* and I tried it and verified that it does indeed expand that glob sequence.

Bob
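The "$1/$2" failure mode can be guarded against in the script itself. A minimal sketch (safe_rm_tree is a hypothetical helper for illustration, not a coreutils feature):

```shell
# Refuse to run the destructive rm when either argument is unset or
# empty; ${var:?message} aborts the expansion with an error instead
# of letting "$1/$2" silently collapse toward "/".
safe_rm_tree() {
    rm -rf -- "${1:?missing top directory}/${2:?missing subdirectory}"
}

# With no arguments the guard fires and nothing is removed:
( safe_rm_tree ) 2>/dev/null || echo "refused to run"
# refused to run
```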
bug#52115: Suggestion: LN command should swap TARGET and LINK_NAME if LINK_NAME already exists
Bob Proulx wrote: > With symbolic links the symlink contains a string. The string could > be pretty much anything. The more I think about it the more I think it should say CONTENT rather than either TARGET or SOURCE. Because it is actually setting the content of the symbolic link. Therefore that seems the most accurate. Although VALUE also seems to have merit. ln [OPTION]... CONTENT DEST Bob
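That reading is easy to demonstrate: the link's content is set verbatim at creation time and need not name any existing file. The lockfile-style string below is just an example:

```shell
cd "$(mktemp -d)"
# the symlink stores an arbitrary string, not a validated path:
ln -s 'pid:12345' lockfile
readlink lockfile
# pid:12345
```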
bug#52176: Problem with email list tags
Ulf Zibis wrote: > Currently we have: > List-Post: GNU coreutils Bug Reports > > When using "reply list" to answer to a comment of bug 12345 in a email client > such as Thunderbird, my reply is sent to bug-coreutils@gnu.org, but it should > be sent to 12...@debbugs.gnu.org > > So I think, we should have: > List-Post: GNU coreutils Bug Reports <12...@debbugs.gnu.org> > > Alternatively the following tag could be added: > Reply-To: 12...@debbugs.gnu.org Please send comments, complaints, gripes, or suggestions about the BTS to the help-debbugs AT gnu.org mailing list instead. GNU Coreutils is a user of the BTS but not a maintainer of the BTS. Note that if a reply to bug-coreutils (or any of the BTS bug lists) contains a subject containing "bug#52176" then the BTS will route it to 52176 AT debbugs.gnu.org automatically. Or at least it is supposed to be routing it automatically. Therefore either should actually work correctly. Note that if someone sends several messages with the same subject then there is also logic in the BTS to try to route later messages to the same bug ticket as the first message. This is defeated if all of the messages arrive at once. But works if there is enough delay for the first message to be allowed to create a ticket before subsequent messages arrive. Usually when we see multiple tickets created from a user sending multiple messages it is due to them arriving into the BTS all at the same time. Hint for people moderating spam through Mailman, send the first one through but pause a moment or three before sending the follow-ups through. Bob
bug#52115: Suggestion: LN command should swap TARGET and LINK_NAME if LINK_NAME already exists
Chris Elvidge wrote:
> Paul Eggert wrote:
> > Ulf Zibis wrote:
> > > I think, for beginners it would be less confusing, if the most
> > > simple form would be the first.
> >
> > Unfortunately the simple form "ln TARGET" is quite rarely used, so
> > putting it first is likely to confuse beginners even more than what we
> > have already. Come to think of it, perhaps we should put the simple form
> > last instead of 2nd.

+1 for putting it last.

> I use 'ln -s "source"' quite a lot for linking into e.g. /usr/local/bin from
> my own $HOME/bin. It defaults to "." as the target in that case.

I never liked that it was allowed to be optional as I think it makes things much more confusing than the two characters saved.

> The real problem could be with the terminology.
> 'ln [options] TARGET [LINK_NAME]'; the TARGET is really the source, which
> obviously must exist. A TARGET is really something you aim at.

Mostly agree. With symbolic links the symlink contains a string. The string could be pretty much anything. By convention it contains the path to another file. (Or to another special file. Everything is a file.) But it is also used to contain a small bit of information in other cases. Such as for lockfiles and other uses. Therefore source isn't quite right. But maybe it is good enough. Because CONTENTS seems less good even if perhaps more accurate.

> Perhaps it should be changed to 'ln [options] source [link]'

mv calls it SOURCE and DEST. cp calls it SOURCE and DEST. Perhaps ln should also call it SOURCE and DEST too for consistency?

cp [OPTION]... [-T] SOURCE DEST
mv [OPTION]... [-T] SOURCE DEST
ln [OPTION]... [-T] SOURCE DEST

I like the consistency of that. Although I don't like that -T is not apparently an OPTION. It's not? Why not? Shouldn't that synopsis form simply be these?

cp [OPTION]... SOURCE DEST
mv [OPTION]... SOURCE DEST
ln [OPTION]... SOURCE DEST

Bob
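Returning to the single-argument form discussed above, a quick sketch in a scratch directory (the target path is chosen only because it exists on most systems; ln -s does not require it to exist):

```shell
cd "$(mktemp -d)"
# with one argument the link is created in ".", named after the
# last component of the target:
ln -s /usr/bin/env
readlink env
# /usr/bin/env
```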
bug#52115: Suggestion: LN command should swap TARGET and LINK_NAME if LINK_NAME already exists
Warren Parad wrote: > except mv(1) and cp(1) are both "FROM" and then "TO", but ln is backwards > from thi, it is "TO" then "FROM", the least the command could do is put > these in the correct order. But that is not correct. The order for ln is the same as for cp and mv in that the target getting created is the right side argument. (Unless the -t or -T option is used to do it differently by explicit syntax request. Unless no target is specified in which case dot is assumed. I admit those two "unless" cases complicate the original simplicity. But the normal case is to create the right side argument as the target of the command.) > > it is a one-time effort to learn the order > Opinion, do you want proof that people can't learn this, because they > haven't. The target getting created is the right side argument. If that is not clear from the documentation then improving the documentation is always good. Let me say with some confidence that if the order were changed to create the left argument that people would be very upset that cp and mv created the right side argument but ln created a left side argument! Bob
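The two "unless" cases mentioned above can be sketched quickly (scratch paths only; the target is an arbitrary example path):

```shell
cd "$(mktemp -d)"
mkdir dest
# -t DIR names the target directory by option; links land inside it:
ln -s -t dest /usr/bin/env
readlink dest/env
# /usr/bin/env

# -T treats the right-hand argument as the exact link name:
ln -s -T /usr/bin/env exactname
readlink exactname
# /usr/bin/env
```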
bug#51345: dd with conv=fsync sometimes returns when its writes are still cached
Sworddragon wrote:
> On Knoppix 9.1 with the Linux Kernel 5.10.10-64 x86_64 and GNU Coreutils
> 8.32 I wanted to overwrite my USB Thumb Drive a few times with random data
> via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
> takes ~2+ minutes to perform this action dd returned once after less than
> 60 seconds which made me a bit curious.

I suggest another try using oflag=direct instead of conv=fsync.

dd if=/dev/random of=/dev/sdb bs=1M oflag=direct

And with transfer-rate progress status:

dd if=/dev/random of=/dev/sdb bs=1M oflag=direct status=progress

Here is the documentation for it.

    ‘oflag=FLAG[,FLAG]...’
      ‘direct’
           Use direct I/O for data, avoiding the buffer cache. Note that
           the kernel may impose restrictions on read or write buffer
           sizes. For example, with an ext4 destination file system and
           a Linux-based kernel, using ‘oflag=direct’ will cause writes
           to fail with ‘EINVAL’ if the output buffer size is not a
           multiple of 512.

Bob
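A safe way to experiment with these invocations is against a scratch file rather than a real device (the file name here is arbitrary):

```shell
f=$(mktemp)
# write 16 KiB and fsync before dd exits, as conv=fsync requests:
dd if=/dev/zero of="$f" bs=4096 count=4 conv=fsync status=none
stat -c %s "$f"
# 16384
rm -f "$f"
```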
Re: feature request: echo --
Florent Flament wrote:
> Bob Proulx writes:
> >> In which case, thoroughly documenting the edge cases of the 'echo'
> >> command and inviting programmers to use 'printf' instead on its
> >> manpage (I know 'printf' is mentioned on the coreutils info page,
> >> but it's one additional level of indirection) would probably be
> >> helpful for the community. I'd gladly try to update the manpage if
> >> that be the way to go.
> >
> > That's an excellent idea. I agree completely. We should document
> > echo behavior and invite programmers to use printf instead. :-)
>
> Let's do that then.

To be totally honest I thought echo was already very well documented. And I thought we were already strongly recommending that people use printf for arbitrary data. Therefore, honestly, by agreeing I thought we were done.

Bob
Re: feature request: echo --
Florent Flament wrote: > Bob Proulx wrote: > However, I believe that the use of "shall not" makes the POSIX echo > definition ambiguous: > > The echo utility shall not recognize the "--" argument in the > manner specified by Guideline 10 of XBD Utility Syntax Guidelines; > "--" shall be recognized as a string operand. > > Implementations shall not support any options. That seems fairly unambiguous to me. > In the case of '--', "shall not" seems to mean "is forbidden to", > because '--' "must" be recognized as a string operand. In the case of > options, "shall not" seems to mean "don't have to", because it is then > mentioned that the behavior is implementation dependent when using the > '-n' flag. The "-n" in the "string" position is listed specifically as an exception to the general rule previously stated. Basically an interpretation might be don't implement getopt option processing in general but look specifically at the first argument for this specific case, it being implementation defined. > That said, I believe that the POSIX echo definition is broken, because > it tried to reconcile incompatible existing implementations. Originally POSIX was not the design document it has become in later years. Originally POSIX was an operating system feature non-proliferation treaty trying to keep the differences among systems from becoming even more different. At the time it was created there were already two major flavors of echo on different systems. And it was already a portability problem. Therefore the standard tried to freeze the common feature set and allow the existing differences. > > Unix v7 echo accepted -n as an option and did not print it. Same > > with BSD. It is too late to mess with it now. > > Then these implementations are broken as well. That point is similar to saying that one should choose their parents better. It may be true! But it is pointless because one does not have the possibility of choosing their parents. 
And similarly when echo was originally created it was done without the benefit of 50 years of hindsight, looking back at the problems the choices made then have caused over the years and still continue to cause now.

> Also, I can understand that one may not want to change the behavior
> of a command that has been used for 30 years.

That's the main point of the discussion. It's been 50 years and some expectation of stability should be maintained. On the other hand if people want to fix everything then I think they should be free to create their own new operating system and do so there. That's fine. New systems don't have legacy and don't need to be backward compatible. So break whatever you want there. But don't break the existing system for others along the way.

> > On a practical level why is it an issue at all? If there is any
> > concern about printing -n then use 'printf' as that is a much better
> > solution for arbitrary strings.
>
> On a practical level, I have seen the 'echo' command being used in
> many shell scripts to process arbitrary strings. Which means that in
> some cases (when processing the strings "-n", "-e" or "-E") these
> scripts won't behave as expected by the programmer.

Then those are the bugs to have fixed.

> I agree that 'printf' should have been used instead, but it seems
> that programmers have been taught to use 'echo' to print strings in
> shell scripts for many years (that's my case as well).

I still use echo to this very day.

echo "Hello world!"

But that does not include any options and does not include any backslash escape sequences. If it did, or if there was any possibility of it doing so, then I would program it differently.

printf "What do you want me to say? "
IFS= read -r phrase
echo $phrase              # No!

printf "What do you want me to say? "
IFS= read -r phrase
printf "%s\n" "$phrase"   # Yes!

The above shows input from an untrusted source. It's "tainted". It should not be used without caution.
> > Also note that most shells include 'echo' as a builtin command which > > will be different from the coreutils standalone executable. Most > > users should look to their shell for their echo implementation > > instead. > > That is also true, though chances are that the shell builtin > implementations of 'echo' be probably inspired (if not copied) from > the GNU echo implementation. Actually no. Their behaviors were based upon existing echo implementations that pre-date GNU echo that GNU echo was itself also based upon. > What I would expect from a good implementation of the 'echo' command > is to interpret the '--' argument
Re: Selecting multiple delimiters in sort
tolugboji via GNU coreutils General Discussion wrote:
> If "sort" did support multiple delimiters, I could numerically sort the
> following set of filenames using the second field.
>
> schimmel-04.png
> schimmel-05.png
> schimmel-06.png
> schimmel-07.png
> schimmel-08.png
> schimmel-09.png
> schimmel-10.png
> schimmel-11.png
> schimmel-12.png
> schimmel-13.png
> schimmel-1.png
> schimmel-2.png
> schimmel-3.png

One can sort those using a numeric sort.

$ sort -t- -k1,1 -k2,2n file1
schimmel-1.png
schimmel-2.png
schimmel-3.png
schimmel-04.png
schimmel-05.png
schimmel-06.png
schimmel-07.png
schimmel-08.png
schimmel-09.png
schimmel-10.png
schimmel-11.png
schimmel-12.png
schimmel-13.png

Or as a non-standard extension use --version-sort.

$ sort --version-sort file1
schimmel-1.png
schimmel-2.png
schimmel-3.png
schimmel-04.png
schimmel-05.png
schimmel-06.png
schimmel-07.png
schimmel-08.png
schimmel-09.png
schimmel-10.png
schimmel-11.png
schimmel-12.png
schimmel-13.png

I say non-standard but it exists in GNU and FreeBSD so that may be portable enough. But it does not exist on NetBSD for example. So some caution for portability is required.

Bob
Re: feature request: echo --
Florent Flament wrote:
> Out of curiosity, would it possible to have the `echo` command output
> the string "-n" ?
>
> ```
> $ POSIXLY_CORRECT=1 /bin/echo -n
> ```

But the standards do actually mention -n. The behavior you see with POSIXLY_CORRECT=1 is conforming behavior.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37

    If the first operand is -n, or if any of the operands contain a
    <backslash> character, the results are implementation-defined.

Unix v7 echo accepted -n as an option and did not print it. Same with BSD. It is too late to mess with it now.

On a practical level why is it an issue at all? If there is any concern about printing -n then use 'printf' as that is a much better solution for arbitrary strings.

Also note that most shells include 'echo' as a builtin command which will be different from the coreutils standalone executable. Most users should look to their shell for their echo implementation instead.

Bob
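The printf alternative in concrete form; unlike echo, every operand is treated as data:

```shell
s='-n'
# echo "$s" is implementation-defined here; printf is not:
printf '%s\n' "$s"
# -n
```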
bug#47476: relative date of -1 month shows the wrong month
Lars Nooden wrote:
> On March 29, 2021, if a relative date of '-1 month' is passed to 'date',
> then the output shows March instead of February.

The date manual includes this section on relative months.

    The fuzz in units can cause problems with relative items. For
    example, ‘2003-07-31 -1 month’ might evaluate to 2003-07-01,
    because 2003-06-31 is an invalid date. To determine the previous
    month more reliably, you can ask for the month before the 15th of
    the current month. For example:

        $ date -R
        Thu, 31 Jul 2003 13:02:39 -0700
        $ date --date='-1 month' +'Last month was %B?'
        Last month was July?
        $ date --date="$(date +%Y-%m-15) -1 month" +'Last month was %B!'
        Last month was June!

This exactly covers the initial bug report. Because March 29, 2021 minus 1 month results in the invalid date February 29, 2021, which, 2021 not being a leap year, does not exist. What _should_ be the result if the date one month ago does not exist? The answer to that mostly depends upon the purpose for which the question was being asked.

When dealing with time in months it also depends upon what you need done. Say it is the 7th of the month and you want to generate a date that is also the 7th but one month earlier or later. If it is March 7th then February 7th is fewer days of difference than if it were June 7th and deciding May 7th is the month earlier. That is due to the nature of having a different number of days in different months. But if that was what I wanted then I would determine what was the month prior and generate a new datestamp using the current day of the month.

[[Aside: Off the top of my head and hopefully without a trivial bug. I welcome corrections if I made a mistake in this. But this is still not completely general purpose.
$ date "+%F %T"
2021-04-04 20:50:19
$ date "+%Y-$(date --date="$(date +%Y-%m-15) -1 month" +%m)-%d %H:%M:%S"
2021-03-04 20:50:54

*HOWEVER* that still does not handle the case of the original poster's report about what happens on March 29, 2021 minus one month? It can't be February 29th! Isn't that the same as March 1st? ]]

Perhaps instead of the code using 30 day months it should use the number of days in the current month? Then on March 31, 2021 -1 month, since March has 31 days, that would calculate February 28, 2021. Is that better or worse?

$ date --date="2021-03-31 12:00 + -31 days" "+%F %T"
2021-02-28 05:00:00

Potentially worse! What happens on March 1, 2021 then?

$ date --date="2021-03-01 12:00 + -31 days" "+%F %T"
2021-01-29 05:00:00

In that case we skip over February entirely!

Chris Elvidge wrote:
> Pádraig Brady wrote:
> > The current FAQ (linked below) suggests the workaround of:
> >
> >    date --date="$(date +%Y-%m-15) -1 month" +'Last month was %B.'
> >
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-date-command-is-not-working-right_002e
>
> It's noticeable that (on my system, CYGWIN-NT and/or Raspbian) 'date -d"now
> -1month"' gives a definitely wrong answer, but 'dateadd now -1mo' gives a
> somewhat more reasonable answer. dateadd is from the dateutils package,
> sometimes dadd and/or dateutils.dadd.
>
> $ date +"%Y-%m-%d %H:%M:%S"
> 2021-03-30 10:37:00
>
> $ date -d"now -1 month" +"%Y-%m-%d %H:%M:%S"
> 2021-03-02 09:37:17

So... Here is the problem with "now". Using "now" is problematic *some* of the time. Not all of the time. Never when you are trying it on the command line in the middle of the day. But there are windows of time around DST time changes when it is problematic. If you are getting the time now and it is the middle of the day, say around noon, then that is far away from time changes.
But almost every seasonal time change there is a bug report from someone who has an automated process that ran right at the moment of the change, giving them a surprising result, and they file a bug that it gave them the wrong answer, because there was no 2am that day, or maybe there were two 2ams that day, or something. That's why it is better to test for days using noon as a reference. And why when checking for months it is better to test on days away from the change of month. Really the 10th or the 20th would be as good as the 15th, but the 15th is in the middle of every month and that is why it ended up in the FAQ recommendation.

> $ dateadd now -1mo -f"%Y-%m-%d %H:%M:%S"
> 2021-02-28 09:37:27

I don't know anything about dateadd and it is not part of GNU Coreutils.

Bob
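The FAQ workaround and the reported corner case, pinned to fixed dates and UTC so the results are reproducible whenever this is run:

```shell
# the reliable mid-month anchor:
TZ=UTC0 LC_ALL=C date --date='2021-03-15 -1 month' +'Last month was %B.'
# Last month was February.

# the reported corner case: February 29, 2021 does not exist, so the
# fuzz normalizes the result back into March:
TZ=UTC0 LC_ALL=C date --date='2021-03-29 -1 month' +%F
# 2021-03-01
```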
Re: differece between mkfifo and mknod ... p
Peng Yu wrote: > By discouraging people from using it for a long period (say 10 years), > its support can be dropped eventually which will reduce future > maintenance costs of this duplicate code. Removing it would needlessly break old scripts. It used to be the only way to create named pipes. It should continue for backward compatibility indefinitely. There is no reason not to do so. Bob
bug#47353: Numbered backups also need kept-new-versions else will grow out of control
tag 47353 + notabug close 47353 thanks Dan Jacobson wrote: > Or (info "(coreutils) Backup options") should "admit" that "Numbered > backups need to be trimmed occasionally by the user, lest the fill up > the disk." If the user has asked for them then any decision of the disposition of them is up to the user. If the user fills up their storage with them then surely the user who created them will know what they did and will be in the best position to decide what to do. This type of thing is really both too general to document in detail and too specific to document in detail at the same time. It targets a very specific thing, filling up the disk, with a very general purpose action, copying files. Both of which are plain actions not hidden or subtle. Consuming storage space by making copies is the primary purpose of the cp command. > And also mention in the manual that e.g., emacs has methods to trim > these automatically, but coreutils hasn't implemented them yet. Although cp, mv, and ln, may have used the same format as emacs for the creation of backup files that does not mean that they *are* emacs or that emacs is the preferred editor for users of cp and mv or that knowledge of emacs is needed to use them. I use Emacs and find it a superior editor for creating customized domain specific editors. But I don't think it should be referenced from cp because the Emacs documentation is *HUGELY* more complicated. If a new user is reading documentation on how to use cp then being directed to climb the learning curve of Emacs would be way too much to ask! There is a user who I think would file a bug that it is too much to ask if it were done that way. The better thing to mention in relation to cp would be rm as those would be natural siblings. But they are actually siblings already. So there seems no further need to cross-reference them additionally redundantly again redundantly. I am marking the ticket as closed as there seems nothing to actually do here. 
But as always more discussion is welcome and if it is determined that something should be done then the ticket may be opened again to track it. Bob
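For readers who want to see the files in question, a scratch-directory sketch of numbered backups; trimming the g.~N~ files afterward is indeed left to the user:

```shell
cd "$(mktemp -d)"
echo one   > f; cp --backup=numbered f g    # g created, no backup yet
echo two   > f; cp --backup=numbered f g    # old g saved as g.~1~
echo three > f; cp --backup=numbered f g    # old g saved as g.~2~
ls
```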
Re: Support for CSV file format on sort
Paul Courbis de Bridiers de Villemor wrote:
> Actually I’d prefer to have reliable tools to convert csv to tsv and tsv to
> csv, replacing tabs and newlines by \t \n and to be able to use all
> standard tools

I would find this approach the better direction, more desirable, and more flexible too.

Bob
Re: differece between mkfifo and mknod ... p
Peng Yu wrote: > It seems that both `mkfifo` and `mknod ... p` can create a fifo. What > is the difference between them? Thanks. The mknod utility existed "for a decade" in Unix (don't quote me on that vague time statement) before mkfifo existed. The mknod utility existed in Unix v7 as a thin wrapper around the mknod(2) system call. man 2 mknod https://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html Named pipes are special files and special files are created with mknod. At least that was true until mkfifo came along. mkfifo was standardized by POSIX while the mknod utility seems too OS specific and never made it into the standards as far as I know. Therefore "mkfifo" should be used for standards compliance and "mknod" should continue to exist for backwards compatibility. Bob
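The two spellings are easy to compare side by side (scratch directory; no privileges are needed for the 'p' type):

```shell
cd "$(mktemp -d)"
mkfifo fifo1        # the POSIX-standardized spelling
mknod fifo2 p       # the historical spelling, kept for compatibility
ls -l fifo1 fifo2   # both lines begin with 'p' for named pipe
```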
bug#45358: bootstrap fails due to a certificate mismatch
Erik Auerswald wrote:
> Grigoriy Sokolik wrote:
> > I've rechecked:
>
> I cannot reproduce the problem, the certificate is trusted by my system:
>
> # via IPv4
> $ gnutls-cli --verbose translationproject.org 'Connecting|Status'
> Connecting to '80.69.83.146:443'...
> - Status: The certificate is trusted.
> # via IPv6
> $ gnutls-cli --verbose translationproject.org 'Connecting|Status'
> Connecting to '2a01:7c8:c037:6::20:443'...
> - Status: The certificate is trusted.

I have the same results here. Everything looks okay in the inspection of it.

> It seems to me as if your system does not trust the used root CA.
>
> > [...]issuer `CN=DST Root CA X3,O=Digital Signature Trust Co.'[...]
>
> On my Ubuntu 18.04 system, I find it via symlink from /etc/ssl/certs:
>
> $ ls /etc/ssl/certs/DST_Root_CA_X3.pem -l
> lrwxrwxrwx 1 root root 53 Mai 28 2018 /etc/ssl/certs/DST_Root_CA_X3.pem
> -> /usr/share/ca-certificates/mozilla/DST_Root_CA_X3.crt
> $ certtool --certificate-info <
> /usr/share/ca-certificates/mozilla/DST_Root_CA_X3.crt | grep Subject:
> Subject: CN=DST Root CA X3,O=Digital Signature Trust Co.

Again same here on my Debian system. The root certificate store for the trust anchor is in the ca-certificates package. Looking at my oldest system I see this is distributed as package version 20200601~deb9u1 and includes the above file.

$ apt-cache policy ca-certificates
ca-certificates:
  Installed: 20200601~deb9u1
  Candidate: 20200601~deb9u1
  Version table:
 *** 20200601~deb9u1 500
        500 http://ftp.us.debian.org/debian stretch/main amd64 Packages
        500 http://ftp.us.debian.org/debian stretch-updates/main amd64 Packages
        100 /var/lib/dpkg/status

Verifying that the equivalent of ca-certificates is installed on your system should provide for it.

As this seems not to be a bug in Coreutils I am marking the bug as closed with this mail. However more discussion is always welcome.

Bob
bug#45358: bootstrap fails due to a certificate mismatch
Is this problem still a problem? Perhaps it has been fixed in the time this has been under discussion? Because it looks okay to me.

Grigoriy Sokolik wrote:
> $ curl -v https://translationproject.org/latest/coreutils/ -o /dev/null
> ...
> * Connected to translationproject.org (80.69.83.146) port 443 (#0)
> ...
> * successfully set certificate verify locations:
> *   CAfile: /etc/ssl/certs/ca-certificates.crt
> *   CApath: none

I suspect this last line to be the root cause of the problem. There is no CApath and therefore no root anchoring certificates trusted. Without that I don't see how any certificates can be trusted.

I do the same test here and see this.

$ curl -v https://translationproject.org/latest/coreutils/ -o /dev/null
...
* Connected to translationproject.org (80.69.83.146) port 443 (#0)
...
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
*   CApath: /etc/ssl/certs

Note the inclusion of the trusted root path.

* Server certificate:
*  subject: CN=stats.vrijschrift.org
*  start date: Mar  1 10:34:36 2021 GMT
*  expire date: May 30 10:34:36 2021 GMT
*  subjectAltName: host "translationproject.org" matched cert's
*  "translationproject.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.

Note that the certificate validates as okay. Also if I simply ask openssl to validate:

$ openssl s_client -connect translationproject.org:443 -CApath /etc/ssl/certs -showcerts < /dev/null
...
Verify return code: 0 (ok)

If I download all of the certificates and validate using certtool, since you mentioned certtool I will use your example:

$ openssl s_client -connect translationproject.org:443 -CApath /etc/ssl/certs -showcerts < /dev/null | sed -n '/^-----BEGIN CERTIFICATE-----/,/^-----END CERTIFICATE-----/p' > /tmp/translationproject.org.certs
$ certtool --verbose --verify-profile=high --verify --infile=/tmp/translationproject.org.certs
Loaded system trust (127 CAs available)
Subject: CN=R3,O=Let's Encrypt,C=US
Issuer: CN=DST Root CA X3,O=Digital Signature Trust Co.
Checked against: CN=DST Root CA X3,O=Digital Signature Trust Co.
Signature algorithm: RSA-SHA256
Output: Verified. The certificate is trusted.

Subject: CN=stats.vrijschrift.org
Issuer: CN=R3,O=Let's Encrypt,C=US
Checked against: CN=R3,O=Let's Encrypt,C=US
Signature algorithm: RSA-SHA256
Output: Verified. The certificate is trusted.

Chain verification output: Verified. The certificate is trusted.

Then it again validates okay. I note that the certificate is current as of now and just recently renewed. It's fresh.

$ openssl s_client -connect translationproject.org:443 -CApath /etc/ssl/certs -showcerts < /dev/null | sed -n '/^-----BEGIN CERTIFICATE-----/,/^-----END CERTIFICATE-----/p;/^-----END CERTIFICATE-----/q' | openssl x509 -noout -dates
notBefore=Mar  1 10:34:36 2021 GMT
notAfter=May 30 10:34:36 2021 GMT

Therefore I think everything is okay as far as I can tell from the above. Perhaps something about the site has changed to resolve a problem since then? Perhaps an intermediate certificate was added?

Bob
bug#45182: mktemp not created other permissions
close 45182
tag 45182 + notabug
thanks

Vasanth M.Vasanth wrote:
> When I create a temp file from root users using mktemp command, then it is
> not able to access other users. If the same do in other users then the
> group and user came respectively.

I see no difference in behavior of GNU Coreutils mktemp when used as a root user or as a non-root user.

# mktemp
/tmp/tmp.7smatw2ZW5
# ls -ld /tmp/tmp.7smatw2ZW5
-rw------- 1 root root 0 Mar  8 21:56 /tmp/tmp.7smatw2ZW5

$ mktemp
/tmp/tmp.nnyNVef0wB
$ ls -ld /tmp/tmp.nnyNVef0wB
-rw------- 1 rwp rwp 0 Mar  8 21:54 /tmp/tmp.nnyNVef0wB

Therefore I am at a loss to understand the report that there are differences.

Also the purpose and intent of mktemp is to create files that are accessible by the creating user only and not by other users and not by other groups. This is documented in the manual as follows.

    When creating a file, the resulting file has read and write
    permissions for the current user, but no permissions for the group
    or others; these permissions are reduced if the current umask is
    more restrictive.

Therefore if I read your question about permissions correctly, yes this is documented and intended behavior.

> Is this default behaviour or any flags available?

No. The files created will always be such that the current user has read and write permissions but no permissions for group or others.

Regarding users and groups however. The default rule for non-root, non-privileged users in most modern operating systems is that non-privileged users cannot chown files. That is a kernel level restriction and not a restriction of GNU Coreutils. If the OS allows it then chown will do it. If the OS does not allow it then it is the kernel that is restricting it. The root superuser however always has full permission for chown actions.

If you desire less strict permissions then this may easily be accomplished by chmod'ing the file afterward. Such as this example.
tmpfile=$(mktemp) || exit 1 chmod g+w "$tmpfile" And for a root user setting up a file or directory for another process then the root user may chown and chgrp the file too. tmpfile=$(mktemp) || exit 1 chmod g+w "$tmpfile" chgrp somesharedgroup "$tmpfile" This ordering is important. Because a file that is created securely may be relaxed. But a file created with relaxed permissions may never safely be made securely restricted. Therefore the files must be strict from the start and only relaxed if that is the desire. Thank you for your bug report. However as the command is operating as intended and documented I am going to close this bug ticket. But please if there is additional information feel free to add it to the ticket. It will be read and if there is a reason then the ticket will be opened again. Bob
bug#45695: Date does not work for dates before 1970
zed991 wrote: > On linux, I can use date +%s --date "31 Dec 1969" > The result is -9 > A negative number Which is correct for dates before the 0 time: Thu, 01 Jan 1970 00:00:00 +0000 https://en.wikipedia.org/wiki/Unix_time > But when I try it on Windows (using GNUWin32) it gives me an error - > "invalid date" > > I downloaded date for windows from this link - > http://gnuwin32.sourceforge.net/packages/coreutils.htm > > Is there any fix for Windows? According to that page the last update of the GnuWin project was 2015-05-20 therefore one might think that project is no longer updating now more than five years later. Perhaps it would be good to look for a different MS-Windows port of the software? The usual recommendation is to install Cygwin which generally is a more reliable port of the software. Although I understand that it might be a little heavy for many users. But whichever port to Microsoft you find look to see that it has been updated in the last few years. Generally the GNU Project is all about the source and use on Free(dom) Software systems. Generally most of us are not using Microsoft and therefore it makes it hard for us to help. It really needs a Microsoft person to champion the cause and to keep that system updated. Since this is not a bug in the GNU Coreutils software itself but in the Windows port of it I am going to go ahead and close the ticket with this message. But if you have updates about this please send an update to the bug ticket as it would help us know what to say in the future to other Microsoft users. And other people searching the archive will benefit from your experience with it. Bob
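P.S. The pre-epoch arithmetic is easy to sanity-check on a system with GNU date; a small sketch, pinned to UTC so the result does not depend on the local timezone:

```shell
# One day before the epoch, evaluated in UTC, is exactly -86400 seconds.
date -u -d '1969-12-31 00:00:00' +%s

# The epoch itself is 0.
date -u -d '1970-01-01 00:00:00' +%s
```

Without the -u (or an explicit TZ), the same calendar date names a different instant in each timezone and the number shifts by the local UTC offset.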
bug#43828: invalid date converting from UTC, near DST
Martin Fido wrote: > I have tzdata version 2020a: > > $ apt-cache policy tzdata > tzdata: > Installed: 2020a-0ubuntu0.16.04 > Candidate: 2020a-0ubuntu0.16.04 > ... > > $ zdump -v Australia/Sydney | grep 2020 > Australia/Sydney Sat Apr 4 15:59:59 2020 UT = Sun Apr 5 02:59:59 2020 > AEDT isdst=1 gmtoff=39600 > Australia/Sydney Sat Apr 4 16:00:00 2020 UT = Sun Apr 5 02:00:00 2020 > AEST isdst=0 gmtoff=36000 > Australia/Sydney Sat Oct 3 15:59:59 2020 UT = Sun Oct 4 01:59:59 2020 > AEST isdst=0 gmtoff=36000 > Australia/Sydney Sat Oct 3 16:00:00 2020 UT = Sun Oct 4 03:00:00 2020 > AEDT isdst=1 gmtoff=39600 I see this is Ubuntu 16.04. I found a 16.04 system and I was able to recreate this exact problem there. However, trying this on an 18.04 system, it is no longer an invalid date. Bob
bug#43828: invalid date converting from UTC, near DST
Martin Fido wrote: > I seem to have found a bug in the date utility, converting from UTC > to Sydney time. It returns invalid date for what should be perfectly > valid: > > $ TZ='Australia/Sydney' date -d '2020-10-04T02:00:00Z' > date: invalid date ‘2020-10-04T02:00:00Z’ > > $ TZ='Australia/Sydney' date -d '2020-10-04T02:59:59Z' > date: invalid date ‘2020-10-04T02:59:59Z’ This is more likely to be in the tzdata zoneinfo database rather than in date itself. Could you please report what version of tzdata you have on your system? Current on my system is tzdata version 2020b-1. And also this information too. $ zdump -v Australia/Sydney | grep 2020 Australia/Sydney Sat Apr 4 15:59:59 2020 UT = Sun Apr 5 02:59:59 2020 AEDT isdst=1 gmtoff=39600 Australia/Sydney Sat Apr 4 16:00:00 2020 UT = Sun Apr 5 02:00:00 2020 AEST isdst=0 gmtoff=36000 Australia/Sydney Sat Oct 3 15:59:59 2020 UT = Sun Oct 4 01:59:59 2020 AEST isdst=0 gmtoff=36000 Australia/Sydney Sat Oct 3 16:00:00 2020 UT = Sun Oct 4 03:00:00 2020 AEDT isdst=1 gmtoff=39600 > Note DST in Sydney changed 10 hours earlier: > > $ TZ='Australia/Sydney' date -d '2020-10-03T15:59:59Z' > Sunday 4 October 01:59:59 AEST 2020 > > $ TZ='Australia/Sydney' date -d '2020-10-03T16:00:00Z' > Sunday 4 October 03:00:00 AEDT 2020 Yes. And I think that is suspicious. Hopefully the zdump information will show that database is in need of an update and that is the root of the problem. I suspect that DST was moved at some point in time. > I have version 8.25: > > $ date --version > date (GNU coreutils) 8.25 I tried this on 8.13, 8.23, 8.26, and 8.32 and was unable to reproduce the problem on any of those versions of date. But I suspect the root cause is in the tzdata zoneinfo database. Bob
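P.S. For contrast, a local time that genuinely should be rejected is one the clock change skips over; a sketch of both cases (assuming GNU date and a tzdata install that includes the Australia/Sydney zone):

```shell
# 02:30 local time on 2020-10-04 was skipped in Sydney (clocks jumped
# from 02:00 straight to 03:00), so rejecting it is correct behavior:
if TZ='Australia/Sydney' date -d '2020-10-04 02:30:00' >/dev/null 2>&1; then
  echo 'unexpectedly accepted'
else
  echo 'invalid date, as expected'
fi

# The same wall-clock gap expressed as a UTC instant is perfectly valid
# and simply maps to a time after the jump:
TZ='Australia/Sydney' date -d '2020-10-03 16:30:00Z'
```

The reported bug was that valid UTC inputs like the second one were rejected too, which an updated zoneinfo database fixes.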
bug#43657: rm does not delete files
close 43657 thanks Paul Eggert wrote: > On 9/27/20 8:58 PM, Amit Rao wrote: > > There's a limit? My first attempt didn't use a wildcard; i attempted to > > delete a directory. > > 'rm dir' fails because 'rm' by default leaves directories alone. > > > My second attempt was rm -rf dir/* > > If "dir" has too many files that will fail due to shell limitations that > have nothing to do with Coreutils. Use 'rm -rf dir' instead. The only reason I can guess that rm -rf dir/* might fail would be argument list too long. Which has an FAQ entry. I feel confident this was the problem you experienced. https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Argument-list-too-long In any case in order to establish the background it is necessary to post the exact command that you used and also the exact error message that resulted. Without that information it is not possible to establish the root cause of the behavior: whether it is a bug, or whether it is kernel behavior. Also if rm were to fail then extremely useful would be strace information so that we could see the exact reason for the failure. If this is the ARG_MAX limitation then it does not need rm to reproduce the issue. One can use any command. Using echo should be safe enough. echo dir/* >/dev/null In any case the suggested strategy of using "rm -rf dir" is very good and very simple here. It avoids that problem entirely. Because I feel very confident that the issue is the kernel limitation of ARG_MAX I am going to close this ticket. However if you have further information please reply and add it to the ticket. It can always be opened again if further information points to a bug to be tracked. Bob
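P.S. Both the limit and a workaround are easy to demonstrate; a sketch (the three files stand in for "too many to fit on one command line"):

```shell
# The kernel's per-exec limit on the combined size of the argument list
# and the environment, in bytes:
getconf ARG_MAX

# find batches names into as many rm invocations as needed, so it never
# hits ARG_MAX no matter how many files the directory holds:
dir=$(mktemp -d)
touch "$dir/file1" "$dir/file2" "$dir/file3"
find "$dir" -type f -exec rm -f {} +
ls -A "$dir"    # the directory is now empty
rmdir "$dir"
```

The -exec ... {} + form is standard POSIX find and is the usual escape hatch when a wildcard expansion would be too large.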
bug#43162: chgrp clears setgid even when group is not changed
Paul Eggert wrote: > Karl Berry wrote: > > I was on centos7. > > > > (I don't observe your problem on my Fedora 31 box, for example). > > > > Maybe there is hope for a future centos, then. Just another few data points... I was able to recreate this issue on a CentOS 7 system running in a tmpfs filesystem. So that's pretty much pointing directly at the Linux kernel behavior independent of file system type. Meanwhile... I can also recreate this on a Debian system with a Linux 4.9 kernel in 9 Stretch. But not on 10 Buster Linux 4.19. And once again not on an earlier Linux 3.2 kernel. 3.2 good, 4.9 bad, 4.19 good. Therefore this seems to be a Linux behavior that started out the desired way, flipped to the annoying way, and then flipped back again later. Apparently. Anyway just a few data points. Bob
bug#43541: minor bug in GNU coreutils 8.30: pwd --version doesn't work
tag 43541 + notabug close 43541 thanks Nikolay wrote: > GNU coreutils 8.30 Coreutils version 8.30. Gotcha. > $ pwd --version > bash: pwd: --: invalid option > pwd: usage: pwd [-LP] But that is not the GNU Coreutils pwd program. That is the shell builtin pwd. In this case it is bash. And bash does not document either a --version or --help option. $ type pwd pwd is a shell builtin $ help pwd pwd: pwd [-LP] Print the name of the current working directory. Options: -L print the value of $PWD if it names the current working directory -P print the physical directory, without any symbolic links By default, `pwd' behaves as if `-L' were specified. Exit Status: Returns 0 unless an invalid option is given or the current directory cannot be read. Since this isn't a coreutils program I am going to attend to the housekeeping and close the bug ticket. But please let's continue discussion here for additional questions or comments. This is actually an FAQ. https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#I-am-having-a-problem-with-kill-nice-pwd-sleep-or-test_002e > $ man pwd > > ... > > --version > output version information and exit That is the man page for Coreutils pwd. And if you want to use the external command then you must avoid the builtin. $ type -a pwd pwd is a shell builtin pwd is /bin/pwd $ env pwd --version pwd (GNU coreutils) 8.32 Use of 'env' in this way forces searching PATH for the named program regardless of shell and avoids builtins. Hope this helps! :-) Bob
bug#42440: bug with rm
tags 42440 + notabug thanks wrote: > sometimes,rm can't delete the file. > but when using rm -rf + file . > the file can be deleted. This does not sound like a bug in the rm command. Therefore I am tagging this as such. If you have follow up information and this turns out to be an actual bug then we can reopen the bug report. Unfortunately there is not enough information in the report to know exactly the case that you are talking about. For example I don't know if you are talking about a literal "+" in that line or not. I will assume that you are since it is there. There are several FAQs listed for rm. Any of these might be a problem. https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#How-do-I-remove-files-that-start-with-a-dash_003f https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Why-doesn_0027t-rm-_002dr-_002a_002epattern-recurse-like-it-should_003f You might have experienced either of those problems. Or a different problem. We can't tell. > sometimes,rm can't delete the file. There are two main cases. One is that if the file is not writable by the user then 'rm' will check for this and ask the user for confirmation. rwp@angst:/tmp/junk$ touch file1 rwp@angst:/tmp/junk$ chmod a-w file1 rwp@angst:/tmp/junk$ rm file1 rm: remove write-protected regular empty file 'file1'? n rwp@angst:/tmp/junk$ ls -l file1 -r--r--r-- 1 bob bob 0 Jul 21 23:52 file1 The -f option will force it without prompting. rwp@angst:/tmp/junk$ rm -f file1 rwp@angst:/tmp/junk$ ls -l file1 ls: cannot access 'file1': No such file or directory This is a courtesy confirmation. Because the permissions on the file are not important when it comes to removing a directory entry. A file is really just an entry in the directory containing it. Removing a file simply removes the entry from the directory. When the link count of the file reaches zero then the file system reclaims the storage. The file system is a "garbage collection" system using reference counting.
https://en.wikipedia.org/wiki/Garbage_collection_(computer_science) Therefore the only permission needed to remove a file is write permission to the directory containing it. rwp@angst:/tmp/junk$ touch file2 rwp@angst:/tmp/junk$ ls -ld . file2 drwxrwxr-x 3 rwp rwp 100 Jul 21 23:56 ./ -rw-rw-r-- 1 rwp rwp 0 Jul 21 23:56 file2 rwp@angst:/tmp/junk$ chmod a-w . rwp@angst:/tmp/junk$ ls -ld . file2 dr-xr-xr-x 3 rwp rwp 100 Jul 21 23:56 ./ -rw-rw-r-- 1 rwp rwp 0 Jul 21 23:56 file2 This creates a file. The file is writable. But I have changed the directory containing it not to be writable. This prevents the ability to remove the file. Can't remove it because the directory is not writable. rwp@angst:/tmp/junk$ rm file2 rm: cannot remove 'file2': Permission denied rwp@angst:/tmp/junk$ rm -f file2 rm: cannot remove 'file2': Permission denied rwp@angst:/tmp/junk$ rm -rf file2 rm: cannot remove 'file2': Permission denied rwp@angst:/tmp/junk$ ls -ld . file2 dr-xr-xr-x 3 rwp rwp 100 Jul 21 23:56 ./ -rw-rw-r-- 1 rwp rwp 0 Jul 21 23:56 file2 In order to remove the file we must have write permission to the directory. Adding write permission to the directory allows removing the file. rwp@angst:/tmp/junk$ chmod ug+w . rwp@angst:/tmp/junk$ rm file2 rwp@angst:/tmp/junk$ ls -ld file2 ls: cannot access 'file2': No such file or directory This problem expands when the tree is many directories deep and the directories are not writable. rwp@angst:/tmp/junk$ mkdir -p dir1 dir1/dir2 dir1/dir2/dir3 rwp@angst:/tmp/junk$ touch dir1/dir2/dir3/file3 rwp@angst:/tmp/junk$ chmod -R a-w dir1 rwp@angst:/tmp/junk$ find dir1 -ls 69649132 0 dr-xr-xr-x 3 rwp rwp 60 Jul 22 00:00 dir1 69649133 0 dr-xr-xr-x 3 rwp rwp 60 Jul 22 00:00 dir1/dir2 69649134 0 dr-xr-xr-x 2 rwp rwp 60 Jul 22 00:00 dir1/dir2/dir3 69650655 0 -r--r--r-- 1 rwp rwp 0 Jul 22 00:00 dir1/dir2/dir3/file3 That sets up the test case. None of the directories are writable. Therefore we cannot remove any of them.
The directory holding the entries must be writable. rwp@angst:/tmp/junk$ rm -rf dir1 rm: cannot remove 'dir1/dir2/dir3/file3': Permission denied Even using 'rm -rf' does not work. And should not work. Because the directories are not writable. In order to remove these files the directories must be made writable. rwp@angst:/tmp/junk$ chmod -R u+w dir1 rwp@angst:/tmp/junk$ rm -rf dir1 rwp@angst:/tmp/junk$ ls -ld dir1 ls: cannot access 'dir1': No such file or directory Hopefully this helps you understand how directory entries work, that the directory holding an entry (either file or another directory) must be writable. How to add write permission. How to remove a single file. How to remove a directory.
Re: Enhancement Request for sha256sum - output only the SHA-256 hash alone
Pádraig Brady wrote: > jens wrote: > > It would make shell scripts that use sha256sum much simpler. Currently it > > is necessary to split the output of sha256sum to obtain the hash, which > > usually requires an additional command / Unix process. > > This is one of those trade-offs. > I'd be 60:40 against adding such an option, > because it's so easy to implement with cut(1): I feel that way too. It's just so easy to do in the shell. > sum=$(sha256sum file | cut -d ' ' -f 1) > > Yes that's an extra process, but you can easily > enough avoid that on any POSIX shell using: > > sum=$(sha256sum file) && sum=${sum%% *} I'll suggest always using stdin instead ("sha256sum < file") as that avoids any quoting getting in the way when the filename contains special characters. And then just to show yet a different way to keep from using external processes. set -- $(sha256sum < file) && sum=$1 But this does assume one is no longer using the positional arguments at that point. Bob
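P.S. The two forms are easy to check against each other; a sketch (the scratch file name comes from mktemp, so it contains nothing that needs quoting or escaping):

```shell
file=$(mktemp)
printf 'some data\n' > "$file"

# cut-based form: one extra process.
sum_cut=$(sha256sum "$file" | cut -d ' ' -f 1)

# pure-shell form: strip everything from the first space onward.
sum_sh=$(sha256sum < "$file") && sum_sh=${sum_sh%% *}

[ "$sum_cut" = "$sum_sh" ] && echo 'both forms agree'
rm -f "$file"
```

Reading via stdin also sidesteps the backslash-escaping that the *sum tools apply to unusual file names, since there is then no name to escape.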
Re: mv w/mkdir -p of destination
Vito Caputo wrote: > Does this already exist? > > Was just moving a .tgz into a deep path and realized I hadn't created > it on that host, and lamented not knowing what convenient flag I could > toss on the end of the typed command to make `mv` do the mkdir -p > first for me. I suggest using the command line shell for this. For example: $ mv foo.tar.gz /home/rwp/one/two/three/four/five/six/seven/eight/nine/ mv: cannot move 'foo.tar.gz' to '/home/rwp/one/two/three/four/five/six/seven/eight/nine/': No such file or directory $ mkdir -p ESC . $ Control-P Control-P Enter Some users probably use ESC _ instead of ESC . as both are the same. But ESC _ works in vi mode too. Also $_ works but then it is not WYSIWYG. So I suggest vi mode users should use ESC _ for this. $ mv foo.tar.gz /home/rwp/one/two/three/four/five/six/seven/eight/nine/ mv: cannot move 'foo.tar.gz' to '/home/rwp/one/two/three/four/five/six/seven/eight/nine/': No such file or directory $ mkdir -p ESC _ $ ESC k ESC k Enter This feature is so easy to use on the command line that I can't see a need to add the feature to mv. Using the command line shell is already there and the ability to use it helps with all of the commands and not just this one very specific deep dig thing into mv. This command line shell feature dates back to ksh. Bob
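P.S. For anyone who wants it as a command anyway, a tiny wrapper does the job; a sketch (the name "mvd" and its last-argument-is-a-directory convention are my own invention for illustration, not anything shipped in coreutils):

```shell
# mvd SOURCE... DESTDIR
# Create DESTDIR (with parents) first, then move everything into it.
mvd() {
  for _last in "$@"; do :; done   # POSIX idiom: grab the last argument
  mkdir -p -- "$_last" && mv -- "$@"
}

# Demo in a scratch directory:
scratch=$(mktemp -d)
touch "$scratch/foo.tar.gz"
mvd "$scratch/foo.tar.gz" "$scratch/one/two/three/"
ls "$scratch/one/two/three"
```

Treating the last argument as a directory mirrors how mv -t works and keeps the wrapper unambiguous for multiple sources.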
bug#42034: option to truncate at end or should that be default?
L A Walsh wrote: > I allocated a large file of contiguous space (~3.6T), the size of a disk > image I was going to copy into it with 'dd'. I have the disk image > 'overwrite' the existing file, in place ... It's possible that you might want to be rescuing data from a failing disk or doing other surgery upon it. Therefore I want to mention ddrescue here. https://www.gnu.org/software/ddrescue/ Of course it all depends upon the use case but ddrescue is a good tool to have in the toolbox. It might be just the right tool. Take for example a RAID1 image on two failing drives that should be identical but both are reporting errors. If the failures do not overlap then ddrescue can be used to merge the successful reads from those two images producing one fully correct image. Bob
bug#41792: Acknowledgement (dd function – message: "No boot sector on USB device"")
close 41792 thanks Since the discussion has moved away from anything GNU Coreutils related and doesn't seem to be reporting any bugs in any of the utilities I am going to close the bug ticket. But discussion may continue here regardless. If we see a dd bug we can re-open the ticket. Ricky Tigg wrote: > The difference of device path is due to the fact that the USB media was > plugged out after the write-operation was achieved on the Linux computer > then plugged into a computer –Asus– whose Windows OS has to be restored, > then plugged back to the same computer but to a *different* USB port. It's > safe to open the present issue-ticket. Hmm... There is no reason that the Linux kernel would renumber the device simply because it was removed and inserted again. Therefore me thinks that it was not cleanly removed. Me thinks that something in the system had mounted it keeping it busy preventing it from cleanly being ejected. This "something" may have been an automatic mounting of it as many Desktop Environments unfortunately default to doing. IMNHO automated mounting is a bad idea and should never be enabled by default. > *Source media*: > https://www.microsoft.com/en-us/software-download/windows10ISO The source media doesn't matter to GNU utilities. The 'dd' utility treats files as raw bytes and does not treat MS-Windows-10 ISO images any differently than any other raw data. It might be that or pictures of your dog or random cosmic noise recorded from your radio. It doesn't matter. It's just data. Your Desktop Environment may take action however. It is possible that your DE will probe the device, detect that it is an ISO image, and automatically mount that ISO image. That's bad. But that's your Desktop Environment and unrelated to 'dd'. And it has always been a bad idea. Regardless of how many people do it. > *Rufus v4.1.4* – I couldn't use it since The Windows OS installed is > missing some system's files.
Will convert it to fit on Fedora at release of > version 33 which will update the uniformly mingw component and thus > mingw64-headers which is old and is the cause of a known issue. > > I wrote the disc image as well using those tools then booted the USB device > having the disc image written on.: > > *Fedora Media Writer v4.1.4* – Officially does not support Microsoft > Windows disc images. I did not know that before writing. My first thought was, huh? Why would Fedora Media Writer not treat files as raw files? My second thought was that the question was for a Fedora Media Writer mailing list as this bug ticket is not the place to be discussing other random projects. > *Unetbootin v677* – It writes partially the disc image thus the installer > is operational partially. Issue was already reported by someone on Git. > > *Woeusb v3.3.1* – Installer is operational on BIOS but not on EFI systems. > Issue was already reported by someone on Git. > > *Balena Etcher v1.5.9*8 x64 as AppImage format – The device is not listed > at boot. Gosh. Reading your report makes MS-Windows seem like such a terrible system! I read about all of your pain of working on it. You have tried all of these tools and nothing is working for you. It is reading these types of reports that I am thankful I am working on a Free(dom) Software operating system where things Just Work! Meanwhile... Let's get back to your information about 'dd'. > $ file -b Win10_2004_Finnish_x64.iso > ISO 9660 CD-ROM filesystem data 'CCCOMA_X64FRE_FI-FI_DV9' (bootable) That looks like you were successfully able to write the ISO image to the device. Looks okay. > *Component*: coreutils.x86_64 8.32-4.fc32.; *OS*: Linux Fedora Good. 
> Source of file: > https://www.microsoft.com/en-us/software-download/windows10ISO > > Disc image file > - checked against its SHA-256 checksum was correct > - written successfully with that command: > # dd if=Win10_2004_Finnish_x64.iso of=/dev/sdc bs=4M oflag=direct > status=progress && sync I don't see any error messages. That's good. The oflag=direct should use direct I/O. Which means that the 'sync' shouldn't matter since there should be no file system buffer to flush. It will simply flush other unrelated buffers. Won't hurt though. The bs size seems very small at 4M to me. Especially for use with a NAND flash USB storage device. I would select a much larger size. I would probably use 64M, which is likely to divide evenly into the size of your original ISO image, but that should be verified. > Once written, the partition is as follows: > $ mount | fgrep /run/media/$USER > /dev/sdb on /run/media/yk/CCCOMA_X64FRE_FI-FI_DV9 type udf > (ro,nosuid,nodev,relatime,uid=1000,gid=1000,iocharset=utf8,uhelper=udisks2) WHY is this mounted? That seems like a problem. You said that the device was removed and replaced and went from sdc to sdb?! Probably because it was mounted. This feels like the root cause of all of your problems. It feels to me that something is automatically mounting the device.
bug#41657: md5sum: odd escaping for input filename \
close 41657 thanks No one else has commented therefore I am closing the bug ticket. But the discussion may continue here. Michael Coleman wrote: > Thanks very much for your prompt reply. Certainly, if this is > documented behavior, it's not a bug. I would have never thought to > check the documentation as the behavior seems so strange. I am not always so generous about documented behavior *never* being a bug. :-) > If I understand correctly, the leading backslash in the first field > is an indication that the second field is escaped. (The first field > never needs escapes, as far as I can see.) Right. But it was available to clue in the md5sum and others that the file name was an "unsafe" file name and was going to be escaped there. > Not sure I would have chosen this, but it can't really be changed > now. But, I suspect that almost no real shell script would deal > with this escaping correctly. Really, I'd be surprised if there > were even one example. If so, perhaps it could be changed without > trouble. Let's talk about the shell scripting part. Why would this ever need to be parsed in a shell script? And if so then that is precisely where it would need to be done due to the file name! Your own example was a file name that consisted of a single backslash. Since the backslash is the shell escape character then handling that in a shell script would require escaping it properly with a second backslash. I will suggest that the primary use for the *sum utility output is as input to the same utility later to check the content for differences. That's arguably the primary use of it. There are also cases where we will want to use the *sum utilities on a single file. That's fine. I think the problematic case here might be a usage like this usage. filename="\\" sum=$(md5sum "$filename" | awk '{print$1}') printf "%s\n" "$sum" \d41d8cd98f00b204e9800998ecf8427e And then there is that extra backslash at the start of the hash. Well, yes, that is unfortunate. 
But in this case we already have the filename in a variable and don't want the filename from md5sum. This is very similar to portability problems between different versions of 'wc' and other utilities too. (Some 'wc' utils print leading spaces and some do not.) As you already deduced if md5sum does not have a file name then it does not know if it is escaped or not. Reading standard input instead doesn't have a name and therefore "-" is used as a placeholder as per the tradition. filename="\\" sum=$(md5sum < "$filename" | awk '{print$1}') printf "%s\n" "$sum" d41d8cd98f00b204e9800998ecf8427e And because this is discussion I will note that the name is just one of the possible names to a file. Let's hard link it to a different name. And of course symbolic links are the same too. A name is just a pointer to a file. ln "$filename" foo md5sum foo d41d8cd98f00b204e9800998ecf8427e foo But I drift... I think it likely you have already educated your people about the problems and the solution was to read from stdin when the file name is potentially untrusted "tainted" data. (Since programming languages often refer to unknown untrusted data as "tainted" data for the purpose of tracking what actions are safe upon it or not. When taint checking is enabled.) Therefore if the name is unknown then it is safer to avoid the name and use standard input. And I suggest the same with other utilities such as 'wc' too. Fortunately wc is not used to read back its own input. Otherwise I am sure someone would suggest that it would need the same escaping done there too. Example that thankfully does not actually exist: $ wc -l \\ \0 \\ I am sure that if such a change were made it would result in large widespread breakage. Let's hope that never happens. Bob
bug#41657: md5sum: odd escaping for input filename \
Hello Michael, Michael Coleman wrote: > $ true > \\ > $ md5sum \\ > \d41d8cd98f00b204e9800998ecf8427e \\ > $ md5sum < \\ > d41d8cd98f00b204e9800998ecf8427e - Thank you for the extremely good example! It's excellent. > The checksum is not what I would expect, due to the leading > backslash. And in any case, the "\d" has no obvious interpretation. > Really, I can't imagine ever escaping the checksum. As it turns out this is documented behavior. Here is what the manual says: For each FILE, ‘md5sum’ outputs by default, the MD5 checksum, a space, a flag indicating binary or text input mode, and the file name. Binary mode is indicated with ‘*’, text mode with ‘ ’ (space). Binary mode is the default on systems where it’s significant, otherwise text mode is the default. Without ‘--zero’, if FILE contains a backslash or newline, the line is started with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names. If FILE is omitted or specified as ‘-’, standard input is read. Specifically it is this sentence. Without ‘--zero’, if FILE contains a backslash or newline, the line is started with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names. And so the program is behaving as expected. Which I am sure you will not be happy about since this bug report is about it. Someone will correct me but I think the thinking is that the output of md5sum is most useful when it can be checked with md5sum -c and therefore the filename problem needed to be handled. The trigger for this escapes my memory. But if you were to check the output with -c then you would find this result with your test case. $ md5sum \\ | md5sum -c \: OK And note that this applies to the other *sum programs too.
The commands sha224sum, sha256sum, sha384sum and sha512sum compute checksums of various lengths (respectively 224, 256, 384 and 512 bits), collectively known as the SHA-2 hashes. The usage and options of these commands are precisely the same as for md5sum and sha1sum. See md5sum invocation. > (Yes, my users are a clever people.) I am so clever that sometimes I don't understand a single word of what I am saying -- Oscar Wilde :-) Bob
bug#37702: Suggestion for 'df' utility
Paul Eggert wrote: > So I'd prefer having 'df' just do the "right" thing by default, and > to have an option to override that. The "right" thing should be to > ignore all these pseudofilesystems that hardly anybody cares about. +1! Which I thought I would say because often I am a status quo type of person. But this is clearly needed. Hardly a day goes by that I don't hear swearing from people about the current extremely noisy and hard to use df output in the environment of dozens of pseudo file systems. And I don't think this will break legacy and scripted use. Bob
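P.S. Until a better default lands, the noise can already be cut per invocation with df's existing type filters; a sketch (the excluded list is illustrative, not exhaustive):

```shell
# Hide the usual pseudo and packaging filesystems by type:
df -h -x tmpfs -x devtmpfs -x squashfs -x efivarfs
```

The complementary -t TYPE option whitelists instead, showing only the named filesystem types; either one can be wrapped in a shell alias as a stopgap.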
Re: [PATCH 1/2] echo: pacify Oracle Studio 12.6
Paul Eggert wrote: > * src/echo.c (main): Don’t assign pointer to bool. > This is well-defined in C99, but is arguably bad style > and Oracle Studio 12.6 complains. ... > + bool posixly_correct = !!getenv ("POSIXLY_CORRECT"); Of course this is fine. But because char *getenv() returns a pointer, seeing the !! there has the feeling of expressing frustration with the compiler. It feels like an obvious cast silencing a compiler warning. Perhaps that is the intent? :-) RETURN VALUE The getenv() function returns a pointer to the value in the environment, or NULL if there is no match. Just as a soft comment, and not a strong one: if it were me I would use a comparison against the pointer, which produces a boolean result, and assign that boolean result to the bool. It feels to me like more of the types are obvious this way. And no cast of any kind, implicit or explicit, is involved. bool posixly_correct = getenv ("POSIXLY_CORRECT") != NULL; But of course it's all a matter of style. Bob
Re: Extend uniq to support unsorted list based on hashtable
Yair Lenga wrote: > For the first point, I would note that most coreutils goes well beyond > POSIX. Consider "cp", which has many useful additions beyond the POSIX > features. Most of those additions were due to file systems with new features and therefore cp needed to be able to deal with those features. ACLs and extended attributes and other things. There was no other way to deal with them. (However I don't think I have ever had reason to use the --strip-trailing-slashes option.) > The second point is about availability of other tools to achieve > similar task. This is a "judgement call where this functionality > belong. There is no single right answer here. Such implementation can > be done with few lines of code in any scripting solution If a task can be done with a small combination of utilities then that small combination of utilities is usually the right way to do it. Because otherwise instead of a small set of utilities that work together the result is many very large utilities each of which does everything. > My main point is that given that the very common use case for 'uniq' > is combined with other coreutils functions (sort, cut, sed), it make > sense to have an efficient implementation for "counting unique > values" available within "coreutils", instead of sending the user to > look for a solution elsewhere, or to implement his own. Some years ago a programming challenge involved Donald Knuth and Doug McIlroy and has become somewhat famous. I highly recommend studying this example. Here is an article that discusses the event. http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/ It's a very educational lesson that we might learn from those programming greats! Bob
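P.S. For reference, here is McIlroy's solution from that challenge rendered in modern tr syntax: the same "count unique values" job, built entirely from existing tools.

```shell
# Print the N most common words in the input, one "count word" per line:
# tokenize, lowercase, sort, count duplicates, rank, truncate.
N=3
printf 'the quick fox and the lazy dog and the cat\n' |
  tr -cs 'A-Za-z' '\n' |
  tr 'A-Z' 'a-z' |
  sort |
  uniq -c |
  sort -rn |
  sed "${N}q"
```

Six small stages, each doing one thing, composed into the whole job; that is the point the article makes against one large do-everything utility.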
bug#41554: chmod allows removing x bit on chmod without a force flag, which can be inconvenient to recover from
tag 41554 + notabug close 41554 thanks Will Rosecrans wrote: > Based on an inane interview question that was discussed here on Twitter: > https://twitter.com/QuinnyPig/status/1265286980859908102 It's an interview question. The purpose of this type of question is never a practical existing problem but is instead to create a unique, unusual, and unlikely to have been previously experienced problem for discussion with the candidate. To see how the candidate thinks about problems like this. To see if they give up immediately or if they persevere. To see if they try to use available resources such as discussing the problem with the interviewer. It's a method to see the candidate's problem solving skills in action. If the candidate says, here is the canonical correct solution, then the interviewer knows that the candidate has seen this question before, the interviewer will have learned nothing about the candidate's problem solving skills, and will simply move on to another question continuing to try to assess this. I am not particularly fond of interviewers that fish for a particular answer. Better when the interviewer knows they are looking for an open ended discussion. The goal is assessing the candidate's problem solving ability, not rote memorization of test prep questions and answers. It is easy to say, oh, we will simply have the program avoid changing itself, since that would almost never be desirable. But "almost never" admits that it is sometimes desirable. And though easy to say, it is actually very hard to program this without creating new bugs. I might say impossible. If this particular case were to be modified in the program the only result would be that the interviewer would need to look for a different inane, unique, unusual, and unlikely to have been experienced situation to put the candidate in. But along the way the program would have acquired a bit of cruft. It would be an unnatural growth on the program source. It would forever need testing. It adds complexity.
It would likely be the source of an actual real world bug. As opposed to this thought-experiment situation. > "chmod a-x $(which chmod)" not a particularly likely thing for a user to > try to do directly, but it is conceivable for some sort of script to > attempt it by accident because of a bug, and it would make the system > inconvenient to recover. Since it's almost never a desirable operation, > chmodding chmod itself could simply fail unless something like --force is > supplied. The underlying safety logic is similar to that behind the > existing "--(no-)preserve-root" There are an infinite number of ways for someone to program a mistake. Trying to enumerate them all in a program to prevent them is one of them. Bob
bug#41518: Bug in od?
Yuan Cao wrote: > > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e > > Thanks for pointing me to this documentation. > > It just feels strange because the order does not reflect the order of the > characters in the file. It feels strange in the environment *today*. But in the 1970s when 'od' was written it was perfectly natural on the PDP-11 to print out the native machine word in the *native word order* of the PDP-11. During that time most software operated on the native architecture and the idea of being portable to other systems was not yet common. The PDP-11 is a 16-bit word machine. Therefore what you are seeing with the 2-byte integer and the order in which it is printed is the order in which it was printed on the PDP-11 system. And it has remained unchanged to the present day, because it can't change without breaking all historical use. For anyone using od today the best way to use -x is -tx1, which prints bytes in a portable order. Whenever you think to type -x, use -tx1 instead. This avoids breaking historical use and produces the output you want. > I think it might have been useful to get the "by word" value of the file if > you are working with a binary file historically. One might have stored some > data as a list of shorts. Then, we can easily view the data using "od -x > data_file_name". > > Since memory is so cheap now, people are probably using just using chars > for text, and 4 byte ints or 8 byte ints where they used to use 2 byte ints > (shorts) before. In this case, the "by word" order does not seem to me to > be as useful and violates the principle of least astonishment needlessly. But changing the meaning of a command's options is a hard problem and cannot be done without breaking a lot of existing use. The better way is not to try.
The options to head and tail changed an eon ago and yet just in the last week I ran across a posting where the option change bit someone. And since there is no need for any breaking change it is better not to do it. Simply use the correct options for what you want. -tx1 in this case. > It might be interesting to change the option to print values by double word > or quadword instead or add another option to let the users choose to print > by double word or quadword if they want. The size of 16 bits was a good value in yesteryear. 32 bits was a good size for some years. Now 64 bits is the most common size. The only way to win is not to play. Better to say the size explicitly. And IMNHO the best size is 1 regardless of architecture. od -Ax -tx1z -v Each of those options has been added over the years and each changes the behavior of the program. Each of them would be a breaking change if it were made the default. Best to ask for what you want explicitly. I strongly recommend https://www.ietf.org/rfc/ien/ien137.txt as required reading. Bob
bug#41518: Bug in od?
A little more information. Pádraig Brady wrote: > Yuan Cao wrote: > > I recently came across the following behavior. > > > > When using "--traditional x2" or "-x" option, it seems the order of hex > > code output for the characters is pairwise reversed (if that's the correct > > way of describing it). ‘-x’ Output as hexadecimal two-byte units. Equivalent to ‘-t x2’. Outputs 16-bit integers in the *native byte order* of the machine. Which may be either big-endian or little-endian depending on the machine. Not portable. Depends upon the machine it is run upon. > If you want to hexdump independently of endianess you can: > > od -Ax -tx1z -v The -tx1 option above is portable because it outputs 1-byte units instead of 2-byte units, which is independent of endianness. This is the FAQ entry for this topic. https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e Bob
Re: Using coreutils nohup in Windows
Keith Neargarder wrote: > I am trying to execute a Perl script using nohup but I get the > error: nohup: cannot run command 'test.pl': Exec format error Your operating system apparently does not understand #! scripts. I can see that you are running MS-Windows. Obviously the GNU Project is dedicated to free software and that is what most of us are running. Which is likely why no one responded earlier. Sorry. No one had anything to say. Don't know anything about it. > Command I am trying to run is: nohup test.pl > test.log 2>&1 & > I have figured out that it will run if I put "perl" before my script: > nohup perl test.pl > test.log 2>&1 & In this case the nohup command is running perl. And perl is found as perl.exe as you described. Your OS knows how to start .exe files and does so. Then test.pl is a program argument to perl.exe. perl.exe then reads, interprets, and executes the program test.pl. > My perl.exe is located at c:\Perl\bin\perl.exe > I have .pl files associated to the perl.exe and can run Perl scripts > at the command prompt simply by typing in the script name without > the "perl" command prefix. I am sorry but I know very little about what Microsoft does here. My expectation on a Unix kernel is that the OS will stat(2) the file looking to see if the execute bit is set or not. If it is set then the OS will try to exec(2) the file. On scripts this will fail (internally to the kernel now) since it is not machine code, and then it will read the first few bytes of the file looking for a #!/path/to/interpreter/program and if found then it starts that program and that program reads the file. I have no idea how Microsoft handles this type of thing. I rather expect it will need a test.bat wrapper to be executed which will turn around and call perl on it. I have no idea.
> I have also tried the shebang at the top of my test.pl script (with > forward and backward slashes):#!\Perl\bin\perl > No matter what I try nohup doesn't want to execute the Perl script > unless I prefix it with the call to the "perl" executable. > I would really like to avoid that if possible. Why doesn't the file > association work with nohup? No idea. Sorry. However for anyone else to know anything they would need to know what port of the utilities you are using. And the version of them. Is this Cygwin? Or MinGW? Or any of many other possibilities. I am guessing that it is not Cygwin because you say it is not working and I feel confident that it would work. nohup --version If you say the environment on Windows that you are using and the version then it is possible that someone else might know something about it. I think it likely that if you installed Cygwin that things would Just Work. Good Luck! Bob
Re: Questions related to stty(TOMITA)
富田黎 wrote: > I want to know the difference between stty command with "speed" and without > "speed". With the "speed" command the stty program prints the speed. Without it there is no printing of the speed. The Coreutils stty manual says: ‘speed’ Print the terminal speed. > When I change the baud rate of serial connected terminal (ttyS0), I > don't know how the presence or absence of speed affects it as shown > below. > > $ stty -F /dev/ttyS0 [baudrate] Sets the speed to "baudrate". > $ stty -F /dev/ttyS0 speed [baudrate] First prints the speed, due to the "speed" argument. Then sets the speed to "baudrate" due to the baudrate argument. For example:

rwp@angst:~$ stty speed
38400

The terminal speed is 38400.

rwp@angst:~$ stty speed 9600
38400

The terminal speed was 38400, which was printed. Then it was changed to 9600.

rwp@angst:~$ stty speed
9600

The terminal speed is 9600.

rwp@angst:~$ stty 38400 speed
38400
rwp@angst:~$ stty 9600 speed
9600
rwp@angst:~$ stty 4800 speed
4800

In the above the speed is set to the value shown and then the printing due to the "speed" argument occurs after it has been set. > I know the official recommends the latter, I am not aware of this recommendation. Can you tell us the documentation you are referring to? The GNU Coreutils documentation is the in-program --help output, which generates the man page, plus the full manual in the online info docs. The full online info docs say exactly this: ‘speed’ Print the terminal speed. ‘N’ Set the input and output speeds to N. N can be one of: 0 50 75 110 134 134.5 150 200 300 600 1200 1800 2400 4800 9600 19200 38400 ‘exta’ ‘extb’. ‘exta’ is the same as 19200; ‘extb’ is the same as 38400. Many systems, including GNU/Linux, support higher speeds. The ‘stty’ command includes support for speeds of 57600, 115200, 230400, 460800, 500000, 576000, 921600, 1000000, 1152000, 1500000, 2000000, 2500000, 3000000, 3500000, or 4000000 where the system supports these. 0 hangs up the line if ‘-clocal’ is set.
Note that those are two separate program argument options. Separate. Independent. > but even with the former you could change the baud rate in some > cases and I'd like to know what the difference is. Hopefully this helps to clarify the differences. > For example, I was able to resolve the garbled ttyS0 with the > former, but when I accessed ttyS0 with gpsd, I saw no change in > baudrate. > > Garbled characters > $ cat /dev/ttyS0 > ->garbled text > ->$ stty -F /dev/ttyS0 [baudrate] > ->$ cat /dev/ttyS0 > ->Resolution of garbled characters This shows the problem solved. But there is a problem in the above sequence. Therefore I do not understand how it can be working. Let me repeat it so that I may comment upon each step.

$ cat /dev/ttyS0
->garbled text

$ stty -F /dev/ttyS0 [baudrate]

This changes the speed to baudrate.

$ cat /dev/ttyS0
->Resolution of garbled characters

Good! However if nothing holds /dev/ttyS0 open then the system will immediately reset the speed back to the system default. Therefore simply doing the above will have no effect due to the lack of anything holding the device open. In order to have the stty change persist there must be a non-zero reference count on the serial device. It should work okay however if there is something holding onto the device. I will also take a moment to say that in the traditional usage stty operates on stdin, file descriptor 0. $ stty 9600 < /dev/ttyS0 Both syntax forms should work the same however. But traditionally the redirection from stdin was used. > About gpsd > $ gpsmon /dev/ttyS0 > ->The default baudrate > -> $ stty -F /dev/ttyS0 [baudrate] > -> $ gpsmon /dev/ttyS0 > ->Keep the default baudrate. > -> $ stty -F /dev/ttyS0 speed [baudrate] > ->$ gpsmon /dev/ttyS0 > ->Confirm the changes to the baurate you set Surely the 'gpsmon' command has an option to set the speed itself? In the manual I see the options to do so. Also it says that it will search for the correct speed itself automatically.
Hope this helps to explain things. :-) Bob
bug#41001: mkdir: cannot create directory ‘test’: File exists
taehwan jeoung wrote: > Can this error message be clarified? The directory already exists, it is > not a file. That is incorrect. Directories are files. FIFOs are files. Device nodes are files. Symlinks are files. Network sockets are files. They are all files. Therefore it is not incorrect to say that a file already exists. Directories are files. We have all agreed that if a better error message were provided then that would be an improvement. We agree with you. We would do it if it were within the power of mkdir(1) to do it. But it isn't. Therefore we can't. > lib/mkdir-p.c:200 contains this line of code that triggers below:- > > error (0, mkdir_errno, _("cannot create directory %s"), quote (dir)); > > As it's easy enough to know that the reason mkdir fails is because > 'test' a directory that already exists. That is also incorrect. Since that information is not provided at the time of the action it can only be inferred by implication later. But at the time of the failure return it cannot be known unless the kernel provides that information. Later in time things might have changed. > Easy enough to check with stat() and S_ISDIR(sb.st_mode) Incorrect. Checking *later* with stat() does not provide the reason that the earlier mkdir(2) failed. It provides a guess of something that might be the reason. Maybe. Or maybe not. Things may have changed later in time and the guess made later might not be the correct reason. Reporting that as if it were would be a worse bug. That checking later in time after the mkdir has failed is what introduces the race condition that we have been talking about. Please do not ignore that critically important point. > Can this be changed? Maybe I can make a patch for it. Sigh. Ignoring the reasons why this is a bad idea is not helpful. Bob
bug#41001: mkdir: cannot create directory ‘test’: File exists
Jonny Grant wrote: > Paul Eggert wrote: > > Jonny Grant wrote: > > > Is a more accurate strerror considered unreliable? > > > > > > Current: > > > mkdir: cannot create directory ‘test’: File exists > > > > > > Proposed: > > > mkdir: cannot create directory ‘test’: Is a directory > > > > I don't understand this comment. As I understand it you're proposing a > > change to > > the mkdir command not a change to the strerror library function, and the > > change > > you're proposing would introduce a race condition to the mkdir command. > > As the mkdir error returned to the shell is the same, I don't feel the > difference between the words "File exists" and "Is a directory" on the > terminal can be considered a race condition. I read the message thread carefully and the proposal was to add an additional non-atomic stat(2) call to the logic. That sets up the race condition. The difference in the words of the error string is not the race condition. The race condition is created when trying to stat(2) the file to see why it failed. That can only be done as a separate action. That cannot be an atomic operation. That can only create a race condition. For the low level utilities it is almost always a bad idea to layer in additional system calls that are not otherwise there. Doing so almost always creates additional bugs. And then there will be new bug reports about those problems. And those will be completely valid. Try this experiment on your own.

/tmp$ strace -e trace=mkdir mkdir foodir1
mkdir("foodir1", 0777) = 0
+++ exited with 0 +++
/tmp$ strace -e trace=mkdir mkdir foodir1
mkdir("foodir1", 0777) = -1 EEXIST (File exists)
mkdir: cannot create directory ‘foodir1’: File exists
+++ exited with 1 +++

The first mkdir("foodir1", 0777) call succeeded. The second mkdir("foodir1", 0777) call failed, returned -1, and set errno = EEXIST, the error number for "File exists".
Note that this output line:

mkdir("foodir1", 0777) = -1 EEXIST (File exists)

That line was entirely reported by the 'strace' command and is not any code related to the Coreutils mkdir command. The strace command reported the same "File exists" message as mkdir did later, due to the EEXIST error code. Let's try the same experiment with a file. And also with a pipe and a character device too.

/tmp$ touch file1
/tmp$ strace -e trace=mkdir mkdir file1
mkdir("file1", 0777) = -1 EEXIST (File exists)
mkdir: cannot create directory ‘file1’: File exists
+++ exited with 1 +++
/tmp$ mkfifo fifo1
/tmp$ strace -e trace=mkdir mkdir fifo1
mkdir("fifo1", 0777) = -1 EEXIST (File exists)
mkdir: cannot create directory ‘fifo1’: File exists
+++ exited with 1 +++
/tmp$ sudo mknod char1 c 5 0
/tmp$ strace -e trace=mkdir mkdir char1
mkdir("char1", 0777) = -1 EEXIST (File exists)
mkdir: cannot create directory ‘char1’: File exists
+++ exited with 1 +++

And so we see that the kernel is returning the same EEXIST error code for *all* cases where a file previously exists. And it is correct because all of those are files. Because directories are files, pipes are files, and files are files. Everything is a file. Therefore EEXIST is a correct error code. In order to correctly change the message being reported the change should be made in the kernel so that the kernel, which has the information at that time atomically, could report an error providing more detail than simply EEXIST. You have proposed that mkdir add a stat(2) system call to extract this additional information. > as it's easy enough to call stat() like other package maintainers > do, as you can see in binutils. *That* stat() addition creates the race condition. Adding a stat() call cannot be done atomically. It would need to be done either before the mkdir(), after the mkdir(), or both before and after. Let's see how that can go wrong.
Let's say we stat(), it does not exist, we continue with mkdir(), it fails with EEXIST because another process got there first. So then we stat() again and by that time the other process has already finished processing and removed the directory again. A system call trace would look like this.

lstat("foodir1", 0x7ffcafc12800) = -1 ENOENT (No such file or directory)
mkdir("foodir1", 0777) = -1 EEXIST (File exists)
lstat("foodir1", 0x7ffcafc12800) = -1 ENOENT (No such file or directory)

Okay. That's confusing. The only value in hand being EEXIST, that is the error to be reported. If this were repeated many times then sometimes we would catch it as an actual directory.

lstat("foodir1", 0x7ffcafc12800) = -1 ENOENT (No such file or directory)
mkdir("foodir1", 0777) = -1 EEXIST (File exists)
lstat("foodir1", {st_mode=S_IFDIR|0775, st_size=40, ...}) = 0

In that case the proposal is to report it as EISDIR. If we were to set up two
bug#40958: date command give current time zone regardless of seconds since epoch requested.
tag 40958 + notabug close 40958 thanks GNAT via GNU coreutils Bug Reports wrote: > I am going to hazard a guess and say this is the expected behaviour, > but I cannot find anything though goog. The FAQ gives the recipe to figure these types of problems out. https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-date-command-is-not-working-right_002e And for the timezone and date in question.

zdump -v Europe/London | grep 1970
...no output...

That would be a little confusing. So let's look at it with a pager such as less. Browse and find the years of interest.

zdump -v Europe/London | less
...
Europe/London Sun Feb 18 01:59:59 1968 UT = Sun Feb 18 01:59:59 1968 GMT isdst=0 gmtoff=0
Europe/London Sun Feb 18 02:00:00 1968 UT = Sun Feb 18 03:00:00 1968 BST isdst=1 gmtoff=3600
Europe/London Sat Oct 26 22:59:59 1968 UT = Sat Oct 26 23:59:59 1968 BST isdst=1 gmtoff=3600
Europe/London Sat Oct 26 23:00:00 1968 UT = Sun Oct 27 00:00:00 1968 BST isdst=0 gmtoff=3600
Europe/London Sun Oct 31 01:59:59 1971 UT = Sun Oct 31 02:59:59 1971 BST isdst=0 gmtoff=3600
Europe/London Sun Oct 31 02:00:00 1971 UT = Sun Oct 31 02:00:00 1971 GMT isdst=0 gmtoff=0
...

And therefore it is of course as Andreas Schwab wrote. "This took place between 27 October 1968 and 31 October 1971, ..." An interesting footnote of history! The date command uses the Time Zone Database for this information. The database is typically updated by the operating system software distribution upon which GNU date is run. The source of the database is available here. https://www.iana.org/time-zones GNU date is operating upon the data from that database. That database is updated often, as it is a global compilation of acts of governance and must be updated as the timezone rules change. In the Debian and derivative software distributions I know this is packaged in the 'tzdata' package.
> The date command has a number of switches, one of which is -d where you give > it the number of seconds since epoch, as in "date -d@1234" or "date --date > @1234". > > Additionally, you can get it to return as any string you want to, as in "date > -d@1234 "+%c %z %Z" > > Both return "Thu Jan 1 01:20:34 BST 1970" or "Thu Jan 1 01:20:34 +0100 BST > 1970" for the UK. > > /etc/localtime is set to /usr/share/zoneinfo/Europe/London. > > That's wrong, it should give "Thu Jan 1 00:20:34 1970 + GMT". > > After all, in January, the UK is not in daylight saving time at the beginning > of January. And yet there it was! By an Act of Governance daylight saving time was in effect at that time! No one is safe when the government is in session. :-) > It therefore gives you the current daylight saving time status, > rather than what it should be at the time requested. > > I assume currently, this will give erroneous results for any > requests in daylight saving. Because date appears to be operating correctly I am closing this bug ticket. But further discussion is welcome to continue in the bug ticket. Bob
bug#40904: listing multiple subdirectories places filenames in different columns between each subdirectory
tag 40904 + notabug close 40904 thanks Jim Clark wrote: > When I list a hard drive "ls -AR > list.txt" and import it into Libreoffice > Calc, then break the lines using "text-to-columns", I am not able to > perform a fixed format break so that the filenames are placed in their own > column. > > It seems like, when listing all subdirectories the largest file size within > the subdirectory places the filename at a column and all the other names in > that subdirectory are at the same column, but other subdirectories will > have their filenames at different columns depending on file size within > that subdirectory. File size? Your example used "ls -AR" which does not include the file size. Therefore I am going to close the ticket for the purpose of accounting. Since there is no bug here. But please let further discussion follow. The ticket can be reopened or reassigned easily if that is determined. > It would be nice if all the filenames were at the same column in the > directory and all subdirectories. If you are trying to use "ls -lAR" then each directory is listed individually and what you are saying is true. However that is the way the GNU ls program is designed to work. Each directory is listed individually with column spacing applied to that directory. As Paul recommended, it is likely better for you to use find instead. Since you apparently want the long listing format then perhaps: find . -ls That will produce a full recursive long listing all of the way down. It will use a wide fixed spacing which is apparently what you want. I am curious. I can't imagine any reason to import a recursive file listing into a spreadsheet... What is the task goal you are trying to do there? Bob
bug#40220: date command set linux epoch time failed
Paul Eggert wrote: > Bob Proulx wrote: > > By reading the documentation for CLOCK_MONOTONIC in clock_gettime(2): > > GNU 'date' doesn't use CLOCK_MONOTONIC, so why is CLOCK_MONOTONIC relevant > to this bug report? GNU date uses clock_settime() and settimeofday() on my Debian system. Let me repeat the strace snippet from my previous message which shows this. TZ=UTC strace -o /tmp/out -v date -s "1970-01-01 00:00:00" ... clock_settime(CLOCK_REALTIME, {tv_sec=0, tv_nsec=0}) = -1 EINVAL (Invalid argument) settimeofday({tv_sec=0, tv_usec=0}, NULL) = -1 EINVAL (Invalid argument) ... Both calls from GNU date are returning EINVAL. Those are Linux kernel system calls. Those Linux kernel system calls are using CLOCK_MONOTONIC. Therefore GNU date, on Linux systems, is by association with the Linux kernel, using CLOCK_MONOTONIC. And the Linux kernel is returning EINVAL. And according to the documentation for both clock_settime() and settimeofday() the most likely reason for the EINVAL is the application of CLOCK_MONOTONIC preventing it, because that documentation says that one cannot set the date earlier than the system uptime. Why this is desirable I have no idea as it does not seem desirable to me. But I am just the messenger, having read that in the documentation looking for the reason for the EINVAL return. > Is this some busybox thing? If so, user 'shy' needs to report it to the > busybox people, not to bug-coreutils. No. It is only a busybox thing as much as it is a GNU date thing in that both are making system calls to the Linux kernel and both are failing with EINVAL. The reference to busybox confused me at first too. But in the original report it was simply another case of the same thing. Which is actually a strong indication that it is not a bug in either of the frontend implementations but something common to both. In this case the kernel is the common part. This does not appear to be a bug in the sense that it is explicit behavior. 
It is working as the Linux kernel has coded it to behave. According to the documentation. If one were to take this anywhere it would be to the Linux kernel mailing list to discover why they implemented this inconvenient behavior. Meanwhile... Since I am writing this in this thread... I might mention to the original poster that if they are testing using old clock times they might be able to get a good result by using libfaketime https://github.com/wolfcw/libfaketime which is a user land strategy for implementing different fake clock times for programs. Very useful in testing. And then there would be no need to set the system time at all. $ faketime '1970-01-01 00:00:00 UTC' date -uR Thu, 01 Jan 1970 00:00:00 + Bob
bug#40220: date command set linux epoch time failed
Paul Eggert wrote: > Bob Proulx wrote: > > I tested this in a victim system and if I was very quick I was able to > > log in and set the time to :10 seconds but no earlier. > > Sounds like some sort of atomic-time thing, since UTC and TAI differed by 10 > seconds when they started up in 1972. Perhaps the clock in question uses TAI > internally? By reading the documentation for CLOCK_MONOTONIC in clock_gettime(2): CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since--as described by POSIX--"some unspecified point in the past". On Linux, that point corresponds to the number of seconds that the system has been running since it was booted. The CLOCK_MONOTONIC clock is not affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the clock), but is affected by the incremental adjustments performed by adjtime(3) and NTP. This clock does not count time that the system is suspended. It's the "On Linux, that point corresponds to the number of seconds that the system has been running since it was booted." part that seems to apply here just by the reading of it. To test this I can reboot a VM, which boots quickly, and then as soon as I think it is available by watching the console I can ssh into it as root from another terminal. And then in that other terminal logged in as root I try to execute "date -s '1970-01-01 00:00:00 UTC'" as soon as possible. I am never able to do so due to EINVAL. But if I reboot and repeat the experiment trying to set a few seconds in time later then if I am quick I can sometimes catch "date -s '1970-01-01 00:00:10 UTC'" and have it work. Trying again just now I was quick enough to log in and set the time at :07 UTC. But then if I wait and let seconds tick by and try setting to :10 UTC seconds again it will fail. This matches the model described by the documentation that CLOCK_MONOTONIC is the system uptime and the kernel does not allow the clock to be set before the system uptime.
If I wait longer and try setting the date to various times then experimentally the behavior matches that I cannot set the system time earlier than the system uptime. Personally I can't see an advantage for this behavior. Because if someone is doing an experiment and wants to reset the clock to time zero then I don't see an advantage of blocking that from happening. However doing so might avoid some accidental settings of the system clock to an unintended zero time. Just like rm --preserve-root. But how often does that actually happen? And then I would want to see a way to do it anyway for the experiment possibilities. Reading the documentation, this seems to be a hard limitation coded into the Linux kernel that is blocking this. Bob
bug#40220: date command set linux epoch time failed
tag 40220 + notabug close 40220 thanks shy wrote: > I use command date -s "1970-01-20 00:00:00" to set date, but it > failed. there is error message "date: can't set date: Invalid > argument". > It's UTC time and no timezone. This is most likely a limitation of your kernel. I can recreate this problem on a Linux 4.9 system for example. TZ=UTC strace -o /tmp/out -v date -s "1970-01-01 00:00:00" ... clock_settime(CLOCK_REALTIME, {tv_sec=0, tv_nsec=0}) = -1 EINVAL (Invalid argument) settimeofday({tv_sec=0, tv_usec=0}, NULL) = -1 EINVAL (Invalid argument) ... And the documented possible returns of EINVAL for clock_settime(). EINVAL The clk_id specified is not supported on this system. EINVAL (clock_settime()): tp.tv_sec is negative or tp.tv_nsec is outside the range [0..999,999,999]. EINVAL (since Linux 4.3) A call to clock_settime() with a clk_id of CLOCK_REALTIME attempted to set the time to a value less than the current value of the CLOCK_MONOTONIC clock. And for settimeofday(). EINVAL (settimeofday()): timezone is invalid. EINVAL (settimeofday()): tv.tv_sec is negative or tv.tv_usec is outside the range [0..999,999]. EINVAL (since Linux 4.3) (settimeofday()): An attempt was made to set the time to a value less than the current value of the CLOCK_MONOTONIC clock (see clock_gettime(2)). EPERM The calling process has insufficient privilege to call settimeofday(); under Linux the CAP_SYS_TIME capability is required. But this is not a bug in GNU date. This is likely the effect of CLOCK_MONOTONIC in the Linux kernel. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since--as described by POSIX--"some unspecified point in the past". On Linux, that point corresponds to the number of seconds that the system has been running since it was booted. 
The CLOCK_MONOTONIC clock is not affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the clock), but is affected by the incremental adjustments performed by adjtime(3) and NTP. This clock does not count time that the system is suspended.

I am not familiar with CLOCK_MONOTONIC but reading the documentation points me to it as being the most likely reason this is not allowing that time to be set. I tested this in a victim system and if I was very quick I was able to log in and set the time to :10 seconds but no earlier.

> I test with stime or settimeofday to set seconds 0, they are all have the
> problem.
> 1. I use buildroot-2013.05, the busybox is in 1.21.1, the linux kernel is in
> version 4.4.39.

That multiple frontends, GNU date and busybox date, all have the same problem indicates that the problem is not with the frontend but with the kernel handling the system call.

> 3.When set date command, the busybox uses function "stime" to set
> time, I use stime to set time around linux epoch time,
> but the stime seems not work well.
> int ret = 0;
> time_t time = 20;
> ret = stime();
> printf("ret %d %d\r\n",ret, errno);
> perror("stime:");
> and the results are as follows:
> ret -1 22
> stime:: Invalid argument

And your independent test also confirmed that the problem is not a bug in GNU date. Therefore I mark this GNU date bug ticket as closed for our own accounting. But please continue to discuss the issue here.

Bob
bug#39850: "du" command can not count some files
Hyunho Cho wrote:
> $ find /usr/bin -type f | wc -l
> 2234
>
> $ find /usr/bin -type f -print0 | du -b --files0-from=- | wc -l
>

Hard links. Files that are hard linked are only counted once by du since du is summing up the disk usage and hard linked files only use disk on the first usage. Add the du -l option if you want to count hard linked files multiple times.

find /usr/bin -type f -print0 | du -l -b --files0-from=- | wc -l

That will generate an incorrect total disk usage amount however as it will report hard linked disk space for each hard link. But it all depends upon what you are trying to count.

> $ du -b $( find /usr/bin -type f ) | wc -l
>

du -l -b $( find /usr/bin -type f ) | wc -l

> $ find /usr/bin -type f -exec stat -c %s {} + | awk '{sum+=$1} END{ print
> sum}'
> 1296011570
>
> $ find /usr/bin -type f -print0 | du -b --files0-from=- | awk '{sum+=$1} END{
> print sum}'
> 1282350388

find /usr/bin -type f -print0 | du -l -b --files0-from=- | awk '{sum+=$1} END{ print sum}'

> $ diff <( find /usr/bin -type f | sort ) <( find /usr/bin -type f -print0 |
> du --files0-from=- | cut -f 2 | sort )

diff <( find /usr/bin -type f | sort ) <( find /usr/bin -type f -print0 | du -l --files0-from=- | cut -f 2 | sort )

I am surprised you didn't try du on each file in addition to stat -c %s on each file when you were summing them up. :-)

find /usr/bin -type f -exec du -b {} \; | awk '{sum+=$1} END{ print sum}'

Bob
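The hard-link accounting is easy to demonstrate in a scratch directory. A minimal sketch (the mktemp directory and the file names are arbitrary):

```shell
# Two directory entries for the same inode (a hard link pair).
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/a"
ln "$tmp/a" "$tmp/b"             # same inode, same disk blocks

# du counts the shared blocks only once; the second name is skipped
# entirely, so only one output line appears.
du "$tmp/a" "$tmp/b" | wc -l

# With -l (--count-links) every name is listed and counted, so two
# lines appear and the byte totals double.
du -l "$tmp/a" "$tmp/b" | wc -l

rm -rf "$tmp"
```

This is the same effect, in miniature, as the difference between the stat -c %s sum and the du sum over /usr/bin above.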
BTS hint about unreachable submitters
Bernhard Voelker wrote:
> I had to take out your email address to be able to reply to this issue:
>
> An error occurred while sending mail. The mail server responded:
> Requested action not taken: mailbox unavailable
> invalid DNS MX or A/ resource record.

There is a useful feature of the BTS that allows one to reply to the BTS and then have the BTS send the message on to other places. Sometimes a bug submitter will have blacklisted everything except a whitelisted project site, or other such things. In this case the submitter's address did not have an MX record but did have A records. Traditionally mail is still delivered to sites with only A records although that is frowned upon these days. (Sites wishing to declare that they do not receive mail should use the new-ish "null MX" record.) But there are many different possibilities of problems sending to a submitter. This BTS feature allows the project site to send the message to the submitter instead of our own local distributed mail servers. I use this feature frequently.

https://www.debian.org/Bugs/Developer#followup

    nnn-submit...@bugs.debian.org — these are also sent to the
    submitter and forwarded to debian-bugs-dist, but not to the
    package maintainer;

Therefore if one were to reply to a bug and have a delivery problem with the submitter's address it would be possible to change the To: recipient list

-To: u...@example.com, n...@debbugs.gnu.org
+To: nnn-submit...@debbugs.gnu.org, n...@debbugs.gnu.org

And then @debbugs.gnu.org would deal with both destinations. One to the bug and one to the bug submitter. This should give the best chance of reaching a submitter since the message is then coming from the debbugs.gnu.org domain and host and not our distributed sites from which we reply. Of course doing this means that any delivery problems to the submitter are not visible to the person doing the reply, because errors are handled by the BTS, and handled in this case means discarded.
But it does mean that delivery is escalated upstream to the project site, in this case debbugs.gnu.org, and if the submitter will not take delivery from there then it eliminates any local distributed configuration policy from the transmission path and clearly indicates a problem on the submitter side.

Note that sending to nnn-submitter does not include the bug ticket. (AFAIK. I haven't actually tested this.) The bug ticket must be explicit for it to be included. Usually one would send to both as in the example above.

Additionally this supports another rare-ish use case. A long-lived bug might have a submitter actually change addresses during its lifetime! In which case nnn-submitter will send to the address currently recorded in the BTS, which may have been changed since the message being replied to was sent to the system.

https://www.debian.org/Bugs/server-control#submitter

At least that is the way I read the documentation on it. I should test this and verify that it changes the "Reported by:" field of a BTS bug.

Anyway... Thought this might be useful or at least somewhat interesting... :-)

Bob
bug#39135: Globbing with numbers does not allow me to specify order
Antti Savolainen wrote:
> When doing a shortcut to unmount in a specific order, I am unable to
> specify order with angle brackets. For example using 'umount /dev/sda[132]'
> will result in the system unmounting them in numerological order. First 1
> then 2 and finally 3. What I need it to do is to first unmount 1, then 3
> and finally 2. It would be nice for the glob to respect the order of
> numbers that it was given.

As Bernhard wrote this involves features that have nothing to do with coreutils. However I thought I might say some more too.

You say you would like character class expansion of file globbing to preserve the order. But that isn't something that it has ever done, going all of the way back 40 years. The [...] brackets give the file glob parser (it's called glob because wildcards can match a glob of files) a list of characters to match. These can be ranges such as A-Z or 0-9 and so forth. The collection effectively makes a set of characters. This is expanded by the command line shell. To see the expansion one can use the echo command to echo them out. Try this to see what a command like yours is doing.

echo /dev/sda[132]

That shows what the umount command line arguments are going to be. The command line shell expands the wildcards and then passes the resulting expansion to the command. The command never sees the wild card itself.

Therefore your specific desire is that the command line shell would do something different from what it is doing now. And that would be something different from what it has ever done in the past. This would be a new behavior and a change in historic behavior. And almost certainly one that would break someone who is now depending upon the current behavior of sorting the arguments. They would then file a bug that the arguments were no longer being sorted. And they were there first by decades.
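The sorted expansion is easy to see in a scratch directory; a sketch with stand-in file names (nothing here touches real devices):

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch sda1 sda2 sda3

# [132] is a set of characters to match, not a sequence. The shell
# expands the matching names in sorted order no matter how the set
# was written, so this prints: sda1 sda2 sda3
echo sda[132]

cd / && rm -rf "$tmp"
```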
Therefore if I were maintaining a shell I would not want to make changes to that ordering since it would certainly break others and generate more bug reports. Instead if you need to have things happen in a specific order then the task is up to you to specify an explicit order.

Bernhard suggested brace expansion, which is a bash feature not specified by POSIX sh.

echo /dev/sda{1,3,2}
/dev/sda1 /dev/sda3 /dev/sda2

However I am not personally a fan of bash-isms in scripts. They won't work everywhere. Therefore I personally would just explicitly specify the order.

umount /dev/sda1
umount /dev/sda3
umount /dev/sda2

Doing things that way is unambiguous. And if that is the correct order then it is the correct order. If you need a command line shortcut to make typing this in easier then I personally would create a small shell script.

#!/bin/sh
# Unmount the devices in mount dependency order.
umount /dev/sda1
umount /dev/sda3
umount /dev/sda2

Store this in /usr/local/bin/umount-sda or some such name that makes sense to you and chmod a+x the file to make it executable. Then it is a documentable command to do exactly what is needed. Typical command line completion with TAB will help as a typing aid to expand the file name for you.

That is the way I would do it.

Bob
bug#38621: gdu showing different sizes
TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here.
>
> I'm still not 100% clear on _why_ two "identical" files would have
> different results for "size on disk" (it _seems_ like those should be
> identical) but I suspect that the answer is probably of a technical
> nature that would be "over my head" so to speak, and truthfully, all I
> really need to know is "sometimes that happens" rather than
> understanding the technical details of why.

I think the start is where the confusion began. The commands are named for the different things they were intended to show.

'du' is named for showing disk usage
'ls' is named for listing files

And those are rather different things! Let's dig into the details. The long format for information says:

‘-l’
‘--format=long’
‘--format=verbose’
     In addition to the name of each file, print the file type, file
     mode bits, number of hard links, owner name, group name, size, and
     timestamp (*note Formatting file timestamps::), normally the
     modification timestamp (the mtime, *note File timestamps::).
     Print question marks for information that cannot be determined.

So we know that ls lists the size of the file. But let me specifically say that this is tagged to the *file*. It's file centric. There is also the -s option.

‘-s’
‘--size’
     Print the disk allocation of each file to the left of the file
     name. This is the amount of disk space used by the file, which is
     usually a bit more than the file’s size, but it can be less if the
     file has holes.

This displays how much disk space the file consumes instead of the size of the file. The two being different things. And then the 'du' documentation says:

     ‘du’ reports the amount of disk space used by the set of
     specified files

And so du is the disk used by the file. But as we know the amount of disk used is dependent upon the file system holding the file.
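The file-centric size versus disk-centric usage distinction shows up immediately with a sparse file. A sketch, assuming a file system that supports holes (most Linux file systems do):

```shell
tmp=$(mktemp -d)

# A 1 MiB file that is all hole: the size exists, the data does not.
truncate -s 1M "$tmp/sparse"

ls -l "$tmp/sparse"    # the size column reports 1048576 bytes
du -k "$tmp/sparse"    # the allocated space is far smaller, often 0 KiB

rm -rf "$tmp"
```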
Different file systems will have different storage methods and the amount of disk space being consumed by a file will be different and somewhat unrelated to the size of the file. Disk space consumed to hold the file could be larger or smaller than the file size. In particular if the file is sparse then there are "holes" in the middle that are all zero data and do not need to be stored. Thereby saving the space. In which case it will be smaller. Or since files are stored in blocks the final block will have some fragment of space at the end that is past the end of the file but too small to be used for other files. In which case it will be larger.

Therefore it is not surprising that the numbers displayed for disk usage are not the same as the file content size. They would really only line up exactly if the file content size is a multiple of the file system storage block size and every block is fully represented on disk. Otherwise they will always be at least somewhat different in number.

As long as I am here I should mention 'df' which shows disk free space information. One sometimes thinks that adding up the file content sizes should add up to the du disk usage size, but it doesn't. And one sometimes thinks that adding up all of the du disk usage sizes should add up to the df disk free sizes, but it doesn't. That is due to a similar reason. File systems reserve a min-free amount of space for superuser level processes to ensure continued operation even if the disk is filling up from non-privileged processes. Also file system efficiency and performance drops dramatically as the file system fills up. Therefore the file system reports space with the min-free reserved space in mind. And once again this is different on different file systems.

But let me return to your first bit of information. The ls long listing of the files. Your version of ls gave an indication that something was different about the second file.
> % command ls -l *pkg
> -rw-r--r--  1 tjluoma staff 5047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma staff 5047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg

See that '@' in that position? The GNU ls coreutils 8.30 documentation I am looking at says:

     Following the file mode bits is a single character that specifies
     whether an alternate access method such as an access control list
     applies to the file. When the character following the file mode
     bits is a space, there is no alternate access method. When it is
     a printing character, then there is such a method.

     GNU ‘ls’ uses a ‘.’ character to indicate a file with a security
     context, but no other alternate access method.

     A file with any other combination of alternate access methods is
     marked with a ‘+’ character.

I did not see anywhere that documented what an '@' means. Therefore it is likely something applied in a downstream
bug#35685: Request
tag 35685 + notabug close 35685 thanks Safdar Iqbal wrote: > Sir,Provide me to installation procedure of wien2k(14.2) on ubuntu > (19.04)sir chmod command cannot excite on my workstation core i7sir > please guide methanks Hello! You are asking about WIEN2k (http://www.wien2k.at/) and also Ubuntu but this is the GNU Coreutils project. We do not know anything about WIEN2k here. As such I can only close the ticket as there isn't anything we can do about it. I am sorry but you will need to contact the WIEN2k people to ask for help about WIEN2k. Good luck! Bob
bug#35654: We've found a vulnerability of gnu chown, please check it and request a cve id for us.
The essence of this report appears to be an attack of the form, can we get the root user to perform an unsafe operation, in this case can we trick root into dereferencing a symbolic link, such as from ./poc to /etc, in order to perform a further action through the symlink.

However this is not a bug in chown's -h implementation. Nor is it particular to chown, since the trick is to dereference the symlink first before performing whatever further action. For example here is a recipe using the same attack but without chown.

ln -s /etc /tmp/junk
# Now we trick root into reaching through the symlink.
# No way root will see this trick coming!
rm -f /tmp/junk/*    # This removes the files from /etc.

The above does not use chown -h but is essentially the same attack. However again this is not a bug in 'rm' nor 'ln'. It is simply trying to trick the superuser into doing unsafe actions. It requires cooperation on the part of root in order to perform the action. But why would the superuser do such silly things?

This is very much like Coyote painting a black image on the side of the mountain hoping the Road Runner will mistake it for a tunnel and run into the mountain becoming dinner for Coyote. But the Road Runner never fell for such tricks and neither should the superuser. That it might happen does not make black paint a threat to the Road Runner.

The use of 'sudo' does not change the nature of the issue. Only the root user can install sudo and configure it to perform the unsafe actions as you have described. And it also requires a local user to look the superuser in the eye and try to con them up close and personal.

Note that this is essentially the same in legacy Unix and in *BSD where symbolic links originated. The community has had decades to poke at them. It is even more interesting to poke at systems that allow environment variables in symbolic links in which case the target is dependent upon the runtime environment variables!
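The recipe can be reproduced harmlessly in a scratch directory to watch the dereference happen. A sketch with stand-in paths (nothing here touches /etc):

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir etc-stand-in            # plays the role of /etc
touch etc-stand-in/passwd
ln -s etc-stand-in junk       # plays the role of /tmp/junk

# The glob junk/* resolves through the symlink, so rm unlinks the
# file inside the target directory. The symlink itself is untouched.
rm -f junk/*

ls etc-stand-in               # now empty
cd / && rm -rf "$tmp"
```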
The root user is the superuser and with great power comes great responsibility. Extraordinary claims require extraordinary proof. In order for symlinks to be considered as a security vulnerability a more convincing case will need to be presented. Bob
bug#35167: About chroot some question on centos6 kernel:
close 35167 thanks Hello 往事随风, 往事随风 wrote: > OS centos6.10 > kernel vmlinuz-2.6.32-754.el6.x86_64 > hello! > grub-install in a new disk /mnt/boot;copy /bin/bash and *.so ; chroot > /mnt/sysroot is ok!exit and ctrl+d Sounds like 'chroot' worked correctly in the above sequence. > use the new disk startup, > "dracut warning can't mount root filesystemmount :/dev/sda3 already mounted > or /sysroot busy > mount: according to mtab, /dev/sdb3 is already mounted on /mnt/sysroot" > don't chroot /mnt/sysroot > startup ——success > > why?! I don't now! I have no idea either. This does not look like a bug report for the 'chroot' command from the GNU Coreutils project however. It looks like a bug report against 'dracut'. As such there isn't anything that we can do about it here. I think that is why no one else of the team responded. It didn't seem like anything that anyone here could do anything about. Also the chroot command line utility is simply a thin wrapper around the chroot(2) kernel system call. It does whatever the kernel does. Therefore I am going to close the bug in our ticket system. However please do respond and add any further discussion. We will see it. If something looks like a bug the ticket will be re-opened. Bob
bug#34713: Files vanishing when moving to different FS
tags 34713 notabug
close 34713
thanks

Hello Christoph,

Christoph Michelbach wrote:
> To reproduce this bug, you need two different file systems. Adapt the
> paths to fit your system.

Thank you for making this bug report. However what you are experiencing is due to the race condition created by the non-atomic nature of copying files from one file system to another, removing files, and renaming files. This is not a bug in mv but is an intrinsic behavior.

> Set the experimental file structure up like this:
>
> mkdir exp
> cd exp
> mkdir a
> cd a
> touch a{1..100000}
> cd ..
> mkdir b
> cd b
> touch b{1..10000}
> mkdir /t/ae # /t has to be on a different file system

Thank you for the very nice test case.

> Then have two terminals open in the exp directory created above.

This is a clue to the nature of the problem being a race condition. It describes simultaneous parallel processes.

> In one, execute this command:
>
> mv a /t/ae

Because /t is on a different file system mv cannot simply rename the files but must perform the action in two steps. It copies the files from source to destination. Then it removes the source files. This is documented in the mv documentation with:

     'mv' can move any type of file from one file system to another.
     Prior to version '4.0' of the fileutils, 'mv' could move only
     regular files between file systems. For example, now 'mv' can
     move an entire directory hierarchy including special device files
     from one partition to another. It first uses some of the same
     code that's used by 'cp -a' to copy the requested directories and
     files, then (assuming the copy succeeded) it removes the
     originals. If the copy fails, then the part that was copied to
     the destination partition is removed. If you were to copy three
     directories from one partition to another and the copy of the
     first directory succeeded, but the second didn't, the first would
     be left on the destination partition and the second and third
     would be left on the original partition.
The mv a /t/ae action is similar to

cp -a a /t/ae && rm -r a

when the action is successful. Similar because there are two steps happening. A first step with the copy and a second step with the removal and there is a time skew between those actions.

> In the other, execute this one while the one in the first terminal
> still is running (hence the large number of files so you have time to
> do this):
>
> mv b/* a

This is the second part of the race condition. It is moving files into the a directory at the same time that files are being copied out of the directory and the directory itself is being removed.

> You will end up with 100 000 files in /t/ae. The 10 000 files beginning
> with the letter b will be gone.

Look at the two actions explicitly:

Process 1:
    cp -a a /t/ae
    rm -rf a

Process 2:
    mv b/* a

Now it is more obvious that as soon as the first process's copy finishes it will remove the source location, which is having files moved into it by the second process. The directory will be deleted by the first process. Does that make it easier to understand what is happening?

The copy and remove actions do not occur when both the source and destination are on the same file system. In that case the file can be renamed atomically without doing a copy. But when the action is across two file systems this is not possible and it is simulated (or perhaps emulated) by the two step copy and remove action.

Whenever tasks are moving files into and out of the same directory at the same time this is always something to be aware of because there may be an overlap of actions in that directory.

In this particular example the problem can be avoided by renaming "a" first and then transferring the files to the other file system. Because "a" was renamed away the second process can create it without collision. Something like this pseudo-code. However safely using temporary file names will require more code than this. This is simply for illustration purposes.
Process 1:
    mv a tmpdirname
    cp -a tmpdirname /t/ae
    rm -rf tmpdirname

Process 2:
    mkdir a
    mv b/* a

I hope this helps. Since this is not a bug in mv I am going to close the ticket in our bug database. But please reply; we would like to hear back from you in this ticket for any further discussion.

Bob
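The rename-first pseudo-code can be sketched slightly more concretely. This is illustrative only: here dest/ae stands in for /t/ae on the other file system, and mktemp -u supplies the temporary name:

```shell
# Process 1: rename "a" away first. The rename is atomic on one file
# system, so "a" disappears before the slow copy even begins.
tmp=$(mktemp -u ./a-moving.XXXXXX)   # -u prints a name; mv creates it
mv a "$tmp"
cp -a "$tmp" dest/ae                 # slow cross-filesystem copy
rm -rf "$tmp"

# Process 2: "a" is already gone, so this creates a fresh directory
# and moves files into it without colliding with the copy above.
mkdir a
mv b/* a
```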
bug#34700: rm refuses to remove files owned by the user, even in force mode
Erik Auerswald wrote:
> Bob Proulx wrote:
> > However regardless of intentions and design if one really wants to
> > smash it then this is easily scripted. No code modifications are
> > needed.
> >
> >    #!/bin/sh
> >    chmod -R u+w $1
> >    rm -rf $1
>
> To everyone considering the above "script": do not use it! It does not even
> guard against spaces in file names. Besides being dangerously buggy, it does
> not even solve the problem of deleting a file inside a read-only directory.

Obviously I typed that in extemporaneously on the spur of the moment. I should have put an "untested" tag upon it. But regardless of that it does not change the fact that the entire purpose of read-only directories is to prevent removing and renaming of files within them.

> I would suggest people with specific directories that inhibit deletion of
> files inside although they should not (e.g. a "cache") to deliberatly change
> the permissions of said directories prior to deleting files inside. Using a
> script like the above, even without the basic mistakes in the script, is
> quite dangerous.

I don't think we are in disagreement here.

Bob
bug#34700: rm refuses to remove files owned by the user, even in force mode
Nicolas Mailhot wrote:
> For their own reasons, the Go maintainers have decided the user Go cache
> will now be read-only.
> https://github.com/golang/go/issues/27161#issuecomment-433098406

Not wise.

> That means cleaning up cache artefacts with rm does not work anymore
> https://github.com/golang/go/issues/30502

Users count upon non-writable directories to prevent files from being deleted. I am confident that changing rm to delete contents of non-writable directories would produce bug reports. And worse it would have resulted in data loss in those cases. Weigh data loss against intentionally created inconvenience.

They have intentionally done this to prevent actions such as rm -rf on the path. That is the entire purpose of making directories read-only, to prevent the contents from being removed or renamed.

However regardless of intentions and design if one really wants to smash it then this is easily scripted. No code modifications are needed.

#!/bin/sh
chmod -R u+w $1
rm -rf $1

Bob
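The protection under discussion can be seen in a scratch directory. A sketch; note that the superuser bypasses directory permission checks, so the refusal only shows up for a normal user:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/d"
touch "$tmp/d/f"
chmod a-w "$tmp/d"      # read-only directory: entries cannot be unlinked

# For a non-root user this fails even with -f, because -f only
# silences missing-file errors, not permission errors.
rm -f "$tmp/d/f" 2>/dev/null || echo "removal refused"

chmod u+w "$tmp/d"      # restore the write bit, as the script does
rm -f "$tmp/d/f"        # now the entry can be unlinked

rm -rf "$tmp"
```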
bug#12400: rmdir runs "amok", users "curse" GNU...(as rmdir has no option to stay on 1 file system)...
L A Walsh wrote:
> Bob Proulx wrote:
> > Please provide an example. Something small. Something concrete.
> > Please include the version of rmdir.
>
> The original bug stems from having to use wild cards to delete
> all files in a directory instead of '.', as in being told to use:
>
>     rm -fr --one-filesystem foo/*

When reporting bugs in command line utilities it is good to avoid using file glob wildcards in the test case. Because that involves the shell. Because that makes the test case dependent upon the contents of the directory which will then be expanded by the shell.

> instead of
>
>     rm -fr --one-filesystem foo/. or
>     cd foo && rm -fr --one-filesystem .

    rm: refusing to remove '.' or '..' directory: skipping '.'

I agree with your complaint about "rm -rf ." not working. That is an annoying nanny-state restriction. It should fail removing '.' after having removed all it can remove. And it only took 16 messages in order to get to this root cause! It would have been so much easier if you had started there.

But this report is about rmdir so let's get back to rmdir. Any reports about rm should be in a separate ticket. Mixing multiple bugs in any one bug ticket is confusing and bad.

Bob
bug#34524: wc: word count incorrect when words separated only by no-break space
vampyre...@gmail.com wrote:
> The man page for wc states: "A word is a... sequence of characters delimited
> by white space."
>
> But its concept of white space only seems to include ASCII white
> space. U+00A0 NO-BREAK SPACE, for instance, is not recognized.

Indeed this is because wc and other coreutils programs, and other programs, use the libc locale definition.

$ printf '\xC2\xA0\n' | env LC_ALL=en_US.UTF-8 od -tx1 -c
0000000  c2  a0  0a
        302 240  \n
0000003

$ printf '\xC2\xA0\n' | env LC_ALL=en_US.UTF-8 grep '[[:space:]]' | wc -l
0
$ printf '\xC2\xA0 \n' | env LC_ALL=en_US.UTF-8 grep '[[:space:]]' | wc -l
1

This shows that grep does not recognize \xC2\xA0 as a character in the class of space characters either.

$ printf '\xC2\xA0\n' | env LC_ALL=en_US.UTF-8 tr '[[:space:]]' x | od -tx1 -c
0000000  c2  a0  78
        302 240   x
0000003

And while a space character matches and is translated the other is not. Since character classes are defined as part of the locale table there isn't really anything we can do about it on the coreutils wc side of things. It would need to be redefined upstream there.

Bob
bug#34447: `pwd` doesn't show real working directory if directory is renamed by another session
tag 34447 + notabug
close 34447
thanks

Hello Chris,

Chris Wright wrote:
> I found that if a session's working directory is renamed or moved,
> `pwd` doesn't show the real working directory.

Thank you for your bug report. However I think the shell's built-in pwd is being confused with the external pwd command. The shell internal command has the behavior you describe, intentionally. The external one in GNU Coreutils does not.

> ~/test $ pwd
> /Users//test

The above is using the internal shell builtin.

$ type pwd
pwd is a shell builtin

$ type -a pwd
pwd is a shell builtin
pwd is /bin/pwd

The bash shell built-in has this to say about the internal pwd.

$ help pwd
pwd: pwd [-LP]
    Print the name of the current working directory.

    Options:
      -L	print the value of $PWD if it names the current working directory
      -P	print the physical directory, without any symbolic links

    By default, `pwd' behaves as if `-L' were specified.

Therefore by default the shell's builtin pwd simply prints out the PWD environment variable, which has not changed. This is to preserve the "logical" (not physical) directory tree based upon how the process got there, intentionally tracking how they got there not where they are. They got there by the path stored in PWD. I hate that behavior. But as with most things I was not consulted. :-}

In order to do what you want there are at least three options. One is to use the external coreutils version. The idiom for forcing external commands is using 'env' for it.

env pwd

Another is adding the -P option. This ignores PWD and returns the physical path.

pwd -P

And the third (what I do) is to set the shell to always use physical paths. Which is how it behaved before they added logical path tracking in the PWD variable. I have this in my ~/.bashrc file.

set -o physical

Therefore I have closed this bug report for the purpose of triage of the report in the coreutils tracker since this is really about bash and not coreutils. However please do reply as discussion may continue.
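The difference between the options can be seen by changing directory through a symbolic link. A sketch (directory names are illustrative):

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir real
ln -s real alias
cd alias

pwd        # builtin, logical: the path ends in .../alias (from $PWD)
pwd -P     # physical: the path ends in .../real
env pwd    # external coreutils pwd, physical by default per its docs

cd / && rm -rf "$tmp"
```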
We would love to continue the discussion. Note that the coreutils 'pwd' defaults to -P, --physical unless -L, --logical is given explicitly. And that the documentation for the coreutils pwd is subtly different from the bash version:

'-L'
'--logical'
     If the contents of the environment variable 'PWD' provide an
     absolute name of the current directory with no '.' or '..'
     components, but possibly with symbolic links, then output those
     contents. Otherwise, fall back to default '-P' handling.

'-P'
'--physical'
     Print a fully resolved name for the current directory. That is,
     all components of the printed name will be actual directory
     names—none will be symbolic links.

If '-L' and '-P' are both given, the last one takes precedence. If neither option is given, then this implementation uses '-P' as the default unless the 'POSIXLY_CORRECT' environment variable is set.

Due to shell aliases and built-in 'pwd' functions, using an unadorned 'pwd' interactively or in a script may get you different functionality than that described here. Invoke it via 'env' (i.e., 'env pwd ...') to avoid interference from the shell.

Hope this helps!

Bob
bug#34199: closed (Re: bug#34199: Small bug in cp (for win64))
Chris Kalish wrote:
> Hmmm ... not sure of the distribution, but the help file pointed me at this
> address:
> C:\> cp --version
> cp (GNU coreutils) 5.3.0

I always hate it when I am on your side of things and upstream says this to me. But here I am on the other side and going to say almost exactly the thing I hate to hear.

Coreutils 5.3.0 was released on 2005-01-08 and today is 2019-02-10 making that version of the program you are running 14 years old! That is a very long time ago. Since you are running on MS-Windows I will say that was probably five whole versions of Microsoft ago! It would not be practically possible for most of us to recreate that version on the MS-Windows-XP of that era. This makes it difficult to impossible to do anything about even if we had a live build system from 2005 still running. Plus here we are concerned about software on free(dom) licensed platforms and Microsoft is a closed source proprietary platform. That was always supported by other teams doing ports to non-free operating systems. What's a developer to do? :-(

Perhaps I should ignore all of the years and simply say, yes, that is a bug. (I don't really know. But I will say it. Searching the changelogs will show that 5.3.0 did introduce a number of bugs.) And we have fixed it! The new version is v8.30 and that bug is fixed. Eric reported that it was not a problem for Cygwin on MS-Windows. Please upgrade to it and confirm with us that it is working for you there. Maybe that would be less unpleasant to hear? :-)

> C:\> cp --help
> Report bugs to .

We are happy to have bugs reported here. But often in ports the behavior is dependent upon the port environment. That is outside of our control. Please do upgrade to a newer version. Cygwin tends to be the most capable version. Although there are other ports too.

We would appreciate hearing about how this worked out for you regardless. And maybe along the way you might consider upgrading to a free(dom) software licensed operating system?
Then you would have upgrades available by default. :-) Bob
bug#12400: rmdir runs "amok", users "curse" GNU...(as rmdir has no option to stay on 1 file system)...
L A Walsh wrote: > >> If you want a recursive option why not use 'rm -rf'? > > rmdir already provides a recursive delete that can cross > file system boundaries Please provide an example. Something small. Something concrete. Please include the version of rmdir. Something like:

mkdir testdir testdir/dir1 testdir/dir2 testdir/dir2/dir3
rmdir --recursive testdir/dir2
rmdir --version

Include all input and output verbatim. For clarity do not use shell file glob wildcards because that is a dependence upon a specific command line shell and the shell's configuration. > dir1->dir2->dir3 > > dir1 is on 1 file system, dir 2 is on another and dir 3 can be on another. GNU Coreutils rmdir does not provide a recursive delete option. Therefore one can only assume that the rmdir you are referring to is a different rmdir from a different project. I specifically asked if you were using the rmdir --parents option but my message was the only mention of --parents in this entire ticket and in subsequent responses your messages also did not mention it. Therefore I can only assume that there is no --parents option being used here. > >> There is always 'find' with the -delete option. But regardless there > >> has been the find -exec option. > > true -- so why should 'rm' protect against crossing boundaries > deleting '/' or everything under '.' when there is find? > > find is the obvious solution you are saying, so all that checking in > rm should be removed, as it is inconsistent with rmdir that can > cross boundaries. My mention of 'find' was really a simple statement about alternatives when programmatic needs are desired. Because 'find' is the swiss army chainsaw for directory traversal. I didn't mean to derail the discussion there. But if it is to be derailed then 'find' is the best choice when needing a specific set of programmatic requirements for directory traversal. The other utilities that have simpler capabilities are the distractions. 
But in theory this bug ticket was about 'rmdir'. > As for closing something not addressed for 6 years while the problem > has grown worse -- (rmdir didnt' used to have a recursive delete), doesn't > seem a great way to judge whether or not a bug is valid or not . GNU Coreutils rmdir does not provide a recursive delete option. This bug report so far has contained conflicting complaints to the point that it has not been useful. It still is not clear if you are complaining about 'rmdir' or 'rm' even after requests for clarification. Or possibly your shell's ** file glob expansion. Probably some combination of them all that is unique to your environment. To be useful a bug report must be descriptive so that the reader can understand it. If the reader can't understand it then how can it be useful? The report must be as simple as possible. Because extraneous complexity is distracting. Stay focused on the bug being reported and not about other unrelated things. Bugs about behavior should be reproducible with a test case. Because nothing is as useful as a concrete example. I have reviewed the reports in this ticket and there seems to be no viable bug report to operate upon here. At some point without a test case it only makes sense to say enough is enough and move on since this does not appear to be a bug in any program of the coreutils project. However even though a bug is closed discussion may continue as we are doing here. The bug state is simply a way to organize reports for the purposes of triage. Many thanks to Assaf for putting in the work to triage these old bug tickets. If you wish to report a bug in rmdir's recursive delete option then we must insist on a test case. Bob
bug#13738: Add --all option to 'users' command
anatoly techtonik wrote: > Bob Proulx wrote: > > > Human users have UIDs starting at 1000, > > > > That assumption is incorrect. Many systems start users off at 100. > > Many others start users at 500. There isn't any universal standard. > > It is a local system configuration option. > > How to figure out at which number users UIDs start at a given system? That is a system dependent problem. On my Debian Stretch 9 system the /etc/login.defs file contains:

# Min/max values for automatic uid selection in useradd
#
UID_MIN                  1000
UID_MAX                 60000
# System accounts
#SYS_UID_MIN              100
#SYS_UID_MAX              999

Other systems will be different. It is a policy implemented by the OS. > > > so you can use that fact to filter out the non-humans: > > > > > > cut -d: -f1,3 /etc/passwd | egrep ':[0-9]{4}$' | cut -d: -f1 > > > > This assumes that /etc/passwd is the user database. While true on a > > typical standalone system it is incorrect when NIS/yp or LDAP or other > > account system is in use. That is why I used 'getent passwd' even > > though it is not available on all systems. When available it obeys > > the local system configuration and returns the correct information. > > If NIS/yp or LDAP are installed, they provide getent, right? 'getent' is actually AFAIK a glibc utility. AFAIK any OS using glibc will provide it. However traditional systems not based on glibc may or may not. I only have limited access to other systems at this time and have no easy way to check *BSD or HP-UX or others for example. > So if there is no getent, then /etc/passwd is de-facto database and > can be reliably used as a fallback. Is that correct? The /etc/nsswitch.conf file determines this. Certainly the lowest level default is /etc/passwd. But the nsswitch.conf file is where modifications are configured for this. > Is there other way to distinguish user accounts other than matching > "things that only seem to be true", like UID numbers? There is no actual difference between user accounts and system accounts. 
The only real difference is that user accounts have a human user associated with them but system accounts do not. Other than that they are the same. Certainly to the OS they are simply a uid to hold a running process. > > Actually even that isn't sufficient. The value for nobody 65534 is a > > traditional value. But uids may be larger on most modern systems. It > > isn't unusual to have the nobody id in the middle of the range with > > real users having uid numbers both less than and greater than that > > value. Therefore in order to be completely correct additional filter > > methods would be needed such as sets of ranges or block lists or > > something similar. > > Yes. I believe LXD has UID mapping for containers about 10, > and those are not human users in general case. That is a good example. And one of which I was not aware. And I am sure there are other cases too. > I am getting the feeling that the approach of solving problems be using > the tool for specific case is misleading in the case that it battles with > effects and not the causes. The cause of the mess if UID mapping in > Linux kernel, which is not about users at all. There is a concept of user > space, but correct me if I wrong - daemons that run with different UIDs > are run in their own userspace as well. The user concept is not defined > by kernel, but rather by some concept of having home and being able to > login into it either from console or remotely. All processes have a uid. Some uids are associated with a human. Some are not. The kernel doesn't know the difference. The kernel is applying permissions based upon nothing more than the integer uid of the process. For example a process can send a signal to another process with the same uid. Or the superuser process can send a signal to any other process regardless of uid. But a non-superuser process cannot send a signal to another random process of a different uid. 
None of which has any relation to whether a human can log into the account or not. > If this behavior of humans vs daemons was explicitly documented > somewhere, it could lead to better understanding if solving this problem > in general is real. I don't think this is possible because there really is no difference between system uids and non-system uids. Whether something is a system uid or a non-system uid is a color we paint on it by human judgement and two different people might judge the same thing differently and both would be right. It is also a difference which makes no difference. > > It would help if you could say a few words about the case in > > which this would be helpful? > > Sorry that I've
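Earlier in this thread the suggestion was 'getent passwd' combined with a UID cutoff. Here is a minimal sketch of that filtering. The location of UID_MIN in /etc/login.defs, the fallback value of 1000, and the 65534 "nobody" uid are assumptions that vary by system, as the thread itself points out:

```shell
# Read the local UID_MIN policy if available; fall back to 1000 (an assumption).
uid_min=$(awk '$1 == "UID_MIN" { print $2 }' /etc/login.defs 2>/dev/null)
uid_min=${uid_min:-1000}
# List account names at or above the cutoff, excluding the traditional
# nobody uid 65534. getent consults nsswitch.conf, not just /etc/passwd.
getent passwd | awk -F: -v min="$uid_min" '$3 >= min && $3 != 65534 { print $1 }'
```

This is only a heuristic; as noted above there is no real difference between system and user accounts, so no filter of this kind can be completely correct.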
bug#33943: (omitted) ls directly uses filename as option parameter
tags 33943 notabug close 33943 merge 33942 thanks This message generated a new bug ticket. I merged it with the previous bug ticket. westlake wrote: > I have omitted that I recently downgraded my coreutils to doublecheck > behaviour for ls, and noticed immediately the same behaviour was occuring, It was still occurring because this is not a new behavior of 'ls'. This is the way Unix has operated since the beginning. It seems that you missed seeing Assaf's explanation of it. Let me repeat some of it. > $touch 0 ./--a ./-a ./-_a ./-- > $ ls -lad -* [^-]* Here the first example nicely uses ./ in front of the problematic characters. But the second one did not have ./ there. If it did then there would be no problem. But instead the "-*" above is trouble. Don't do it! Always put ./ in front of file globs (wildcards) like that. It should be: $ ls -lad ./-* ./[^-]* > .. however a period of time the behaviour is no longer exhibiting the same, It was not a period of time. It was the contents of the directory upon which the commands were used. It is data dependent. It depends upon the file names that exist. If there are no file names that start with a '-' then none will be mistaken for an option. As you knew when you created the test case using touch above. > I suppose I did not wait long enough for the new "ls" or whatever it is to > come into effect... It is not a time issue. It is only a matter of file glob wildcard expansion as done by the command line shell. Using 'echo' to see a preview of the command will show this. > but there's still oddities with ls, I guess it is the unprediction of > "getopt".. and so I guess I should address any further concerns with the > developers of getopt. This is also not a getopt issue. 
The best practice is to prefix all wildcards with ./ such as ./*.txt. Then the resulting text string cannot be confused with an option: even if a file name starts with a '-', the argument passed to ls will be "./-something", which starts with "./" instead of "-". Bob
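The advice above can be previewed without running ls at all, by letting printf show what the shell actually expands each glob into. The directory and file names here are throwaway examples:

```shell
# Create a directory containing a file whose name looks like an option.
mkdir -p globdemo
touch globdemo/-l globdemo/file.txt
# Without the ./ prefix one expanded word is just "-l",
# which ls would parse as an option:
( cd globdemo && printf '%s\n' * )
# With the ./ prefix every word begins with "./" and is unambiguous:
( cd globdemo && printf '%s\n' ./* )
```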
bug#33577: ls lacks null terminator option
積丹尼 Dan Jacobson wrote: > For files with blanks in their names, > one shouldn't need this workaround: > $ ls -t | tr \\n \\0 | xargs -0 more > /tmp/z.txt > Please add a --print0 option. like find(1) has. I think that adding a --print0 option to 'ls' is not wise because it would suggest to people seeing it that 'ls' should be used in scripts. But 'ls' is a command designed for human interaction not for use in scripts. For scripted use 'find' is the right utility. Such a patch has previously been submitted. http://lists.gnu.org/archive/html/coreutils/2014-02/msg5.html Bob
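For the scripted case, a null-safe equivalent of `ls -t` can be assembled from find. This sketch assumes GNU find, sort, and cut (for -printf, -z, and --zero-terminated support) and uses throwaway file names; names containing a tab would need a different separator:

```shell
# Make two files with different modification times.
mkdir -p lsdemo
touch -d '2001-01-01' lsdemo/old.txt
touch -d '2002-01-01' lsdemo/new.txt
# Emit "mtime<TAB>name" NUL-terminated, sort newest first numerically,
# strip the timestamp, and hand the NUL-delimited names to xargs.
find lsdemo -maxdepth 1 -type f -printf '%T@\t%p\0' |
  sort -z -k1,1 -rn |
  cut -z -f2- |
  xargs -r0 printf '%s\n'    # newest first: lsdemo/new.txt then lsdemo/old.txt
```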
Re: Feature request: "mv --relative" or "mv -r"
Daniel Böhmer wrote: > I want to propose a new parameter to the "mv" command which > helps when working with nested directory structures from the > structure's root dir as $PWD. New options require a strong rationale for addition. > Similar to the "ln" command's "--relative" parameter "mv" > could accept this and either chdir to the moved files > location beforehand or (maybe better) calculate the > target location from the given path as relative path > from the file's original location. By my reading of the description it doesn't seem to be similar to ln's --relative option. The proposal above seems to be proposing a way to operate within a subdirectory, which is different. That seems to be simply a change directory first operation. Like make's -C option. > Examples with the "-v" switch for illustration: > > $ mv -rv foo/bar.txt baz.txt > 'foo/bar.txt' -> 'foo/baz.txt' I know this is the trivial command: mv -v foo/bar.txt foo/baz.txt But I know you were wanting to avoid typing in the "foo" part more than once. Therefore this seems like the right short expression for it. (cd foo && mv -v bar.txt baz.txt) > $ mv -rv foo/*.txt subfolder/ > 'foo/bar.txt' -> 'foo/subfolder/bar.txt' > 'foo/baz.txt' -> 'foo/subfolder/baz.txt' Same thing here: (cd foo && mv -v ./*.txt subfolder/) (cd foo && mv -v -t subfolder ./*.txt) [[Note I always use ./* instead of * to avoid any possibility of one of the files starting with a '-' and being interpreted as an option.]] > I often miss this switch for these use cases: > - simply rename a file in a [sub*]directory > - move a bunch of files around in a lower part of the directory structure Additionally one can also easily create a custom script that changes directory first and then performs the operation. For example:

#!/bin/sh
dir=$1
shift
cd "$dir" && mv "$@"

And then call it like:

mycdmv very/long/path a.txt b.txt

The joy of the Unix philosophy is small utilities that can be joined together to perform larger tasks. 
> I find the alternatives > - mv very/long/path/{a.txt,b.txt} > - rename 's/a\.txt/b.txt/' very/long/path/a.txt > rather cumbersome to type. I personally find this to be very easy to type and the very long path gets completion making that easy. (cd very/long/path && mv a.txt b.txt) Using (cd directory && ...) is an idiom that just rolls off the fingers very easily. At least for me. Been using it for a very long time. If one typos the directory then the cd will fail and produce an error message appropriately to stderr. If the cd fails then the && fails and the right hand side is not invoked. The entire thing is in a (...) subshell and therefore the directory change evaporates after that process exits. > For the source location > I want to use bash completion and for the target > location I'd like to type only the new information, > i.e. the subfolder or the new filename. The cd can make use of bash completion. The rest seems not to be automatically completable anyway. > Do you know of any easier existing ways to achieve > the desired behavior? I personally would use the above technique of changing directory in a subshell before running the command. It is also portable across systems. But there is also the -t --target-directory option too. It can be very useful depending upon the situation. Bob
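The (cd dir && ...) idiom discussed above, exercised end to end with throwaway paths:

```shell
# Set up a nested path with one file in it.
mkdir -p very/long/path
touch very/long/path/a.txt
# Rename without repeating the path: the cd is confined to the subshell,
# and if the cd fails the && prevents the mv from running at all.
( cd very/long/path && mv a.txt b.txt )
ls very/long/path    # now contains b.txt
```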
Re: FAQ confusing terminology regarding GNU and Linux Relationship
fdvwc4+ekdk64wrie5d8rnqd9...@guerrillamail.com wrote: > Under the section in the FAQ about uname, it refers to ``the Linux > kernel." Is not the GNU position that Linux should be referred to as > ``Linux, the kernel' or something similar? Thank you for asking! It gives us a chance to talk. :-) An exact phrasing is not required. It is only important that the different parts be correctly identified. In English the phrasing of, "Due to the peculiarities of the Linux kernel ..." reads naturally and we have identified the kernel that we are talking about. If one were to say, "Due to the peculiarities of Linux, the kernel, ..." it would not be a natural phrasing order and would imply that Linux has multiple parts, of which one part is the kernel part, but there are also other parts. That is not what is being intended to be said. The implication of that phrasing might be, "Linux, the editor, ..." or some such when there is no Linux editor of which I am aware. Therefore we use the natural order most of the time. This is just the same as when we say things like "the Emacs editor" or "the Vim editor" or other things of which there are many editors and we wish to identify one of them. The important point there is to properly indicate what is providing the uname(3) system call information. In that case it is talking about the Linux kernel uniquely. Which is distinct from a BSD kernel, or an HP-UX kernel, or an AIX kernel, or a Solaris kernel, or any of the other kernels that also exist and often run GNU Project software such as GNU coreutils. There is also a Debian implementation running a FreeBSD kernel with a GNU userland running GNU coreutils! Each kernel provides different information that they have chosen. The GNU uname(1) command is simply displaying it. Please refer to the GNU/Linux FAQ by Richard Stallman for more details concerning use, and the reasoning behind it, of these terms. https://www.gnu.org/gnu/gnu-linux-faq.html Bob
Re: Broken link on website
Mihir Mehta wrote: > I thought you might want to fix a broken link (to > http://stagecraft.theprices.net/nomime.html) that appears on the main > Coreutils website (https://www.gnu.org/software/coreutils/coreutils.html). Thank you for reporting that problem. It's appreciated. We didn't know it before this. As a simple and quick fix I have updated the link to point to the Wayback Machine's copy at archive.org. It will take a few minutes for the update to propagate but then it will be visible on that page again. Thanks again for reporting that problem! :-) Bob
Re: chroot add option to mount /dev /proc /sys for you?
Marc Weber wrote: > So the question would turn into would it make sense to create a new tool > which (optionally cleans up) like this: > > with-mounts sys,proc,dev -- chroot ... > > There might be many use cases. > > I think there is interest. But I'm unsure where would be the place to > put such script. I hit this problem multiple times. Are you aware of 'schroot'? It already does everything I think you are wanting to do. https://gitlab.com/codelibre/schroot https://packages.debian.org/sid/schroot It's very fancy. It mostly replaces the older 'dchroot' which is yet another utility in this topic space. https://packages.debian.org/sid/dchroot Bob
Re: FAQ rm backup section outdated
Garreau, Alexandre wrote: > Yet that FAQ section seems to be somewhat outdated: It became outdated when ext3 became popular. Because journaling file systems zero out the data. Originally I had said rather strongly that when deleted the files were gone. But then there were many people who claimed that was incorrect and rebuked me for writing it. Therefore it was softened substantially. And I didn't have the heart to just say that once the file was removed that it was gone. Who am I to take away all hope when there were some cases when it was possible to recover the file? Especially if someone was perhaps looking on a FAT file system on an SD card or something. > at some point it suggests recover is packaged and right away > installable under Debian while it’s not packaged anymore since > Wheezy [1] (currently oldoldstable), moreover it seems its webpage > [2], that is the second reference of the section, is a dead link. Alas it seems there is only the archive.org version now. https://web.archive.org/web/20080725005349/http://recover.sourceforge.net/linux/recover/ > The third [3], too, except I couldn't find it neither in my system at > the aforementioned path [4], without forgetting to note that it suggests > reading it with less, while with it being gzipped it should be read with > zless. That depends upon your system. On my system less defaults to knowing how to read gzip'd files without needing zless for it. I didn't realize that other systems did not provide that by default. In any case if you get that far then you should be able to figure it out from there. If not then being able to recover a deleted file is surely impossible. > I managed with apt-file to find its translations are still > packaged in the packages doc-linux-fr-text and doc-linux-pl, but > couldn't manage to find their original version, which is sad since from > what I read in the french version, it seems to be really well written > and maybe even quite useful for other filesystems than ext2. 
Once again archive.org is an invaluable resource. https://web.archive.org/web/20051220103138/http://tldp.org:80/HOWTO/Ext2fs-Undeletion.html However that is specific to ext2. If you are not using an ext2 file system then it will be of little help for you. > The fivth [5] is dead too, while since then it seems to have been > packaged for Debian, even Wheezy nodaways, through backports, and at A copy of the reference page is: https://web.archive.org/web/20131105155135/http://developer.berlios.de/projects/ext4magic But of course the code release is not available from there. > least it notices it through a english sentence I personnally can't parse > correctly near the end: “This is also packaged for some software > distributions such as Debian probably others too.” (“This is also > packaged for some software distributions such as Debian *and* probably > others too.” appears less confusing to me). Noted. A good improvement. > The last [6] is also dead, probably due to the move of Gentoo wiki to a > subdomain of its [7], yet seemingly without the page in question [8]. https://web.archive.org/web/2008100713/http://www.gentoo-wiki.com/HOWTO_Move_Files_to_a_Trash I will work through this entry and give it a refresh. Given the current state of the world what would you say there? > Everything of this appears quite frustrating to me, as I always only > knew testdisk and photorec (which I often heard cited yet I didn't see > mentioned there), and seeing this link I hoped to easily find some > simpler (cli instead of tui for instance) or easier (higher-level) > utility. First I can only say sorry for your loss. Because I know if you had not lost something you wanted back that you would not have gone to the trouble of this research and then writing here. However depending upon the system if a file has been removed then it is likely lost without a backup of it. What file system type held the files now lost? 
To be recoverable it would need to be a file system that does not overwrite the data when the file is removed. Bob
bug#22195: deviation from POSIX in tee
Pádraig Brady wrote: > Paul Eggert wrote: > > trap '' PIPE > > Generally you don't want to ignore SIGPIPE. > http://pixelbeat/programming/sigpipe_handling.html > as then you have to deal with EPIPE from write(): I wanted to add emphasis to this. Ignoring SIGPIPE causes a cascade of associated problems. Best not to do it. Bob P.S. Typo alert: http://pixelbeat/programming/sigpipe_handling.html Should be: http://www.pixelbeat.org/programming/sigpipe_handling.html
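The cascade being warned about here is easy to demonstrate: once SIGPIPE is ignored, a writer to a closed pipe no longer dies silently but instead sees EPIPE on every write and must report the error itself. A small sketch, assuming GNU seq and the C locale for the message text:

```shell
# head exits after one line, closing the pipe. Because SIGPIPE is ignored
# (and the ignored disposition is inherited by seq), seq is not terminated
# by the signal; its writes fail with EPIPE and it reports a write error.
( trap '' PIPE; LC_ALL=C seq 2000000 | head -n 1 )
# Without the trap, seq would simply be killed by SIGPIPE with no message.
```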
bug#22128: dirname enhancement
Nellis, Kenneth wrote: > Still, my -f suggestion would be easier to type, > but I welcome your alternatives. Here is the problem. You would like dirname to read a list from a file. Someone else will want it to read a list of files that themselves list files. Another will want to skip one header line. Another will want to skip multiple header lines. Another will want the exact same feature in basename too. Another will want file name modification so that it can be used to rename directories. And on and on and on. Trying to put every possible combination of feature into every utility leads to unmanageable code bloat. What do all of those have in common? They are all specific features that are easily available by using the features of the operating system. That is the entire point of a Unix-like operating system. It already has all of the tools needed. You tell it what you want it to do using those features. That is the way the operating system is designed. Utilities such as dirname are simply small pieces in the complete solution. In this instance the first thing I thought of when I read your dirname -f request was a loop. while read dir; do dirname $dir; done < list Pádraig suggested xargs which was even shorter. xargs dirname < filename Both of those directly do exactly what you had asked to do. The technique works not only with dirname but with every other command on the system too. A technique that works with everything is much better than something that only works in one small place. Want to get the basename instead? while read dir; do basename $dir; done < list Want to modify the result to add a suffix? while read dir; do echo $dir.myaddedsuffix; done < list Want to modify the name in some custom way? while read dir; do echo $dir | sed 's/foo/bar/'; done < list Want a sorted unique list modified in some custom way? while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u The possibilities are endless and as they say limited only by your imagination. 
Anything you can think of doing you can tell the system to do it for you. Truly a marvelous thing to be so empowered. Note that in order to be completely general and work with arbitrary names that have embedded newlines then proper quoting is required and the wisdom of today says always use null terminated strings. But if you are using a file of names then I assume you are operating on a restricted and sane set of characters so this won't matter to you. I do that all of the time. Bob
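Following the note above about null terminated strings, the read loop can be made safe for completely arbitrary names, including names with embedded newlines. This sketch assumes bash (for read -d '') and uses throwaway directory names:

```shell
# Build a small tree with two matching files.
mkdir -p nd/one nd/two
touch nd/one/xyz.dat nd/two/xyz.dat
# find -print0 emits NUL-delimited names; read -d '' consumes them,
# so no file name character can break the loop.
find nd -name 'xyz.dat' -print0 |
  while IFS= read -r -d '' f; do
    dirname -- "$f"
  done | sort
```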
bug#22128: dirname enhancement
Pádraig Brady wrote: > Nellis, Kenneth wrote: > > E.g., to get a list of directories that contain a specific file: > > > > find -name "xyz.dat" | dirname -f - > > find -name "xyz.dat" -print0 | xargs -r0 dirname Also if using GNU find can use GNU find's -printf operand and %h to print the directory of the matching item. Not portable to non-gnu systems. find . -name xyz.dat -printf "%h\n" Can generate null terminated string output for further xargs -0 use. find . -name xyz.dat -printf "%h\0" | xargs -0 ...otherstuff... Bob
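A quick check of the -printf '%h' form above, using throwaway directory names:

```shell
# Two copies of xyz.dat in different directories.
mkdir -p proj/a proj/b
touch proj/a/xyz.dat proj/b/xyz.dat
# %h prints the directory part of each match (GNU find only).
find proj -name xyz.dat -printf '%h\n' | sort
```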
bug#22087: Problem with stdbuf configure test for 8.24 on Solaris with Studio C compiler.
Paul Eggert wrote: > How about the attached (untested) patch instead? It should fix the > underlying problem, and thus avoid the need for fiddling with compiler > flags. > diff --git a/configure.ac b/configure.ac > index 66c8cbe..3f546e9 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -475,7 +475,8 @@ AC_LINK_IFELSE( > { >stdbuf = 1; > }]],[[ > -return !(stdbuf == 1);]]) > +if (stdbuf != 1) > + return 1;]]) >], >[stdbuf_supported=yes]) > AC_MSG_RESULT([$stdbuf_supported]) Fallthrough return 0? Or is a return 0 already defaulted? It stood out to me that the previous return was unconditional and without an else or a fallthrough this is a change from the previous control flow. -return !(stdbuf == 1);]]) +if (stdbuf != 1) + return 1; +return 0;]]) ?? Bob
bug#22087: Problem with stdbuf configure test for 8.24 on Solaris with Studio C compiler.
Eric Blake wrote: > Bob Proulx wrote: > > Or is a return 0 already defaulted? It stood out to me that the > > previous return was unconditional and without an else or a > > fallthrough this is a change from the previous control flow. > > > > -return !(stdbuf == 1);]]) > > +if (stdbuf != 1) > > + return 1; > > +return 0;]]) > > Explicitly listing 'return 0;' here would result in a doubled-up return > 0 in the overall conftest.c file. Gotcha! That there is already a default return 0 answers my question. Thanks, Bob
bug#22001: Is it possible to tab separate concatenated files?
Macdonald, Kim - BCCDC wrote: > Sorry for the confusion - I wanted to add a tab (or even a new line) > after each file that was concatenated. Actually a new line may be > better. > > For Example: > Concatenate the files like so: > >gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole > >genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT > >gi|452742846|ref|NZ_CAFD01002.1| Salmonella enterica subsp., whole > >genome shotgun > >sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC > >gi|452742846|ref|NZ_CAFD01003.1| Salmonella enterica subsp., whole > >genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG > > Right now - Just using cat, they look , like: > >gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole > >genome shotgun > >sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT>gi|452742846|ref|NZ_CAFD01002.1| > > Salmonella enterica subsp., whole genome shotgun > >sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC>gi|452742846|ref|NZ_CAFD01003.1| > > Salmonella enterica subsp., whole genome shotgun > >sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG That example shows a completely different problem. It shows that your input plain text files have no terminating newline, making them officially not plain text files but binary files. Because every plain text line in a file must be terminated with a newline. If they are not then it isn't a text line. Must be binary. Why isn't there a newline at the end of the file? Fix that and all of your problems and many others go away. Getting ahead of things 1... If you just can't fix the lack of a newline at the end of those files then you must handle it explicitly.

for f in *.txt; do
  cat "$f"
  echo
done

Getting ahead of things 2... Sometimes people just want a separator between files. Actually 'tail' will already do this rather well.

tail -n+0 *.txt
==> 1.txt <==
foo
==> 2.txt <==
bar

Bob
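Both workarounds above can be tried on throwaway files created without a trailing newline:

```shell
# Two files with no terminating newline (printf without \n).
mkdir -p seps
printf 'foo' > seps/1.txt
printf 'bar' > seps/2.txt
# Workaround 1: append a newline explicitly after each file.
for f in seps/*.txt; do cat "$f"; echo; done
# Workaround 2: let tail print a "==> file <==" header between files.
tail -n +0 -- seps/*.txt
```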
Re: feature request: tail -H
Pádraig Brady wrote: > Upon more careful consideration, I'm 50:50 > about adding per line processing to tail. >... > Perhaps we could just add the above snippet to the docs? > The big advantage is that it works everywhere already. Perhaps simply add a reference to 'multitail' being available? It is still my preferred tool for that type of thing. Bob
Re: Enhancement request for tee - please add the option to not quit on SIGPIPE when someother files are still opened
Bernhard Voelker wrote: > I'm not convinced that a new --no-stdout option is warranted: > why not simply redirect stdout to the last fifo? > > cat /dev/zero | head -c500M \ > | (/dev/shm/AAA/coreutils-8.24/src/tee -p \ > $d/fifo1 $d/fifo2 $d/fifo3 > $d/fifo4 ) 2>&1 \ > | > tee $d/run.log & Of course! It was so obvious that we missed seeing it! Simply do a normal redirect of stdout to the process. Thanks Bernhard for pointing this out. This is also true of the >(process substitutions) too. echo foo | tee >(sleep 2 && cat) > >(sleep 5 && cat) This really argues against any need for --no-stdout. Because if one wants --no-stdout it means one has forgotten about a normal redirection. Bob
Re: Enhancement request for tee - please add the option to not quit on SIGPIPE when someother files are still opened
Pádraig Brady wrote: > Jirka Hladky wrote: > > => it's almost there expect that it runs forever because of >/dev/null I am going to suggest this without trying it, always dangerous, but I have no time for a deep investment. Sorry. What about closing stdout? Then it would be closed right from the start.

1>&-

Except that the no-exit behavior is only on pipes. So I guess it would need to have a pipe created just to have it closed.

| cat 1>&-

> Right, the particular issue here is that the >(process substitutions) > are writing to stdout, and this is intermingled through the pipe > to what tee is writing to stdout. The general problem I have with >(process substitutions) are that they are completely asynchronous. There is no way to tell if they are done.

rwp@fencepost:~$ echo foo | tee >(sleep 5 && cat)
foo
rwp@fencepost:~$ sleep 3 && echo sleep 3 done
sleep 3 done
rwp@fencepost:~$ foo

I complained about that on the bash list a couple of years ago. There appears to be no way to synchronize those processes back together again. No way to join the forked flow. For my sensibilities that makes the utility much reduced to only those things that you don't care when the task finishes. In the above that later output may appear at any time. It is likely to appear in sequence but there is no guarantee of it. What if the task is waiting on the network such as doing a dns lookup? What if the machine is heavily loaded and the process is delayed? Then the task may be delayed indefinitely and may finish at any arbitrary time. > So in summary, maybe there is the need for --no-stdout, > though I don't see it yet myself TBH. I haven't come across a use for it in my own programs. There doesn't seem to be significant requests for it. But I am not opposed to it. At this time I don't see any fundamental problems in the tee side of things. The only problem is that people using the shell >(...) 
side of things have the problem of no way to wait for the children to finish and no way to join the forked flow. That in the shell is a serious problem as far as I am concerned. I wouldn't use it until the shell side gets more fully featured out. Bob
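[Editor's note: later bash releases partially address the join problem described above. A minimal sketch, assuming bash >= 4.4 (where `$!` after a process substitution holds its PID and `wait` can reap it); file names come from mktemp and are purely illustrative:]

```shell
#!/usr/bin/env bash
# Sketch (assumes bash >= 4.4): after a process substitution, $! holds
# its PID, so the forked flow can at last be joined with wait.
tmp=$(mktemp) cnt=$(mktemp)
printf 'foo\n' > "$tmp"
tee >(wc -c > "$cnt") < "$tmp" > /dev/null
wait "$!"                     # blocks until the >(...) child finishes
cat "$cnt"                    # prints 4 (bytes written through the pipe)
rm -f "$tmp" "$cnt"
```

Note this only joins the *last* process substitution; with several of them, earlier PIDs still have to be collected by hand.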
Re: Enhancement request for tee - please add the option to not quit on SIGPIPE when some other files are still open
Pádraig Brady wrote:
> Already done in the previous v8.24 release:

Bob Proulx wrote:
> If you ignore SIGPIPE in tee in the above then what will terminate the
> tee process? Since the input is not ever terminated.

http://www.gnu.org/software/coreutils/manual/html_node/tee-invocation.html#tee-invocation

    ‘-p’
    ‘--output-error[=mode]’
        Select the behavior with errors on the outputs, where mode is
        one of the following: ...

        ‘warn-nopipe’
            Warn on error opening or writing any output, except pipes.
            Writing is continued to still open files/pipes. Exit
            status indicates failure if any non pipe output had an
            error. This is the default mode when not specified.

Ah... I see.

Bob
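[Editor's note: the warn-nopipe behavior can be seen in a small transcript. A sketch, assuming GNU coreutils >= 8.24 for `tee -p` and bash for the `>(...)` substitutions; the mktemp files are illustrative:]

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU coreutils >= 8.24): with -p, an early-exiting pipe
# consumer no longer terminates tee; the other consumer sees all input.
short=$(mktemp) full=$(mktemp)
head -c 200000 /dev/zero |
  tee -p >(head -c1 | wc -c > "$short") >(wc -c > "$full") > /dev/null
sleep 1                       # crude sync; the >(...) are asynchronous
cat "$short" "$full"          # prints 1, then 200000
rm -f "$short" "$full"
```

Without -p, the first `head -c1` closing its pipe would kill tee with SIGPIPE and the second counter would see only one pipe buffer's worth of data.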
Re: Enhancement request for tee - please add the option to not quit on SIGPIPE when some other files are still open
Jirka Hladky wrote:
> I have recently run into an issue that tee will finish as soon as the
> first pipe it's writing to is closed. Please consider this example:
>
> $ cat /dev/zero | tee >(head -c1 | wc -c ) >(head -c100M | wc -c ) >/dev/null
> 1
> 65536
>
> The second wc command will receive only 64kB instead of the expected 100MB.

Expectations depend upon the beholder of the expectation. :-)

> IMHO, tee should have a command line option to proceed as long as some
> file is open.
>
> cat /dev/zero | mytee --skip_stdout_output --continue_on_sigpipe \
>   >(head -c1 | wc -c ) >(head -c100M | wc -c )

If you ignore SIGPIPE in tee in the above then what will terminate the
tee process? Since the input is never terminated.

Also, that is a Useless Use Of Cat in the above, too.

> It should be accompanied by another switch which will suppress
> writing to STDOUT.

Isn't >/dev/null already a sufficient switch to suppress stdout?

Bob
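[Editor's note: both of Bob's points fit on one line. A sketch, with `head -c` standing in for /dev/zero to keep the input finite; the mktemp files are illustrative:]

```shell
# Sketch: redirect the input instead of piping through cat, and suppress
# stdout with >/dev/null; head -c keeps the otherwise endless input finite.
src=$(mktemp) copy=$(mktemp)
head -c 65536 /dev/zero > "$src"
tee "$copy" < "$src" > /dev/null   # no cat, no stdout noise
wc -c < "$copy"                    # prints 65536
rm -f "$src" "$copy"
```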
Re: Enhancement Requests + Patch: Unlink to support multiple arguments
Adam Brenner wrote:
> I have an enhancement request and patch ready for unlink to support
> multiple arguments. Currently unlink does not accept more than one
> argument to remove a link. I have a patch that adds that
> functionality. For example:
>
> $ unlink link1 link2 link3

Why are you using unlink here instead of rm? I think you are using the
wrong tool for the job. Why even care whether unlink can take multiple
arguments?

It used to be that unlink existed for tasks such as allowing the
superuser to unlink the ".." entry from a directory, which was one
additional layer in a chroot to prevent crawling out of it, and which
was why the older documentation mentions needing sufficient
privileges. This unlinking of ".." isn't allowed anymore on recent
file systems, and other container practices have appeared, making this
practice obsolete.

> Is this something the community would like for me to submit?

I do not have a strong opinion. Traditional Unix systems only allowed
one argument and no options. *BSD now allows multiple arguments and
includes various options. It would be nice to be compatible with *BSD.

I think the main reason it doesn't matter whether unlink can take
multiple arguments is that one should be using rm for this task
instead. If one uses unlink with multiple arguments then one is using
the wrong tool.

Bob
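[Editor's note: a minimal sketch of the point being made — rm already covers the multiple-operand case, while unlink takes exactly one; the link names are illustrative:]

```shell
# Sketch: rm handles several names at once; unlink takes a single operand.
cd "$(mktemp -d)"
touch link1 link2 link3 link4
rm link1 link2 link3          # the right tool for multiple removals
unlink link4                  # one operand only
ls                            # directory is empty again
```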
Re: Enhancement Requests + Patch: Unlink to support multiple arguments
Pádraig Brady wrote:
> Bob Proulx wrote:
> > Why are you using unlink here instead of rm? I think you are using
> > the wrong tool for the job. Why even care if unlink can take
> > multiple arguments?
> ...
> Good info thanks. Yes rm will not unlink() a dir.

Is the use of unlink so that it works on either files or directories?
Seemingly an unusual case. And what if the directory is not empty?
And also, why not use rmdir on directories?

> > > Is this something the community would like for me to submit?
> >
> > I do not have a strong opinion. Traditional Unix systems only
> > allowed one argument and no options. *BSD now allows multiple
> > arguments and includes various options. It would be nice to be
> > compatible with *BSD.
>
> I'm not sure they do?

Oops. Yes. I see I was wrong there.

> Note the rm and unlink man pages are merged in FreeBSD at least,
> which might confuse things.

Oh, yes, that does tend to put people on that track. I wonder why they
merge those two but don't merge rmdir? I wouldn't merge any of those
and would keep all three separate. Frankly I would deprecate and hide
the unlink command as much as possible. If (and I wouldn't, but if) I
were to merge two man pages I would merge rm and rmdir, not rm and
unlink.

Looking at the man page now I see:

    SYNOPSIS
         rm [-f | -i] [-dIPRrvWx] file ...
         unlink file

> > one argument and no options. *BSD now allows multiple arguments and
> > includes various options. It would be nice to be compatible with *BSD.

I was wrong. Drat! *BSD does NOT allow multiple arguments and does not
allow options, as the man page synopsis clearly shows. So GNU unlink
actually appears perfectly compatible with *BSD as well as with
traditional Unix systems such as HP-UX and so forth. I had that
statement wrong. Oops.

I guess with *BSD having the same unlink behavior as GNU unlink I
would lean toward doing nothing, so as to remain compatible. And I
would encourage people not to use the unlink command further.

Nothing prevents people from using the unlink system call, however.
For example, using perl:

    perl -e 'unlink "foo"'

Bob
Re: mkdir and ls display
Pádraig Brady wrote:
> Ngô Huy wrote:
> > > I had a problem with mkdir and ls when I used this command:
> > >
> > > mkdir "* /" && mkdir "* /etc" && ls
> > >
> > > It only displayed *.
>
> Note the as yet unreleased version of ls will use shell quoting
> to give a less ambiguous output:
>
> $ ls
> '* '

With previous and legacy versions you can use -b to quote non-graphic
characters.

    ‘-b’
    ‘--escape’
    ‘--quoting-style=escape’
        Quote nongraphic characters in file names using alphabetic and
        octal backslash sequences like those used in C.

That would show:

    $ ls -b
    *\ 

> > If we have a hidden directory and use xargs with find to execute
> > some command, it's a security risk. Should we patch its behavior?
>
> I think you're worried about the '*' being expanded?
> Or maybe the xargs splitting on the space.

File names will be either internally or externally produced, and if
externally produced then they should be treated with the same care as
any external data. It is always necessary to properly protect any
external input. If the file names are protected correctly then there
is no need to worry about shell metacharacter expansion and word
splitting.

> > Not something that mkdir (coreutils) should be worried
> > about in any case.

Agreed.

> > I see this, but when I use mkdir "* /" && mkdir "* /etc", there
> > shouldn't be a / in the file name, right?
>
> I don't see the issue here. mkdir is just passing down to the mkdir
> syscall to create the "* /etc" dir, i.e. the 'etc' dir in the already
> created '* ' dir.

I think maybe the concern is the '/' *in* the filename? There are only
two characters that cannot occur in a file name: the '/' and the zero
character '\0'. All others are allowed. Since the '/' is not allowed
it can never occur in the filename. What happens instead (and I am a
little fuzzy with this description) is that the '/' forces the file
system to treat the name as a directory, and if it is a symlink it
will dereference through the symlink to the underlying directory.
Therefore these are all equivalent.

    mkdir foo
    mkdir foo/
    mkdir foo//
    mkdir foo///
    ...

> > We try to avoid incident problems. I think we should limit the
> > characters allowed in file names.

Only '/' and '\0' are not allowed. All other characters are allowed.

> That would have to be done at the kernel level.

Agreed.

> There have been proposals with POSIX to use a restricted character
> set for file names.

I often think it would have been nice if whitespace were disallowed.
That would make the traditional shell handling of file names much
easier, with no worries about whitespace in file names. Oh well.

Bob
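[Editor's note: the thread's example can be replayed safely in a scratch directory. A sketch (assumes GNU ls for the -b output shown in the comment):]

```shell
# Sketch: spaces and '*' are perfectly legal in file names; ls -b makes
# them unambiguous, and the trailing '/' on mkdir's operand is harmless.
cd "$(mktemp -d)"
mkdir '* /'                  # creates a directory literally named '* '
mkdir '* /etc'               # an 'etc' directory inside it
ls -b                        # shows: *\ 
rmdir '* /etc' '* '
```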
bug#21908: find -f breaks pipes ?
Flemming Gulager Danielsen wrote:
> I am new here so if this is not the proper use of the maillist then
> I am sorry.

Others answered your output buffering question. Let me address the
mailing list question.

When you send a message to bug-coreutils it opens a bug ticket, so
that we can track bugs and they won't get lost. When asking a question
such as you are doing, it is better to use the coreutils@gnu.org
mailing list instead. That is a normal mailing list, just for
discussion, with no bug tickets opened. It is the better place for
questions and discussion.

Bob
bug#21916: sort -u drops unique lines with some locales
Pádraig Brady wrote:
> Christoph Anton Mitterer wrote:
> > Attached is a file that, when sort -u'ed in my locale, loses lines
> > which are however unique.
> >
> > I've also attached the locale, since it's a custom-made one, but the
> > same seems to happen with "standard" locales as well, see e.g.
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=695489
> >
> > PS: Please keep me CCed, as I'm writing off list.
>
> If you compare at the byte level you'll get appropriate grouping:
>
> $ printf '%s\n' Ⅱ Ⅰ | LC_ALL=C sort
> Ⅰ
> Ⅱ

It is also possible to set only LC_COLLATE=C and not set everything
to C.

> The same goes for other similar representations,
> like full width forms of latin numbers:
>
> $ printf '%s\n' 2 1 | ltrace -e strcoll sort
> sort->strcoll("\357\274\222", "\357\274\221") = 0
> 2
> 1
>
> That's a bit surprising, though maybe since only a limited
> number of these representations are provided, it was
> not thought appropriate to provide collation orders for them.

Hmm... Seems questionable to me.

> There are details on the unicode representation at:
> https://en.wikipedia.org/wiki/Numerals_in_Unicode#Roman_numerals_in_Unicode
> Where it says "[f]or most purposes, it is preferable to compose the
> Roman numerals from sequences of the appropriate Latin letters"
>
> For example you could mix ISO 8859-1 and ISO 8859-5 to get appropriate
> sorting:

One can transliterate them using 'iconv'.

    $ printf '%s\n' Ⅱ Ⅰ 2 1 | iconv -f UTF-8 -t ASCII//TRANSLIT | sort
    1
    2
    I
    II

Bob
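[Editor's note: a minimal sketch of the LC_COLLATE=C suggestion — collation falls back to byte order while the other locale categories stay untouched; LC_ALL is cleared for the one command so it cannot override LC_COLLATE:]

```shell
# Sketch: setting only LC_COLLATE=C makes sort compare bytes, so
# distinct-but-collating-equal lines survive -u; messages, dates,
# and other locale categories are left alone.
printf '%s\n' Ⅱ Ⅰ | LC_ALL= LC_COLLATE=C sort -u
# Ⅰ
# Ⅱ
```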