Re: Change a file name - remove a consistent string recursively
On Tue, Jan 17, 2023 at 12:01:35AM +1100, Les Kitchen wrote:
> On Mon, Jan 16, 2023, at 21:42, Craig Sanders via luv-main wrote:
> > On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
> >> I'd do something like:
> >>
> >> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_")
> >> if $_ ne $o;'
>
> Thanks, Craig, for your followup.
>
> > This is quite dangerous, for several reasons.  To start with, there's no
> > protection against renaming files over existing files with the same target
> > name.
> ...
>
> Well, that's the intention of the -i (interactive) option to mv,
> to require user agreement before over-writing existing files.

Which gets tedious real fast with more than a few files to confirm.

> All the other points you raise are valid, especially the dangers of feeding
> unchecked input into the shell, and anybody writing shell code needs to be
> aware of them - although I will say I mentioned pretty much all of them in
> the notes further down in my message, though in less detail than you have,
> and without stressing enough the dangers.

Yeah, I noticed that - I just thought it needed to be emphasised and
explained in more detail.  These issues are the source of a lot of really
serious bugs in shell scripts & one-liners.

> And, yes, if you have filenames with arbitrary characters, then you have to
> resort to other contrivances, ultimately to NULL-terminated strings.

Using NUL as the separator isn't a "contrivance".  It's standard good
practice - use a delimiter that ISN'T (and preferably CAN'T be) in your
input.  Since NUL is the only character that can't be in a path/filename,
that's the only character to use.

It works whether you've got annoying characters in the filenames or not.
No special cases or special handling required.  It just works.

> And, yes, if you have a huge number of files, then you'd likely want to do
> the rename internal to your scripting language, instead of forking a new
> process for each file rename.  But then you lose the easy ability to review
> the commands before they're executed.

It's not difficult to change a print() statement to a rename() statement,
or to have both and comment out the rename until you've verified the output
(i.e. a simple "dry-run").

> And I could also mention the potential for filenames to contain UTF-8
> (or other encodings) for characters that just happen to look like ASCII
> characters, but aren't, or to contain terminal-control escape sequences.  It
> can get very weird.

While there's a handful of problematic unicode characters (mostly the extra
whitespace characters), in general unicode is not a problem.  Especially if
you use NUL and/or proper quoting and/or arrays (e.g. `find` in combination
with the bash/ksh/zsh builtin mapfile/readarray and process substitution is
extremely useful - mapfile also supports NUL as the delimiter, another great
method of eliminating whitespace & quoting bugs).

> In general, there's a big big difference between a simple shell one-liner
> that you use as a work amplifier in situations you know are well-behaved,
> and a piece of robust code that can behave gracefully no matter what weird
> input is thrown at it.  They're different use-cases.

It's not hard to write robust one-liners.  It just takes practice - a matter
of developing good habits and stomping on bad habits until it's automatic.

And using tools like shellcheck to highlight common mistakes and bad
practices helps a lot - it's available as a command-line tool and as a
paste-your-code-here web service.

https://www.shellcheck.net/

It's packaged for debian and probably most other distros and is, IMO,
essential for any shell user, even if (especially if!) you're just dabbling
with the simplest of shell scripts or one-liners.

I wish it had been around when I was learning shell - I look at some of the
shell code I wrote years ago and just shudder at how awful it is.  I got
better with practice, though :)  I made a lot of those mistakes because I
simply didn't know they were mistakes, didn't know how dangerous they were,
didn't know any better at the time.  shellcheck solves that problem.

    Package: shellcheck
    Description-en: lint tool for shell scripts
     The goals of ShellCheck are:
     .
      * To point out and clarify typical beginner's syntax issues,
        that causes a shell to give cryptic error messages.
     .
      * To point out and clarify typical intermediate level semantic problems,
        that causes a shell to behave strangely and counter-intuitively.
     .
      * To point out subtle caveats, corner cases and pitfalls, that may cause an
        advanced user's otherwise working script to fail under future circumstances.

Hastily written one-liners often lead to questions like "WTF happened to my
data?", "How can I reverse this 'sed -i' command I just ran?", and "Is it
possible to undelete files on ext4?"

> > Worse, it will break if any filenames contain whitespace characters
> > (newlines, tabs, spaces, etc - all of which are
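
The mapfile-with-NUL approach mentioned above can be sketched roughly as follows. This is only an illustration, not something from the thread: it assumes bash 4.4 or later (for `mapfile -d`), and the function name `collect_junk_files`, the array name `junk_files` and the `*.junk.*` pattern are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read NUL-separated paths from find into an array,
# so each file is exactly one array element whatever characters it contains.
# Assumes bash >= 4.4, where mapfile accepts -d '' (NUL as the delimiter).
collect_junk_files() {
    # usage: collect_junk_files DIR   (fills the global array junk_files)
    mapfile -d '' -t junk_files < <(find "$1" -type f -name '*.junk.*' -print0)
}

# Example use (commented out; adjust the directory to suit):
#   collect_junk_files /Dir1
#   printf 'found %d files\n' "${#junk_files[@]}"
#   for f in "${junk_files[@]}"; do printf '%q\n' "$f"; done
```

Because the array is always expanded as "${junk_files[@]}", a name containing spaces or even a newline still arrives as a single argument.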
Re: Change a file name - remove a consistent string recursively
On Mon, Jan 16, 2023, at 21:42, Craig Sanders via luv-main wrote:
> On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
>> I'd do something like:
>>
>> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_")
>> if $_ ne $o;'

Thanks, Craig, for your followup.

> This is quite dangerous, for several reasons.  To start with, there's no
> protection against renaming files over existing files with the same target
> name.
...

Well, that's the intention of the -i (interactive) option to mv,
to require user agreement before over-writing existing files.

All the other points you raise are valid, especially the dangers
of feeding unchecked input into the shell, and anybody writing
shell code needs to be aware of them - although I will say I
mentioned pretty much all of them in the notes further down in
my message, though in less detail than you have, and without
stressing enough the dangers.

If you know you've got a modest number of files, with
well-behaved file names (in the Unix-shell sense, that is, no
whitespace, no shell metacharacters, etc.), no changes to the
directory structure, then the approach I suggested can work
quite well.  And one big advantage is that you can review the
generated list of shell commands, and check that they'll do what
you expect before committing to executing them.

And, yes, if you have filenames with arbitrary characters, then
you have to resort to other contrivances, ultimately to
NULL-terminated strings.

And, yes, if you have a huge number of files, then you'd likely
want to do the rename internal to your scripting language,
instead of forking a new process for each file rename.  But then
you lose the easy ability to review the commands before they're
executed.

Of course there are workarounds.  You could define, say, a Perl
routine that in one definition just prints the names in the file
renaming, and in another definition, actually does the rename.
But then you're getting well beyond the simple one-liner (unless
you use that Perl rename program you mention later).

And I could also mention the potential for filenames to contain
UTF-8 (or other encodings) for characters that just happen to
look like ASCII characters, but aren't, or to contain
terminal-control escape sequences.  It can get very weird.

In general, there's a big big difference between a simple shell
one-liner that you use as a work amplifier in situations you
know are well-behaved, and a piece of robust code that can
behave gracefully no matter what weird input is thrown at it.
They're different use-cases.

> It also doesn't distinguish between .junk. in a directory name vs in a file
> name - it will just modify the first instance of ".junk." it sees in the
> pathname.  e.g. "./Dir1/My.junk.dir/my.junk.file.txt".  Probably not a problem
> in practice, but something to be aware of.

Yeah.

> Worse, it will break if any filenames contain whitespace characters (newlines,
> tabs, spaces, etc - all of which are completely valid in filenames - the ONLY
> characters guaranteed NOT to be in a pathname are / and NUL).

This should be taped to the screen of every shell user.

Actually, files with spaces are pretty common in non-Unix
environments, like Windows or MacOS (yes, I know it's Unix
underneath), but they're pretty simple to handle by
double-quoting, as I mentioned in my notes - and that will
handle pretty much everything except for characters that
interpolate into double-quoted strings, I guess $ ` (backtick),
and possibly !.

> And because you're NOT quoting the filenames in your print statement, it
> will also break if any filenames contain shell metacharacters like ; & > <
> etc when the output is piped into sh.  A simple fix might appear to be to use
> single-quotes in the print statement - e.g. print("mv -i '$o' '$_'") - but
> even this will break if a filename contains a single-quote character.  Similar
> for escaped double-quotes.

It's even messier than this.  Because you're already using
single quotes for the -e expression to Perl, you can't
immediately use them like that.  You have to do something like
'"'"' to close the single-quoted string, then attach a
double-quoted single quote, then open a new single-quoted
string.  I don't even want to think about it.  By then you might
as well shift to using NULL-terminated strings.

> Shell can be very dangerous if you don't quote your arguments properly.
> Consider, for example, what would happen if there happened to be a file called
> ";rm --no-preserve-root -rf /;" (or ";sudo rm ;") under /Dir1.  That's
> a fairly extreme example of an obviously malicious filename, but there are
> plenty of legitimate, seemingly innocuous filenames that WILL cause problems
> if passed unquoted to the shell.
>
> Whitespace and quoting issues in shell are well-known and long-standing,
> and pretty much inherent to the way the shell parses its command line - the
> subject of many FAQs and security advisories.
>
> It's unfortunately very easy to
Re: Change a file name - remove a consistent string recursively
On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
> I'd do something like:
>
> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if
> $_ ne $o;'

This is quite dangerous, for several reasons.  To start with, there's no
protection against renaming files over existing files with the same target
name.

It also doesn't distinguish between .junk. in a directory name vs in a file
name - it will just modify the first instance of ".junk." it sees in the
pathname.  e.g. "./Dir1/My.junk.dir/my.junk.file.txt".  Probably not a problem
in practice, but something to be aware of.

Worse, it will break if any filenames contain whitespace characters (newlines,
tabs, spaces, etc - all of which are completely valid in filenames - the ONLY
characters guaranteed NOT to be in a pathname are / and NUL).

And because you're NOT quoting the filenames in your print statement, it
will also break if any filenames contain shell metacharacters like ; & > <
etc when the output is piped into sh.  A simple fix might appear to be to use
single-quotes in the print statement - e.g. print("mv -i '$o' '$_'") - but
even this will break if a filename contains a single-quote character.  Similar
for escaped double-quotes.

Shell can be very dangerous if you don't quote your arguments properly.
Consider, for example, what would happen if there happened to be a file called
";rm --no-preserve-root -rf /;" (or ";sudo rm ;") under /Dir1.  That's
a fairly extreme example of an obviously malicious filename, but there are
plenty of legitimate, seemingly innocuous filenames that WILL cause problems
if passed unquoted to the shell.

Whitespace and quoting issues in shell are well-known and long-standing,
and pretty much inherent to the way the shell parses its command line - the
subject of many FAQs and security advisories.

It's unfortunately very easy to improperly quote filenames - it's far harder
to do correctly and 100% safely than it seems at first glance.

For safety, if you were to DIY it with a command like yours above (there are
far better alternatives), you should use -print0 with find and the -0 option
with perl.

In fact, you should use NUL as the separator with ANY program dealing with
arbitrary filenames on stdin - most standard tools these days have -0 (or -z
or -Z) options for using NUL as the separator, including most of GNU coreutils
etc (head, tail, cut, sort, grep, sed, etc.  For awk, you can use
BEGIN {RS="\0"} or similar).

Also:

1. perl has a built-in rename function, there's no need to fork mv (which
would be extremely slow if there are lots of files to rename).  And perl
isn't shell, so doesn't have problems with unquoted whitespace or shell
metacharacters in the filenames.

Still doesn't protect against clobbering existing filenames without some
extra code, though:

    $ perldoc -f rename
        rename OLDNAME,NEWNAME
            Changes the name of a file; an existing file NEWNAME will be
            clobbered.  Returns true for success; on failure returns false
            and sets $!.

            Behavior of this function varies wildly depending on your system
            implementation.  For example, it will usually not work across
            file system boundaries, even though the system *mv* command
            sometimes compensates for this.  Other restrictions include
            whether it works on directories, open files, or pre-existing
            files.  Check perlport and either the rename(2) manpage or
            equivalent system documentation for details.

            For a platform independent "move" function look at the
            File::Copy module.

            Portability issues: "rename" in perlport.

2. Even better, a perl rename utility (aka file-rename, perl-rename, prename,
etc as mentioned in my previous message in this thread) already exists and
won't overwrite existing files unless you force it to with the -f option.

It also distinguishes between directories and file names (by default, it will
only rename the filename portion of a pathname unless you use the --path or
--fullpath option).

It can take filenames from stdin (and has a -0 option for NUL-separated
filenames) or as command-line args (e.g. with 'find ... -exec rename {} +')

craig
___
luv-main mailing list -- luv-main@luv.asn.au
To unsubscribe send an email to luv-main-le...@luv.asn.au
Re: Change a file name - remove a consistent string recursively
On Thu, Jan 12, 2023 at 05:49:13PM +1000, Piers Rowan wrote:
> I have a structure like:
>
> /Dir1/123.junk.doc
> /Dir1/456.junk.pdf
> /Dir1/SubDir/1123.junk.doc
> /Dir1/SubDir/1456.junk.pdf
> /Dir2/SubDir/4321.junk.doc
> /Dir2/SubDir/7676.junk.pdf
> ...etc...
>
> I want some guidance as to how to make:
>
> 1123.junk.doc > 1123.doc
>
> $ID.junk.$EXT > $ID.$EXT

Using find and the perl rename utility (which is not the same as the rename
program in util-linux - that has completely different and incompatible command
line options):

    find /Dir1/ -type f -exec rename -n 's/\.junk\././' {} +

That's a dry-run, it will only print what **would** be renamed, without
actually doing it.  Once you've confirmed that it's going to do what you want,
run it without -n, or change -n to -v for verbose operation.

Optionally add a `g` regex modifier to the s/// operation ('s/\.junk\././g')
if filenames might contain .junk. more than once.

perl rename allows you to use **any** perl code to rename files - from simple
sed-like regex transformations like the one above to quite complex scripts
(it's pretty simple to use sprintf to, say, zero-pad numbers in filenames so
that they sort correctly with just a plain numeric sort rather than a natural
sort).

Depending on your distro, the perl rename command might be rename, prename,
file-rename, or perl-rename.  Try running each of them to find out what it's
called on your system.

On Debian and related distros it's in the `rename` package and (via the
/etc/alternatives system) is executed as just "rename":

    Package: rename
    Version: 2.00-1
    Installed-Size: 57
    Maintainer: Debian Perl Group
    Architecture: all
    Depends: perl:any
    Description-en: Perl extension for renaming multiple files
     This package provides both a perl interface for renaming files
     (File::Rename) and a command line tool 'file-rename' which is intended
     to replace the version that used to be supplied by the perl package.

You can confirm which variant of rename you have installed with the -V option,
which works for both perl rename and util-linux rename:

If you have the perl version installed, it will mention either perl or
File::Rename depending on how old your version is.

    $ rename -V
    /usr/bin/rename using File::Rename version 2.00, File::Rename::Options version 1.99

With the util-linux version, it will mention util-linux:

    $ rename -V
    rename.ul from util-linux 2.38.1

WARNING: Again, these two programs are not at all compatible.  Aside from -V,
you can't use perl rename options with util-linux rename or vice-versa.

(Debian systems often have both installed, with perl as /usr/bin/rename and
util-linux rename as /usr/bin/rename.ul.  Other distros might have util-linux
as rename and perl rename as prename)

craig
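
The -V check above is easy to fold into a script before it commits to either syntax. The sketch below only looks for the strings the message says each variant prints ("util-linux", or "perl" / "File::Rename"); the function name `classify_rename` and the three labels it echoes are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the text printed by "rename -V".
# Echoes one of: perl, util-linux, unknown.
classify_rename() {
    case $1 in
        *util-linux*)                  echo util-linux ;;
        *File::Rename*|*perl*|*Perl*)  echo perl ;;
        *)                             echo unknown ;;
    esac
}

# Example use (commented out so that nothing runs when this file is sourced):
#   flavour=$(classify_rename "$(rename -V 2>&1)")
#   [ "$flavour" = perl ] || echo "perl rename not found under this name" >&2
```

Passing the version text in as an argument, rather than calling rename inside the function, keeps the check usable for prename, file-rename or perl-rename as well.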
Re: Change a file name - remove a consistent string recursively
Hi Piers,

On Thu, Jan 12, 2023, at 18:49, Piers Rowan via luv-main wrote:
...
> I feel like I've asked this before or something similar.
>
> I have a structure like:
>
> /Dir1/123.junk.doc
> /Dir1/456.junk.pdf
> /Dir1/SubDir/1123.junk.doc
> /Dir1/SubDir/1456.junk.pdf
> /Dir2/SubDir/4321.junk.doc
> /Dir2/SubDir/7676.junk.pdf
> ...etc...
>
> I want some guidance as to how to make:
>
> 1123.junk.doc > 1123.doc
>
> $ID.junk.$EXT > $ID.$EXT
>
> Your thoughts are appreciated.
...

I see you've already got a good-enough solution to this, but in
case you ever need to do something similar in future:

I'd do something like:

    find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;'

That is:

1. The find gives you a list of all the file names under /Dir1

2. The -lne to perl makes it (-e) run the given expression
   (quoted in single quotes to protect shell metacharacters) on
   (-n) every input line (that is, every file name), stripping
   off trailing newlines on input, and putting them back on
   output (-l).

3. The perl code saves the current line $_ into a variable $o
   (for "old" or "original"), then does a substitution on the
   current line (implicitly $_).  In the regular expression,
   plain dot is a match-anything metacharacter, so it needs to
   be backslash escaped to match a literal dot.  So we have the
   old and new versions of the file name in those respective
   variables.

4. Instead of doing the renaming immediately, the perl code
   outputs a line-by-line list of shell mv commands to do the
   actual renaming (with -i for interaction if it'd try to
   over-write an existing file).  That way, you can inspect the
   list of shell commands, and check that they're doing the
   right thing before committing.  Good for debugging.

5. Notice that the mv command is emitted only if the
   substitution actually made a change, so files that don't
   match won't be affected.

6. Once you're happy that the shell mv commands would do the
   right thing, you can make it happen by piping the output of
   that above command pipeline into sh, like:

    find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;' | sh

Notes:

- Being too lazy to set up a test file structure, I haven't
  actually tried out the above.  But I've done similar things
  many times.  The main thing is that you can inspect the list
  of commands output, so you can verify yourself that they look
  right before running them.  And even with that, it'd still be
  a good idea to do this on a copy of your actual data (or have
  a backup of it).

- This is a very general strategy: Instead of writing a script
  to actually do something (which might go horribly wrong), you
  write a script that emits a simple list of commands, which can
  be visually inspected before being piped into the shell to be
  executed.

- The perl 's' "substitute" operator in its plain form will
  substitute only the first occurrence it sees.  The above
  assumes this will be only in the filename parts.  It's a bit
  more complicated if that string you want replaced can also
  occur along the directory path.

- The find will list the files in whatever order it encounters
  them in the filesystem.  If you want them in some sane order,
  you can insert the sort command into the pipeline.

- This is finding only ordinary files, it won't see symbolic
  links.  If you have symbolic links in your directory
  hierarchy, then you'll need to decide what you want to do with
  them, and put in additional options to find to achieve that.
  But I don't think this applies in your case.

- This will come unstuck if your filenames might contain spaces
  or other whitespace.  To handle this, you'll need to put (say)
  double quotes around the filenames in the generated commands,
  by putting suitable backslash-escaped double quotes into the
  output string, something like: "mv -i \"$o\" \"$_\""  (The
  good Plan 9 people avoided this Unix shell-quoting hell in
  their shell.)

- If your filenames might contain more exotic things, like
  newlines or dollar signs, then you'll need to do stuff like
  work with null-byte-terminated lines (by -print0 on find and
  the corresponding options to perl, which I don't off the top
  of my head remember).  And concatenation contrivances to get
  single quotes into the string printed by perl.  But by then
  you're better off writing a script (in perl or your favorite
  scripting language), rather than trying to do it all in a
  one-liner, with all its escaping weirdness.

I hope this is of some use.

-- Smiles,
Les.
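
One way to keep the emit-then-inspect strategy while avoiding hand-rolled quoting is to let bash do the quoting with `printf %q`. This is a swapped-in technique, not what the message above uses, and the function name `emit_mv_commands` is hypothetical. It assumes bash, both for generating the list and for running it, because %q may emit bash-only $'...' quoting for names with control characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one safely quoted "mv -i" command per matching
# file, for review.  Nothing is renamed until the output is fed to bash.
emit_mv_commands() {
    local dir=$1 old base new
    while IFS= read -r -d '' old; do
        base=${old##*/}                    # file name without the directory
        new=${old%/*}/${base/.junk./.}     # replace the first ".junk." in it
        if [ "$new" != "$old" ]; then
            printf 'mv -i -- %q %q\n' "$old" "$new"
        fi
    done < <(find "$dir" -type f -name '*.junk.*' -print0)
}

# Review first, then run (commented out):
#   emit_mv_commands /Dir1 | less
#   emit_mv_commands /Dir1 | bash
```

The list is still plain text that can be read or saved before it is executed. One caveat: when the list is piped into bash, any prompt from mv -i would read its answer from that same pipe, so it is worth checking for existing targets during review.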
Re: Change a file name - remove a consistent string recursively
On 12/1/23 18:50, David via luv-main wrote:
> Hi, guidance as requested:
>
> I assume you're seeking a commandline solution, not a GUI one.

I only had a few directories so I just ran this a few times:

    rename ".original" "" */*

The folder structure was /path/to/data//MM so

    cd /path/to/data//
    rename ".original" "" */*

removed the word original from the file.

Thanks David.

Cheers

P
Re: Change a file name - remove a consistent string recursively
On Thu, 12 Jan 2023 at 18:49, Piers Rowan via luv-main wrote:
> I have a structure like:
>
> /Dir1/123.junk.doc
> /Dir1/456.junk.pdf
> /Dir1/SubDir/1123.junk.doc
> /Dir1/SubDir/1456.junk.pdf
> /Dir2/SubDir/4321.junk.doc
> /Dir2/SubDir/7676.junk.pdf
> ...etc...
>
> I want some guidance as to how to make:
>
> 1123.junk.doc > 1123.doc
>
> $ID.junk.$EXT > $ID.$EXT

Hi, guidance as requested:

I assume you're seeking a commandline solution, not a GUI one.

1) Write a shell script that recurses into all subdirectories,
   finds matching filenames, and uses 'mv' to give each its new name.
   Bash provides all the tools necessary.
   Some ideas here: http://mywiki.wooledge.org/BashFAQ/030
   Could be done with a recursive function.  Or, there's probably some
   shorter approach if using the enhanced globbing features.

2) Or, search the web for such a script.

3) Or, use 'find' to detect all subdirectories and in each one invoke
   the 'rename' commandline tool, which takes a Perl 'substitute'
   command as its argument and applies that to matching filenames.
   See: https://manpages.debian.org/bullseye/rename/rename.1.en.html

4) There are probably examples of doing that on the web too, findable
   with appropriate keywords.

5) Search a site like this:
   https://superuser.com/search?q=linux+recursive+rename
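
For option 1, the "enhanced globbing" route could be sketched along these lines. It assumes bash 4 or later for the globstar option; the function name `strip_junk_glob` is made up, and the body runs in a subshell so the shopt settings do not leak into the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip_junk_glob DIR
# Renames NAME.junk.EXT to NAME.EXT for regular files anywhere under DIR,
# using ** (globstar) instead of find.  Existing targets are never overwritten.
strip_junk_glob() (
    shopt -s globstar nullglob dotglob
    for old in "$1"/**/*.junk.*; do
        # regular files only: leave directories and symlinks alone
        [ -f "$old" ] && [ ! -L "$old" ] || continue
        base=${old##*/}
        new=${old%/*}/${base/.junk./.}
        if [ -e "$new" ]; then
            echo "skipping $old: $new already exists" >&2
            continue
        fi
        mv -- "$old" "$new"
    done
)
```

Because every expansion is double-quoted and no generated text is re-parsed by a shell, whitespace and metacharacters in file names are not an issue here. Replacing the `mv` line with an `echo` of the two names is one simple way to get a dry-run first.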
Change a file name - remove a consistent string recursively
Hi champions,

I feel like I've asked this before or something similar.

I have a structure like:

/Dir1/123.junk.doc
/Dir1/456.junk.pdf
/Dir1/SubDir/1123.junk.doc
/Dir1/SubDir/1456.junk.pdf
/Dir2/SubDir/4321.junk.doc
/Dir2/SubDir/7676.junk.pdf
...etc...

I want some guidance as to how to make:

1123.junk.doc > 1123.doc

$ID.junk.$EXT > $ID.$EXT

Your thoughts are appreciated.

Thanks

Piers