Re: Change a file name - remove a consistent string recursively

2023-01-16 Thread Craig Sanders via luv-main
On Tue, Jan 17, 2023 at 12:01:35AM +1100, Les Kitchen wrote:
> On Mon, Jan 16, 2023, at 21:42, Craig Sanders via luv-main wrote:
> > On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
> >> I'd do something like:
> >>
> >> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;'
>
> Thanks, Craig, for your followup.
>
> > This is quite dangerous, for several reasons.  To start with, there's no
> > protection against renaming files over existing files with the same target
> > name.
> ...
>
> Well, that's the intention of the -i (interactive) option to mv,
> to require user agreement before over-writing existing files.

Which gets tedious real fast with more than a few files to confirm.


> All the other points you raise are valid, especially the dangers of feeding
> unchecked input into the shell, and anybody writing shell code needs to be
> aware of them — although I will say I mentioned pretty much all of them in
> the notes further down in my message, though in less detail than you have,
> and without stressing enough the dangers.

Yeah, I noticed that - I just thought it needed to be emphasised and explained
in more detail. These issues are the source of a lot of really serious bugs in
shell scripts & one-liners.


> And, yes, if you have filenames with arbitrary characters, then you have to
> resort to other contrivances, ultimately to NULL-terminated strings.

Using NUL as the separator isn't a "contrivance".  It's standard good practice
- use a delimiter that ISN'T (and preferably CAN'T be) in your input.  Since
NUL is the only character that can't be in a path/filename, that's the only
character to use.  It works whether you've got annoying characters in the
filenames or not. No special cases or special handling required. It just
works.


> And, yes, if you have a huge number of files, then you'd likely want to do
> the rename internal to your scripting language, instead of forking a new
> process for each file rename.  But then you lose the easy ability to review
> the commands before they're executed.

It's not difficult to change a print() statement to a rename() statement, or
to have both and comment out the rename until you've verified the output (i.e.
a simple "dry-run").
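A minimal sketch of that dry-run idea, written in bash rather than perl. It assumes bash and a find that supports -print0; the names strip_junk and DRY_RUN are made up for illustration and do not come from the thread:

```bash
#!/usr/bin/env bash
# Sketch only: one code path for the preview and for the real run.
# strip_junk and DRY_RUN are illustrative names, not from the thread.

strip_junk() {
    local dir=$1 old base new
    # NUL-delimited input, so any filename survives intact.
    while IFS= read -r -d '' old; do
        base=${old##*/}                     # filename part of the path
        new="${old%/*}/${base/.junk./.}"    # replace the first ".junk." with "."
        if [[ ${DRY_RUN:-1} != 0 ]]; then
            # Default: only report what would happen.
            printf 'would rename: %q -> %q\n' "$old" "$new"
        elif [[ -e $new || -L $new ]]; then
            printf 'skipped (target exists): %q\n' "$old" >&2
        else
            mv -- "$old" "$new"
        fi
    done < <(find "$dir" -type f -name '*.junk.*' -print0)
}
```

Running `strip_junk /Dir1` only prints the planned renames; `DRY_RUN=0 strip_junk /Dir1` performs them and skips any target that already exists.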

> And I could also mention the potential for filenames to contain UTF-8
> (or other encodings) for characters that just happen to look like ASCII
> characters, but aren't, or to contain terminal-control escape sequences.  It
> can get very weird.

While there's a handful of problematic unicode characters (mostly the extra
whitespace characters), in general unicode is not a problem. Especially if
you use NUL and/or proper quoting and/or arrays (e.g. `find` in combination
with the bash builtin mapfile/readarray and process substitution is
extremely useful - mapfile also supports NUL as the delimiter, another great
method of eliminating whitespace & quoting bugs).
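A minimal sketch of the find + mapfile + process substitution combination, assuming bash 4.4 or later (for `mapfile -d ''`) and a find that supports -print0. The scratch directory and the file names in it are invented test data:

```bash
#!/usr/bin/env bash
# Sketch only: collect find's output into a bash array using NUL delimiters.
# Assumes bash 4.4+ (mapfile -d '') and find with -print0.

# Scratch tree with awkward names (invented test data), removed on exit.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/plain.junk.doc" \
      "$demo/with space.junk.pdf" \
      "$demo/new"$'\n'"line.junk.txt"

# -print0 ends each path with NUL; -d '' makes mapfile split on NUL,
# and -t strips that delimiter from each array element.
mapfile -d '' -t files < <(find "$demo" -type f -name '*.junk.*' -print0)

printf 'found %d files\n' "${#files[@]}"
for f in "${files[@]}"; do
    printf '%q\n' "$f"    # %q prints a shell-quoted form, so odd characters are visible
done
```

Each array element is exactly one pathname, so "${files[@]}" can be handed to any command without further quoting tricks.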

> In general, there's a big big difference between a simple shell one-liner
> that you use as a work amplifier in situations you know are well-behaved,
> and a piece of robust code that can behave gracefully no matter what weird
> input is thrown at it.  They're different use-cases.

It's not hard to write robust one-liners. It just takes practice - a matter
of developing good habits and stomping on bad habits until it's automatic.

And using tools like shellcheck to highlight common mistakes and bad
practices helps a lot - it's available as a command-line tool and as a
paste-your-code-here web service. https://www.shellcheck.net/

It's packaged for debian and probably most other distros and is, IMO,
essential for any shell user, even if (especially if!) you're just dabbling
with the simplest of shell scripts or one-liners.  I wish it had been around
when I was learning shell - I look at some of the shell code I wrote years ago
and just shudder at how awful it is.  I got better with practice, though :) I
made a lot of those mistakes because I simply didn't know they were mistakes,
didn't know how dangerous they were, didn't know any better at the time.
shellcheck solves that problem.

Package: shellcheck
Description-en: lint tool for shell scripts
 The goals of ShellCheck are:
 .
  * To point out and clarify typical beginner's syntax issues,
that causes a shell to give cryptic error messages.
 .
  * To point out and clarify typical intermediate level semantic problems,
that causes a shell to behave strangely and counter-intuitively.
 .
  * To point out subtle caveats, corner cases and pitfalls, that may cause an
advanced user's otherwise working script to fail under future circumstances.


Hastily written one-liners often lead to questions like "WTF happened to my
data?", "How can I reverse this 'sed -i' command I just ran?", and "Is it
possible to undelete files on ext4?"

> > Worse, it will break if any filenames contain whitespace characters (newlines,
> > tabs, spaces, etc - all of which are

Re: Change a file name - remove a consistent string recursively

2023-01-16 Thread Les Kitchen via luv-main
On Mon, Jan 16, 2023, at 21:42, Craig Sanders via luv-main wrote:
> On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
>> I'd do something like:
>>
>> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;'

Thanks, Craig, for your followup.

> This is quite dangerous, for several reasons.  To start with, there's no
> protection against renaming files over existing files with the same target
> name.
...

Well, that's the intention of the -i (interactive) option to mv,
to require user agreement before over-writing existing files.

All the other points you raise are valid, especially the dangers
of feeding unchecked input into the shell, and anybody writing
shell code needs to be aware of them — although I will say I
mentioned pretty much all of them in the notes further down in
my message, though in less detail than you have, and without
stressing enough the dangers.

If you know you've got a modest number of files, with
well-behaved file names (in the Unix-shell sense, that is, no
whitespace, no shell metacharacters, etc.), no changes to the
directory structure, then the approach I suggested can work
quite well.  And one big advantage is that you can review the
generated list of shell commands, and check that they'll do what
you expect before committing to executing them.

And, yes, if you have filenames with arbitrary characters, then
you have to resort to other contrivances, ultimately to
NULL-terminated strings.  And, yes, if you have a huge number of
files, then you'd likely want to do the rename internal to your
scripting language, instead of forking a new process for each
file rename.  But then you lose the easy ability to review the
commands before they're executed.  Of course there are
workarounds.  You could define, say, a Perl routine that in one
definition just prints the names in the file renaming, and in
another definition, actually does the rename.  But then you're
getting well beyond the simple one-liner (unless you use that
Perl rename program you mention later).

And I could also mention the potential for filenames to contain
UTF-8 (or other encodings) for characters that just happen to
look like ASCII characters, but aren't, or to contain
terminal-control escape sequences.  It can get very weird.

In general, there's a big big difference between a simple shell
one-liner that you use as a work amplifier in situations you
know are well-behaved, and a piece of robust code that can
behave gracefully no matter what weird input is thrown at it.
They're different use-cases.

> It also doesn't distinguish between .junk. in a directory name vs in a file
> name - it will just modify the first instance of ".junk." it sees in the
> pathname. e.g. "./Dir1/My.junk.dir/my.junk.file.txt".  Probably not a problem
> in practice, but something to be aware of.

Yeah.

> Worse, it will break if any filenames contain whitespace characters (newlines,
> tabs, spaces, etc - all of which are completely valid in filenames - the ONLY
> characters guaranteed NOT to be in a pathname are / and NUL).

This should be taped to the screen of every shell user.
Actually, files with spaces are pretty common in non-Unix
environments, like Windows or MacOS (yes, I know it's Unix
underneath), but they're pretty simple to handle by
double-quoting, as I mentioned in my notes — and that will
handle pretty much everything except for characters that
interpolate into double-quoted strings, I guess $ ` (backtick),
and possibly !.

> And because you're NOT quoting the filenames in your print statement, it
> will also break if any filenames contains shell metacharacters like ; & > <
> etc when the output is piped into sh. A simple fix might appear to be to use
> single-quotes in the print statement - e.g. print("mv -i '$o' '$_'") - but
> even this will break if a filename contains a single-quote character. Similar
> for escaped double-quotes.

It's even messier than this.  Because you're already using
single quotes for the -e expression to Perl, you can't
immediately use them like that.  You have to do something like
'"'"' to close the single-quoted string, then attach a
double-quoted single quote, then open a new single-quoted
string.  I don't even want to think about it.  By then you might
as well shift to using NULL-terminated strings.
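One way to sidestep the nested-quoting problem altogether is to pass the filenames to the shell as arguments instead of pasting them into command text. The following is a sketch under that assumption, using only POSIX sh and `find ... -exec ... {} +`; the function name preview_junk is hypothetical:

```sh
#!/bin/sh
# Sketch only: filenames reach the inner sh as arguments, never as part of
# the command text, so the names themselves need no quoting at all.
# preview_junk is an illustrative name.

preview_junk() {
    find "$1" -type f -name '*.junk.*' -exec sh -c '
        for old do
            dir=${old%/*}
            base=${old##*/}
            # first ".junk." in the filename: text before it, a dot, text after it
            new=$dir/${base%%.junk.*}.${base#*.junk.}
            printf "%s -> %s\n" "$old" "$new"
        done
    ' sh {} +
}
```

This version only prints the old and new names. Swapping the printf line for `mv -i -- "$old" "$new"` would perform the renames.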

> Shell can be very dangerous if you don't quote your arguments properly.
> Consider, for example, what would happen if there happened to be a file called
> ";rm --no-preserve-root -rf /;" (or ";sudo rm ;") under /Dir1.  That's
> a fairly extreme example of an obviously malicious filename, but there are
> plenty of legitimate, seemingly innocuous filenames that WILL cause problems
> if passed unquoted to the shell.
>
> Whitespace and quoting issues in shell are well-known and long-standing,
> and pretty much inherent to the way the shell parses its command line - the
> subject of many FAQs and security advisories.
>
> It's unfortunately very easy to 

Re: Change a file name - remove a consistent string recursively

2023-01-16 Thread Craig Sanders via luv-main
On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
> I'd do something like:
>
> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;'

This is quite dangerous, for several reasons.  To start with, there's no
protection against renaming files over existing files with the same target
name.

It also doesn't distinguish between .junk. in a directory name vs in a file
name - it will just modify the first instance of ".junk." it sees in the
pathname. e.g. "./Dir1/My.junk.dir/my.junk.file.txt".  Probably not a problem
in practice, but something to be aware of.

Worse, it will break if any filenames contain whitespace characters (newlines,
tabs, spaces, etc - all of which are completely valid in filenames - the ONLY
characters guaranteed NOT to be in a pathname are / and NUL).

And because you're NOT quoting the filenames in your print statement, it
will also break if any filenames contains shell metacharacters like ; & > <
etc when the output is piped into sh. A simple fix might appear to be to use
single-quotes in the print statement - e.g. print("mv -i '$o' '$_'") - but
even this will break if a filename contains a single-quote character. Similar
for escaped double-quotes.

Shell can be very dangerous if you don't quote your arguments properly.
Consider, for example, what would happen if there happened to be a file called
";rm --no-preserve-root -rf /;" (or ";sudo rm ;") under /Dir1.  That's
a fairly extreme example of an obviously malicious filename, but there are
plenty of legitimate, seemingly innocuous filenames that WILL cause problems
if passed unquoted to the shell.

Whitespace and quoting issues in shell are well-known and long-standing,
and pretty much inherent to the way the shell parses its command line - the
subject of many FAQs and security advisories.

It's unfortunately very easy to improperly quote filenames - it's far harder
to do correctly and 100% safely than it seems at first glance.

For safety, if you were to DIY it with a command like yours above (there are
far better alternatives), you should use -print0 with find and the -0 option
with perl.

In fact, you should use NUL as the separator with ANY program dealing with
arbitrary filenames on stdin - most standard tools these days have -0 (or
-z or -Z) options for using NUL as the separator, including most of GNU
coreutils etc (head, tail, cut, sort, grep, sed, etc. For awk, you can use
BEGIN {RS="\0"} or similar).

Also:

1. perl has a built-in rename function, there's no need to fork mv (which
would be extremely slow if there are lots of files to rename).  And perl
isn't shell, so doesn't have problems with unquoted whitespace or shell
metacharacters in the filenames.  Still doesn't protect against clobbering
existing filenames without some extra code, though:

$ perldoc -f rename
rename OLDNAME,NEWNAME
Changes the name of a file; an existing file NEWNAME will be
clobbered. Returns true for success; on failure returns false
and sets $!.

Behavior of this function varies wildly depending on your system
implementation. For example, it will usually not work across
file system boundaries, even though the system *mv* command
sometimes compensates for this. Other restrictions include
whether it works on directories, open files, or pre-existing
files. Check perlport and either the rename(2) manpage or
equivalent system documentation for details.

For a platform independent "move" function look at the
File::Copy module.

Portability issues: "rename" in perlport.

2. Even better, a perl rename utility (aka file-rename, perl-rename, prename,
etc as mentioned in my previous message in this thread) already exists and
won't overwrite existing files unless you force it to with the -f option.

It also distinguishes between directories and file names (by default, it will
only rename the filename portion of a pathname unless you use the --path or
--fullpath option).  It can take filenames from stdin (and has a -0 option for
NUL-separated filenames) or as command-line args (e.g. with 'find ... -exec
rename  {} +')
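For the DIY route, a rough sketch that combines find -print0, perl -0 and perl's built-in rename, with the extra existence check mentioned in point 1. The wrapper name perl_strip_junk is made up, and the check-then-rename sequence is not atomic, so treat it as an illustration rather than a hardened tool:

```bash
#!/usr/bin/env bash
# Sketch only: NUL-separated names from find, renamed inside perl itself.
# perl_strip_junk is an illustrative name.

perl_strip_junk() {
    find "$1" -type f -name '*.junk.*' -print0 |
    perl -0 -ne '
        chomp;                                   # -0 sets $/ to NUL, so this strips the NUL
        my $old = $_;
        (my $new = $old) =~ s{\.junk\.(?=[^/]*\z)}{.};   # filename part only
        next if $new eq $old;
        if (-e $new || -l $new) { warn "not overwriting $new\n"; next }
        rename($old, $new) or warn "rename $old failed: $!\n";
    '
}
```

The lookahead `(?=[^/]*\z)` restricts the substitution to the last path component, so a ".junk." in a directory name is left alone.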

craig
___
luv-main mailing list -- luv-main@luv.asn.au
To unsubscribe send an email to luv-main-le...@luv.asn.au


Re: Change a file name - remove a consistent string recursively

2023-01-16 Thread Craig Sanders via luv-main
On Thu, Jan 12, 2023 at 05:49:13PM +1000, Piers Rowan wrote:
> I have a structure like:
>
> /Dir1/123.junk.doc
> /Dir1/456.junk.pdf
> /Dir1/SubDir/1123.junk.doc
> /Dir1/SubDir/1456.junk.pdf
> /Dir2/SubDir/4321.junk.doc
> /Dir2/SubDir/7676.junk.pdf
> ...etc...
>
> I want some guidance as to how to make:
>
> 1123.junk.doc > 1123.doc
>
> $ID.junk.$EXT > $ID.$EXT

Using find and the perl rename utility (which is not the same as the rename
program in util-linux - that has completely different and incompatible command
line options):

find /Dir1/ -type f -exec rename -n 's/\.junk\././' {} +

That's a dry-run, it will only print what **would** be renamed, without
actually doing it.  Once you've confirmed that it's going to do what you want,
run it without -n, or change -n to -v for verbose operation.

Optionally add a `g` regex modifier to the s/// operation ('s/\.junk\././g')
if filenames might contain .junk. more than once.

perl rename allows you to use **any** perl code to rename files - from simple
sed-like regex transformations like the one above to quite complex scripts
(it's pretty simple to use sprintf to, say, zero-pad numbers in filenames so
that they sort correctly with just a plain numeric sort rather than a natural
sort).



Depending on your distro, the perl rename command might be rename, prename,
file-rename, or perl-rename. Try running each of them to find out what it's
called on your system.

On Debian and related distros it's in the `rename` package and (via the
/etc/alternatives system) is executed as just "rename":

Package: rename
Version: 2.00-1
Installed-Size: 57
Maintainer: Debian Perl Group 
Architecture: all
Depends: perl:any
Description-en: Perl extension for renaming multiple files
 This package provides both a perl interface for renaming files (File::Rename)
 and a command line tool 'file-rename' which is intended to replace the version
 that used to be supplied by the perl package.




You can confirm which variant of rename you have installed with the -V option,
which works for both perl rename and util-linux rename:

If you have the perl version installed, it will mention either perl or
File::Rename depending on how old your version is.

$ rename -V
/usr/bin/rename using File::Rename version 2.00, File::Rename::Options version 1.99

With the util-linux version, it will mention util-linux:

$ rename -V
rename.ul from util-linux 2.38.1

WARNING: Again, these two programs are not at all compatible. Aside from -V,
you can't use perl rename options with util-linux rename or vice-versa.


(Debian systems often have both installed, with perl as /usr/bin/rename and
util-linux rename as /usr/bin/rename.ul. Other distros might have util-linux
as rename and perl rename as prename)
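A small sketch of turning that -V check into something a script can use. The function name rename_flavour is hypothetical, and the patterns are based only on the two sample outputs shown above, so other versions may need different patterns:

```bash
#!/usr/bin/env bash
# Sketch only: classify a rename binary from the text its -V option prints.
# rename_flavour is an illustrative name; patterns follow the sample outputs above.

rename_flavour() {
    case $1 in
        *util-linux*)            echo util-linux ;;
        *File::Rename*|*perl*)   echo perl ;;
        *)                       echo unknown ;;
    esac
}

# Typical use (2>&1 in case the version text goes to stderr):
#   rename_flavour "$(rename -V 2>&1)"
```

A script could refuse to continue unless the result is "perl" before passing a perl expression to rename.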

craig


Re: Change a file name - remove a consistent string recursively

2023-01-13 Thread Les Kitchen via luv-main
Hi Piers,

On Thu, Jan 12, 2023, at 18:49, Piers Rowan via luv-main wrote:
...
> I feel like I've asked this before or something similar.
>
> I have a structure like:
>
> /Dir1/123.junk.doc
> /Dir1/456.junk.pdf
> /Dir1/SubDir/1123.junk.doc
> /Dir1/SubDir/1456.junk.pdf
> /Dir2/SubDir/4321.junk.doc
> /Dir2/SubDir/7676.junk.pdf
> ...etc...
>
> I want some guidance as to how to make:
>
> 1123.junk.doc > 1123.doc
>
> $ID.junk.$EXT > $ID.$EXT
>
> Your thoughts are appreciated.
...

I see you've already got a good-enough solution to this, but in
case you ever need to do something similar in future:

I'd do something like:

find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;'

That is:

1. The find gives you a list of all the file names under /Dir1
2. The -lne to perl makes it (-e) run the given expression (quoted in
   single quotes to protect shell metacharacters) on (-n) every input
   line (that is, every file name), stripping off trailing
   newlines on input, and adding them back on output (-l).
3. The perl code saves the current line $_ into a variable $o
   (for "old" or "original"), then does a substitution on the
   current line (implicitly $_).  In the regular expression,
   plain dot is a match-anything metacharacter, so it needs to
   be backslash escaped to match a literal dot. So we have the
   old and new versions of the file name in those respective
   variables.
4. Instead of doing the renaming immediately, the perl code
   outputs a line-by-line list of shell mv commands to do the
   actual renaming (with -i for interaction if it'd try to
   over-write an existing file).  That way, you can inspect the
   list of shell commands, and check that they're doing the
   right thing before committing.  Good for debugging.
5. Notice that the mv command is emitted only if the
   substitution actually made a change, so files that don't
   match won't be affected.
6. Once you're happy that the shell mv commands would do the
   right thing, you can make it happen by piping the output of
   that above command pipeline into sh, like:
   find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;' | sh

Notes:

- Being too lazy to set up a test file structure, I haven't
  actually tried out the above.  But I've done similar things
  many times.  The main thing is that you can inspect the list
  of commands output, so you can verify yourself that they look
  right before running them.  And even with that, it'd still be
  a good idea to do this on a copy of your actual data (or have
  a backup of it).
- This is a very general strategy: Instead of writing a script
  to actually do something (which might go horribly wrong); you
  write a script that emits a simple list of commands, which can
  be visually inspected before being piped into the shell to be
  executed.
- The perl 's' "substitute" operator in its plain form will
  substitute only the first occurrence it sees.  The above
  assumes this will be only in the filename parts.  It's a bit
  more complicated if that string you want replaced can also
  occur along the directory path.
- The find will list the files in whatever order it encounters
  them in the filesystem.  If you want them in some sane order,
  you can insert the sort command into the pipeline.
- This is finding only ordinary files, it won't see symbolic
  links.  If you have symbolic links in your directory
  hierarchy, then you'll need to decide what you want to do with
  them, and put in additional options to find to achieve that.
  But I don't think this applies in your case.
- This will come unstuck if your filenames might contain spaces
  or other whitespace.  To handle this, you'll need to put
  (say) double quotes around the filenames in the generated
  commands, by putting suitable backslashed-escaped double
  quotes into the output string, something like:

  "mv -i \"$o\" \"$_\""

  (The good Plan 9 people avoided this Unix shell-quoting hell in
  their shell.)
- If your filenames might contain more exotic things, like
  newlines or dollar signs, then you'll need to do stuff like
  work with null-byte-terminated lines (by -print0 on find and
  the corresponding options to perl, which I don't off the top
  of my head remember).  And concatenation contrivances to get
  single quotes into the string printed by perl.  But by then
  you're better off writing a script (in perl or your favorite
  scripting language), rather than trying to do it all in a
  one-liner, with all its escaping weirdness.
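The emit-commands-then-review strategy from the notes above can be kept while leaving the quoting to bash's printf %q. This is a sketch, assuming bash and a find with -print0; emit_mv_commands is a made-up name, and the output is meant to be read back by bash, not by a plain POSIX sh:

```bash
#!/usr/bin/env bash
# Sketch only: emit a reviewable list of mv commands with bash-quoted names.
# emit_mv_commands is an illustrative name.

emit_mv_commands() {
    local old base new
    while IFS= read -r -d '' old; do
        base=${old##*/}                     # filename part of the path
        new="${old%/*}/${base/.junk./.}"    # replace the first ".junk." with "."
        # %q quotes each name so that bash reads it back as exactly one word.
        printf 'mv -i -- %q %q\n' "$old" "$new"
    done < <(find "$1" -type f -name '*.junk.*' -print0)
}
```

A possible workflow: `emit_mv_commands /Dir1 > renames.sh`, inspect renames.sh, then run `bash renames.sh`. Running the list from a file rather than through a pipe also means any mv -i prompt is answered from the terminal.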


I hope this is of some use.


— Smiles, Les.


Re: Change a file name - remove a consistent string recursively

2023-01-12 Thread Piers Rowan via luv-main



On 12/1/23 18:50, David via luv-main wrote:
> Hi, guidance as requested:
>
> I assume you're seeking a commandline solution, not
> a GUI one.


I only had a few directories so I just ran this a few times:

rename ".original" "" */*

The folder structure was /path/to/data//MM so

cd /path/to/data//

rename ".original" "" */*

removed the word original from the file.

Thanks David.

Cheers

P



Re: Change a file name - remove a consistent string recursively

2023-01-12 Thread David via luv-main
On Thu, 12 Jan 2023 at 18:49, Piers Rowan via luv-main wrote:

> I have a structure like:
>
> /Dir1/123.junk.doc
> /Dir1/456.junk.pdf
> /Dir1/SubDir/1123.junk.doc
> /Dir1/SubDir/1456.junk.pdf
> /Dir2/SubDir/4321.junk.doc
> /Dir2/SubDir/7676.junk.pdf
> ...etc...
>
> I want some guidance as to how to make:
>
> 1123.junk.doc > 1123.doc
>
> $ID.junk.$EXT > $ID.$EXT

Hi, guidance as requested:

I assume you're seeking a commandline solution, not
a GUI one.

1) Write a shell script that recurses into all subdirectories,
finds matching filenames, 'mv' to new name.
Bash shell provides all the tools necessary.
Some ideas here:
  http://mywiki.wooledge.org/BashFAQ/030
Could be done with a recursive function. Or, there's probably
some shorter approach if using the enhanced globbing features.

2) Or, search the web for such a script.

3) Or, use 'find' to detect all subdirectories and in each one
invoke the 'rename' commandline tool, which takes Perl
'substitute' command argument and applies that to matching
filenames. See:
  https://manpages.debian.org/bullseye/rename/rename.1.en.html

4) There's probably examples of doing that on the web too,
findable with appropriate keywords.

5) Search a site like this:
  https://superuser.com/search?q=linux+recursive+rename
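As a rough illustration of the "enhanced globbing" idea in option 1, here is a sketch that recurses with bash's globstar instead of find. It assumes bash 4.0 or later and an mv that supports -n (GNU and BSD do; it is not in POSIX). The function name glob_strip_junk is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: recurse with bash's globstar instead of find (bash 4.0+).
# glob_strip_junk is an illustrative name.

glob_strip_junk() (
    # Subshell body, so the shopt changes do not leak into the caller.
    shopt -s globstar nullglob dotglob
    for old in "$1"/**/*.junk.*; do
        [[ -f $old && ! -L $old ]] || continue   # regular files only, like find -type f
        base=${old##*/}                          # filename part of the path
        new="${old%/*}/${base/.junk./.}"         # replace the first ".junk." with "."
        mv -n -- "$old" "$new"                   # -n: never overwrite an existing file
    done
)
```

Called as `glob_strip_junk /Dir1`, it renames in place. Because the shell expands the glob into separate words, whitespace in names is not a problem as long as the variables stay double-quoted.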


Change a file name - remove a consistent string recursively

2023-01-11 Thread Piers Rowan via luv-main

Hi champions,

I feel like I've asked this before or something similar.

I have a structure like:

/Dir1/123.junk.doc
/Dir1/456.junk.pdf
/Dir1/SubDir/1123.junk.doc
/Dir1/SubDir/1456.junk.pdf
/Dir2/SubDir/4321.junk.doc
/Dir2/SubDir/7676.junk.pdf
...etc...

I want some guidance as to how to make:

1123.junk.doc > 1123.doc

$ID.junk.$EXT > $ID.$EXT

Your thoughts are appreciated.

Thanks

Piers
