While that's the long-term (end of next week ...) goal, the immediate concern
is to find a reliable script to separate the painfully obvious IPv4
addresses, not from the bodies of the PTR's, but from the list of them.
Still no expected output...
If the so-called "painfully obvious IPv4 addresses" are those that my last
sed substitution extracts, then you can just 'grep' and/or 'grep -v' its
output, using the regular expression you found on
https://superuser.com/questions/202818/what-regular-expression-can-i-use-to-match-an-ip-address
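A minimal sketch of that grep step, assuming the lax sed substitution shown at
the end of this post and a simplified dotted-quad pattern rather than the one
from the superuser page (it does not check that each number is at most 255).
The sample names and the scratch directory are made up. The pattern is
anchored to the whole line because sed only rewrites the part of the line that
matched, so text in front of the match survives in its output:

```shell
# Scratch directory and made-up sample names; use IPv4-SourceList.txt in practice.
work=$(mktemp -d)
printf '%s\n' \
  'host-192-0-2-10.example.net' \
  '203.0.113.7.static.example.org' \
  'mail.example.com' \
  'adsl2-host-192-0-2-10.example.net' > "$work/list"

# The sed substitution: four numbers separated by one to five non-digits.
subst='s/[^0-9]*\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\).*/\1.\2.\3.\4/'

# Simplified dotted quad, anchored so that a line must consist of nothing else.
quad='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'

sed "$subst" "$work/list" > "$work/sed-output"
grep -E    "$quad" "$work/sed-output" > "$work/IPv4_found"
grep -E -v "$quad" "$work/sed-output" > "$work/IPv4_not_found"

cat "$work/IPv4_found"
```

The last sample comes out of sed as adsl2192.0.2.10 and therefore lands in
IPv4_not_found; an unanchored pattern would have accepted it.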
If you want one single AWK program to do everything (maybe faster, maybe
not):
$ awk '{
    # "\n" marks where the match starts, so text left before it can be dropped
    $0 = gensub(/[^0-9]*([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3}).*/,
                "\n\\1.\\2.\\3.\\4", 1)
    sub(/^.*\n/, "")
    if ($0 ~ /^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/) print $0 >> "IPv4_found"
    else print $0 >> "IPv4_not_found"
}' IPv4-SourceList.txt
I suspect you do not consider the substitution I wrote "painfully obvious".
Maybe you want the separator to always be one single character (but not
necessarily always the same). If so, you can modify the substitution
accordingly, as I explained in my first post in this thread (for IPv6
addresses). The single-AWK-program solution becomes:
$ awk '{
    # "\n" marks where the match starts, so text left before it can be dropped
    $0 = gensub(/[^0-9]*([0-9]{1,3})[^0-9]([0-9]{1,3})[^0-9]([0-9]{1,3})[^0-9]([0-9]{1,3}).*/,
                "\n\\1.\\2.\\3.\\4", 1)
    sub(/^.*\n/, "")
    if ($0 ~ /^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/) print $0 >> "IPv4_found"
    else print $0 >> "IPv4_not_found"
}' IPv4-SourceList.txt
Fewer addresses end up in IPv4_found. On the positive side, it becomes less
likely that it contains wrong addresses: the output is correct when the PTR
contains digits before the actual address, as long as these digits are
separated from it by more than one character (against more than five in the
previous solution).
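To make the trade-off concrete, here is a small sketch that applies both
separator rules to one made-up name. It uses 'grep -o' (a common extension,
not POSIX) instead of gensub, only to show which digits each pattern picks;
the variable names and the sample name are hypothetical.

```shell
# One number is 1 to 3 digits; both patterns want four of them.
n='[0-9]{1,3}'
lax="${n}[^0-9]{1,5}${n}[^0-9]{1,5}${n}[^0-9]{1,5}${n}"   # 1 to 5 separator characters
strict="${n}[^0-9]${n}[^0-9]${n}[^0-9]${n}"               # exactly 1 separator character

# Made-up name: a stray digit sits two characters before the address.
name='dsl2--192-0-2-10.example.net'

# grep -o prints only the matched text; sed then turns each separator run into a dot.
lax_out=$(printf '%s\n' "$name" | grep -oE "$lax" | sed 's/[^0-9]\{1,\}/./g')
strict_out=$(printf '%s\n' "$name" | grep -oE "$strict" | sed 's/[^0-9]\{1,\}/./g')

echo "lax:    $lax_out"      # prints "lax:    2.192.0.2"
echo "strict: $strict_out"   # prints "strict: 192.0.2.10"
```

The lax pattern accepts the two dashes as a separator and starts at the stray
digit; the strict one cannot, so it starts at the real address. The price is
that the strict pattern finds nothing in names whose numbers are separated by
more than one character.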
If you want to keep the association with the PTR, then do not overwrite $0,
and print it along with the address when found. For the first AWK program
above:
$ awk '{
    # "\n" marks where the match starts, so text left before it can be dropped
    addr = gensub(/[^0-9]*([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3}).*/,
                  "\n\\1.\\2.\\3.\\4", 1)
    sub(/^.*\n/, "", addr)
    if (addr ~ /^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/) print $0, addr >> "IPv4_found"
    else print $0 >> "IPv4_not_found"
}' IPv4-SourceList.txt
With sed (whose output can be grepped, I repeat), you can just paste the
input file with its output:
$ sed 's/[^0-9]*\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\).*/\1.\2.\3.\4/' \
      IPv4-SourceList.txt | paste IPv4-SourceList.txt -
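The pasted output can be grepped too, keeping each name next to its address.
A sketch, assuming that no name contains a tab (paste's default delimiter) and
using a simplified dotted-quad pattern of my own; the sample names and the
scratch directory are made up:

```shell
work=$(mktemp -d)
printf '%s\n' 'host-192-0-2-10.example.net' 'mail.example.com' > "$work/list"

subst='s/[^0-9]*\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\).*/\1.\2.\3.\4/'
tab=$(printf '\t')
quad='[0-9]{1,3}(\.[0-9]{1,3}){3}'

# name<TAB>sed output, one pair per line
sed "$subst" "$work/list" | paste "$work/list" - > "$work/pasted"

# Keep the pair when the second field is exactly a dotted quad...
grep -E "${tab}${quad}\$" "$work/pasted" > "$work/IPv4_found"
# ...otherwise keep only the name.
grep -E -v "${tab}${quad}\$" "$work/pasted" | cut -f1 > "$work/IPv4_not_found"
```

Matching from the tab to the end of the line plays the same role as anchoring
a whole line: a second field with leftover text in front of the address is
rejected.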