While that's the long-term (end of next week ...) goal, the immediate concern is to find a reliable script to separate the painfully obvious IPv4 addresses, not from the bodies of the PTR's, but from the list of them.

Still no expected output...

If the so-called "painfully obvious IPv4 addresses" are those that my last sed substitution extracts, then you can just 'grep' and/or 'grep -v' its output, using the regular expression you found on https://superuser.com/questions/202818/what-regular-expression-can-i-use-to-match-an-ip-address
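For what it is worth, here is a minimal sketch of that grep approach. It assumes the sed substitution from my previous message and a simple anchored pattern of my own, much cruder than the ones on the Super User page (it does not check that each number is at most 255). The function name split_ipv4 and the intermediate file name sed_output are hypothetical, chosen only for illustration.

```shell
# Sketch only: split the sed output with grep.
# split_ipv4 is a hypothetical helper; $1 is the file listing the PTRs.
split_ipv4() {
    # Same substitution as my previous sed command.
    sed 's/[^0-9]*\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\).*/\1.\2.\3.\4/' "$1" > sed_output
    # Lines that are now nothing but four dot-separated numbers ...
    grep -E  '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' sed_output > IPv4_found
    # ... and every other line.
    grep -vE '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' sed_output > IPv4_not_found
}
```

Call it as 'split_ipv4 IPv4-SourceList.txt'. Because the pattern is anchored, a line where the substitution left extra characters around the address lands in IPv4_not_found.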

If you want one single AWK program to do everything (maybe faster, maybe not; gensub requires GNU AWK):

$ awk '{ $0 = gensub(/[^0-9]*([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3}).*/, "\\1.\\2.\\3.\\4", "1"); if ($0 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/) print $0 >> "IPv4_found"; else print $0 >> "IPv4_not_found" }' IPv4-SourceList.txt

I suspect you do not consider the substitution I wrote "painfully obvious". Maybe you want the separator to always be one single character (but not necessarily always the same). If so, you can modify the substitution accordingly, as I explained in my first post in this thread (for IPv6 addresses). The single-AWK-program solution becomes:

$ awk '{ $0 = gensub(/[^0-9]*([0-9]{1,3})[^0-9]([0-9]{1,3})[^0-9]([0-9]{1,3})[^0-9]([0-9]{1,3}).*/, "\\1.\\2.\\3.\\4", "1"); if ($0 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/) print $0 >> "IPv4_found"; else print $0 >> "IPv4_not_found" }' IPv4-SourceList.txt

Fewer addresses end up in IPv4_found. On the positive side, that file becomes less likely to contain wrong addresses: the output is now correct when the PTR contains digits before the actual address, as long as those digits are separated from it by more than one character (instead of more than five in the previous solution).

If you want to keep the association with the PTR, then do not overwrite $0, and print it along with the address when one is found. For the first AWK program above:

$ awk '{ addr = gensub(/[^0-9]*([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3})[^0-9]{1,5}([0-9]{1,3}).*/, "\\1.\\2.\\3.\\4", "1") } { if (addr ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/) print $0, addr >> "IPv4_found"; else print $0 >> "IPv4_not_found" }' IPv4-SourceList.txt

With sed (whose output can be grepped, I repeat), you can just paste the input file with its output:

$ sed 's/[^0-9]*\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\).*/\1.\2.\3.\4/' IPv4-SourceList.txt | paste IPv4-SourceList.txt -
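A possible way to grep that pasted output while keeping each PTR next to its address, as a sketch only: paste separates its two columns with a tab, so the pattern below looks for four dot-separated numbers right after a tab, at the end of the line. The function name pair_ipv4 is hypothetical, and the pattern is again a crude one of my own (no check that each number is at most 255).

```shell
# Sketch only: keep the PTR/address pairs where an address was extracted.
# pair_ipv4 is a hypothetical helper; $1 is the file listing the PTRs.
pair_ipv4() {
    tab=$(printf '\t')   # paste's default column separator
    sed 's/[^0-9]*\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\)[^0-9]\{1,5\}\([0-9]\{1,3\}\).*/\1.\2.\3.\4/' "$1" |
        paste "$1" - |
        # Keep lines whose second column is nothing but a dotted quad.
        grep -E "$tab"'[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}
```

Replacing 'grep -E' with 'grep -vE' lists the PTRs for which nothing was extracted (each one printed twice on its line, since sed left it unchanged).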
