In preparation for applying Horrible sed's in their various forms, I set
about the task of extracting the IPv6 and IPv4 addresses from my list of
gratuitously looked up hostnames, gleaned from nearly two hundred sets of
publicly available recent visitor data. It's a long list, way too big for
my LibreOffice Calc crutches.
Separate the IPv6's from the GLU hostnames from which the IPv4's have just
been removed:
grep ":" SourceFile.txt | sort | uniq -c | awk '{print $2}' '-' > IPv6-List.txt
It's difficult to use comm; I used the invert-match option in grep instead:
grep -v ":" SourceFile.txt | sort | uniq -c | awk '{print $2}' '-' > NoIPv6-List.txt
The NoIPv6-List.txt still has a lot of not-looked-up IPv4 addresses.
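The split on ":" can be tried on a few made-up lines first. This is only a sketch: the sample contents and the temporary directory are assumptions (the addresses come from the documentation ranges), and `sort -u` stands in for `sort | uniq -c | awk '{print $2}'`, which should give the same result as long as each line is a single token with no spaces.

```shell
# Hypothetical sample data standing in for SourceFile.txt.
tmp=$(mktemp -d)
printf '%s\n' \
  '2001:db8::1' \
  'host.example.org' \
  '2001:db8::1' \
  '192.0.2.7' \
  'host.example.org' > "$tmp/SourceFile.txt"

# Lines containing a colon are taken to be IPv6 addresses.
grep ":" "$tmp/SourceFile.txt" | LC_ALL=C sort -u > "$tmp/IPv6-List.txt"

# Everything else: hostnames plus any bare IPv4 addresses.
grep -v ":" "$tmp/SourceFile.txt" | LC_ALL=C sort -u > "$tmp/NoIPv6-List.txt"

ipv6=$(cat "$tmp/IPv6-List.txt")
noipv6=$(cat "$tmp/NoIPv6-List.txt")
printf 'IPv6:\n%s\nNot IPv6:\n%s\n' "$ipv6" "$noipv6"

rm -rf "$tmp"
```

The duplicate lines collapse to one each, and the bare IPv4 address ends up alongside the hostname in the second list.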
Ref.:
https://superuser.com/questions/202818/what-regular-expression-can-i-use-to-match-an-ip-address
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /etc/hosts
Applied here:
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' NoIPv6-List.txt > IPv4-List.txt
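To see what the -o option does with this pattern, here is a small sketch on made-up lines of the kind I would expect in NoIPv6-List.txt. The three sample entries are assumptions, not real visitor data.

```shell
# Hypothetical entries: a bare address, a hostname with an embedded
# dotted address, and an ordinary hostname.
sample=$(printf '%s\n' \
  '198.51.100.23' \
  'static.203.0.113.5.example.net' \
  'mail.example.org')

# -o prints only the matching part of each line, so a dotted quad
# inside a hostname is printed just like a bare address.
matches=$(printf '%s\n' "$sample" |
  grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

printf '%s\n' "$matches"
```

Both the bare address and the one buried in the hostname come out, with nothing left to tell them apart.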
Alas, this IPv4-List file includes many addresses extracted from hostnames
in which '.' is used as the separator. I'll be handling those with a sed
script later, but for now I want only the IPv4's that were not gratuitously
looked up (because they couldn't be?). Reversing the grep would erase those
legitimate PTR's.
Is there a sed way of doing this separation?
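One possibility, offered as an untested-on-real-data sketch: anchor the same pattern to the whole line, so that only lines consisting of nothing but a dotted quad are selected. This assumes the un-looked-up addresses sit alone on their lines with no leading or trailing blanks, and that the sed in use accepts -E (GNU and BSD sed do). The sample lines are made up.

```shell
# Hypothetical entries standing in for NoIPv6-List.txt.
sample=$(printf '%s\n' \
  '198.51.100.23' \
  'static.203.0.113.5.example.net' \
  'mail.example.org')

# Whole-line pattern: ^ and $ reject anything with extra text around it.
quad='^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'

# -n with the p command prints only the lines that are bare addresses.
bare=$(printf '%s\n' "$sample" | sed -n -E "/$quad/p")

# The d command drops those lines and keeps the hostnames (PTR's) intact.
hosts=$(printf '%s\n' "$sample" | sed -E "/$quad/d")

printf 'Bare IPv4:\n%s\nHostnames:\n%s\n' "$bare" "$hosts"
```

The pattern does not check that each number is 255 or less, so it is no stricter than the grep above. grep -Ex with the unanchored pattern should select the same bare-address lines, since -x requires the whole line to match.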
George Langford