In preparation for applying Horrible sed's in their various forms, I set about extracting the IPv6 and IPv4 addresses from my list of gratuitously looked-up hostnames, gleaned from nearly two hundred sets of publicly available recent visitor data. It's a long list, far too big for my LibreOffice Calc crutches.

Separate the IPv6's from the GLU hostnames from which the IPv4's have just been removed:
grep ":" SourceFile.txt | sort | uniq -c | awk '{print $2}' '-' > IPv6-List.txt

It's difficult to use comm; I used the invert-match option in grep instead:
grep -v ":" SourceFile.txt | sort | uniq -c | awk '{print $2}' '-' > NoIPv6-List.txt

The NoIPv6-List.txt file still has a lot of not-looked-up IPv4 addresses.
Ref.: https://superuser.com/questions/202818/what-regular-expression-can-i-use-to-match-an-ip-address
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /etc/hosts

Applied here:
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' NoIPv6-List.txt > IPv4-List.txt

Alas, this IPv4-List.txt file also includes many addresses extracted from hostnames in which '.' is used as the separator between octets. I'll be handling those with a sed script later, but for now I want only the IPv4's that were not gratuitously looked up (because they couldn't be?). Reversing the grep would erase those legitimate PTR's.

Is there a sed way of doing this separation?
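
A minimal sketch of one possible approach, assuming every entry sits on its own line: anchor the same pattern with ^ and $ so that sed selects only the lines consisting entirely of a dotted quad, leaving hostnames that merely embed an address untouched. The sample file and the output file names below are hypothetical stand-ins, not the real lists.

```shell
# Hypothetical sample standing in for NoIPv6-List.txt (one entry per line).
printf '%s\n' \
  '192.0.2.17' \
  '203.0.113.5.dyn.example.net' \
  '198.51.100.200' \
  'mail.example.org' > sample-NoIPv6-List.txt

# Same dotted-quad pattern as above, but anchored to the whole line.
ipv4='^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'

# Print only the lines that are nothing but an IPv4 address.
sed -n -E "/$ipv4/p" sample-NoIPv6-List.txt > sample-IPv4-bare.txt

# Delete those lines, keeping the hostnames (including ones that embed an address).
sed -E "/$ipv4/d" sample-NoIPv6-List.txt > sample-hostnames.txt
```

Like the original pattern, this does not check that each octet is at most 255; it only separates whole-line dotted quads from everything else.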

George Langford
