Okay, I was tinkering with the code below, but the zero-width negative lookahead does not seem to disqualify an ampersand followed by #x[0-9A-F]{4}; so the output is bogus (you can run this and see what I mean).
What am I doing wrong?

    #!/usr/bin/perl -w

    use warnings;
    use strict;

    my $data = <<__EOF__;
    Thе Rеаl RеаѕоnThе Ꮯоmіng Ꮯоllарѕе...Thе rеаl rеаѕоn ᎳHY HоmеlаndSеcurіtу rеcеntlу рurchаѕеd1.7 Bіllіоn Rоundѕ оf аmmunіtіоn...Ꮃhаt Yоu Muѕt Dо Tо Ꭼnѕurе YоurSаfеtуHоmеlаnd ѕеcurіtу іѕ thеrе tо ѕеcurеthе hоmеlаnd оnlу... Sо thеѕе Ьullеtѕаrе rеаlу mеаnt fоr thеThіѕ іѕ аn еmаіlаdvеrtіѕеmеnt thаt wаѕ ѕеnt tо уоu Ьу Ρаtrіоt Survіvаl Ρlаn. If уоuwіѕh tо nоlоngеr rеcеіvе mеѕѕаgеѕ thаt рrоmоtе ѕurvіvаl tірѕ, рlеаѕеclіck hеrе tо unѕuЬѕcrіЬе.4 Unstable as water, thou shalt not excel because thou wentest up to thy fathers bed then defiledst thou it he went up to my couch.34 And Pharaohnechoh made Eliakim the son of Josiah king in the room of Josiah his father, and turned his name to Jehoiakim, and took Jehoahaz away and he came to Egypt, and died there.37 And the thing was good in the eyes of Pharaoh, and in the eyes o! f all his servants.
    __EOF__

    my $chars = 0;
    my $uchars = 0;

    for (split("\n", $data)) {
        print STDERR "line: ", $_, "\n";

        my @matches = m/[\001-\045\047-\177]|&(?!#x[0-9A-F]{4};)/g;
        print STDERR "matches: ", join(',', @matches), " count ", scalar @matches, "\n";
        my $chars += scalar @matches;
        print STDERR "chars: ", $chars, "\n";

        @matches = m/&#x[0-9A-F]{4};/g;
        print STDERR "matches: ", join(',', @matches), " count ", scalar @matches, "\n";
        my $uchars += scalar @matches;
        print STDERR "uchars: ", $uchars, "\n";

        print STDERR "\n";
    }
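One possible reading of the symptom, offered as a sketch and not a definitive diagnosis: the lookahead probably does reject the ampersand, but the characters that follow it (#, x, the four hex digits, and ;) are all inside the range \001-\177, so the character class still matches them one at a time and inflates the count. A minimal way around that, assuming the goal is to count plain ASCII characters and &#xHHHH; entities separately, is to try the entity first in the alternation so it is consumed as a single token. The helper name count_chars and the sample line are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): returns the number of
# plain ASCII characters and the number of &#xHHHH; entities in one line.
sub count_chars {
    my ($line) = @_;
    my ($plain, $entities) = (0, 0);

    # The entity alternative comes first, so its '#', 'x', hex digits and ';'
    # are consumed as one token and never reach the character class.
    while ($line =~ /(&#x[0-9A-F]{4};)|[\001-\177]/g) {
        if (defined $1) { $entities++ } else { $plain++ }
    }
    return ($plain, $entities);
}

# Made-up sample line, for illustration only.
my ($plain, $entities) = count_chars('Th&#x0435; R&#x0435;al & co');
print "chars: $plain, uchars: $entities\n";   # prints "chars: 11, uchars: 2"
```

Separately, writing my $chars += ... and my $uchars += ... inside the loop declares a fresh variable on every iteration, so the totals declared before the loop never change. Dropping the inner my would let them accumulate across lines.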