Hello! What does this code from the HTML.pm module do?

    if ($self->{last_text}) {
        # ideas discarded since they would be easy to evade:
        # 1. using \w or [A-Za-z] instead of \S or non-punctuation
        # 2. exempting certain tags
        if ($text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s &&
            $self->{last_text} =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s)
        {
            $self->{html}{obfuscation}++;
        }
        if ($self->{last_text} =~
            /\b([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\z/s)
        {
            my $start = length($1);
            if ($text =~ /^([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\b/s) {
                my $backhair = $start . "_" . length($1);
                $self->{html}{backhair}->{$backhair}++;
                $self->{html}{backhair_count} = keys %{ $self->{html}{backhair} };
            }
        }
    }
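To show how that negated character class might treat non-ASCII input, here is a minimal standalone sketch. It is not SpamAssassin code: the helper names `count_class_matches` and `leading_run_length` and the sample word are made up for illustration, and it assumes a plain Perl with neither `use locale` nor the `unicode_strings` feature in effect.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The negated class from the snippet: not whitespace, not ASCII punctuation.
my $non_punct = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Hypothetical helper: how many units (bytes or characters) the class matches.
sub count_class_matches {
    my ($str) = @_;
    my $count = () = $str =~ /$non_punct/g;
    return $count;
}

# Hypothetical helper mirroring the $text side of the backhair check:
# length of the leading run of 1 to 7 units followed by \b, or 0 if no match.
sub leading_run_length {
    my ($str) = @_;
    return $str =~ /^((?:$non_punct){1,7})\b/s ? length($1) : 0;
}

# A six-letter Cyrillic word, written with escapes so the source stays ASCII.
my $chars = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";  # decoded characters
my $bytes = encode('UTF-8', $chars);                        # raw UTF-8 octets

printf "chars: class matches %d, leading run %d\n",
    count_class_matches($chars), leading_run_length($chars);
printf "bytes: class matches %d, leading run %d\n",
    count_class_matches($bytes), leading_run_length($bytes);
```

For the decoded string the class matches once per character (6). For the raw octets it matches once per byte (12), so the `{1,7}` limit would count bytes, not letters. My understanding is also that `\b` does not treat high bytes as word characters in an undecoded string, so the backhair patterns may not match there at all, but I have not verified that against SpamAssassin itself.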
I'm debugging my Unicode patch for SpamAssassin, and this is one of the places that I think may need rewriting, because it probably doesn't support Unicode input.

--
Email: eugene @ renice.org