Hello!
What does this code from the HTML.pm module do?
  if ($self->{last_text}) {
    # ideas discarded since they would be easy to evade:
    # 1. using \w or [A-Za-z] instead of \S or non-punctuation
    # 2. exempting certain tags
    if ($text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s &&
        $self->{last_text} =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s)
    {
      $self->{html}{obfuscation}++;
    }
    if ($self->{last_text} =~
        /\b([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\z/s)
    {
      my $start = length($1);
      if ($text =~ /^([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\b/s) {
        my $backhair = $start . "_" . length($1);
        $self->{html}{backhair}->{$backhair}++;
        $self->{html}{backhair_count} = keys %{ $self->{html}{backhair} };
      }
    }
  }

I'm debugging my Unicode patch for SpamAssassin, and this is one of the
places that I think may need rewriting, because it probably doesn't
support Unicode input.
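
To illustrate the concern, here is a minimal sketch (my own assumption,
not code from the module). The character class in the snippet only
excludes whitespace and ASCII punctuation, so any non-ASCII character,
including Unicode punctuation such as U+3002, is treated as word-like.
The `\p{P}`/`\p{S}` class and the helper `starts_wordlike` are
hypothetical names used only for illustration, and the strings are
assumed to be decoded character strings rather than raw bytes.

```perl
use strict;
use warnings;

# ASCII-only "non-space, non-punctuation" class, copied from the snippet above.
my $ascii_class = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Hypothetical Unicode-aware variant (an assumption, not SpamAssassin code):
# exclude whitespace plus all Unicode punctuation (\p{P}) and symbols (\p{S}).
my $unicode_class = qr/[^\s\p{P}\p{S}]/;

# Hypothetical helper: returns 1 if $text starts with a character
# matched by $class, otherwise 0.
sub starts_wordlike {
    my ($text, $class) = @_;
    return $text =~ /^$class/s ? 1 : 0;
}

# Sample characters: ASCII letter, ASCII punctuation, Cyrillic letter,
# and IDEOGRAPHIC FULL STOP (Unicode punctuation outside ASCII).
for my $char ("a", "!", "\x{0436}", "\x{3002}") {
    printf "U+%04X ascii=%d unicode=%d\n",
        ord($char),
        starts_wordlike($char, $ascii_class),
        starts_wordlike($char, $unicode_class);
}
```

If this reading is right, the two classes agree on ASCII input and on
non-ASCII letters, and differ only on non-ASCII punctuation and symbols.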

-- 
Email: eugene @ renice.org
