http://bugzilla.spamassassin.org/show_bug.cgi?id=3163





------- Additional Comments From [EMAIL PROTECTED]  2004-03-12 13:06 -------
Subject: Re:  Obfuscation FP when obfuscating tag starts line or punctuation 
follows tag.

Hmmm... it seems that using [A-Za-z] or \w in there does improve
results.  I somehow botched my earlier testing, but caught the mistake
when verifying the check-in results.  Anyhow, I restored the better and
simpler of the regular expression combinations I tested (15 in total)
and checked them into SVN.

I'm concerned that [A-Za-z] and \w are too locale-specific, so I'd like
to figure out exactly why they improve results so much over \S.
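
To make the locale concern concrete, here is a minimal sketch (not part
of the check-in) of how the character classes treat a non-ASCII letter.
It assumes Perl 5.14 or later for the /a and /u modifiers, and the
sample character is only an illustration:

```perl
use strict;
use warnings;

# "\xE9" is U+00E9 (e with acute accent), a letter outside ASCII.
# Illustrative sample only; requires Perl 5.14+ for /a and /u.
my $ch = "\xE9";

# An explicit ASCII range never matches it.
my $ascii_class = ($ch =~ /[A-Za-z]/) ? 1 : 0;

# \w under /a (ASCII rules) does not match it either.
my $w_ascii = ($ch =~ /\w/a) ? 1 : 0;

# \w under /u (Unicode rules) does match it.
my $w_unicode = ($ch =~ /\w/u) ? 1 : 0;

# \S only asks "is this not whitespace?", so it matches regardless.
my $non_space = ($ch =~ /\S/) ? 1 : 0;

print "[A-Za-z]=$ascii_class \\w/a=$w_ascii \\w/u=$w_unicode \\S=$non_space\n";
```

So \w can give different answers for the same byte depending on which
rules are in effect, while [A-Za-z] is stable but ignores accented
letters entirely.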

The tests:

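    # Baseline: a non-space character on both sides of the tag.
    if ($self->{html_text}[-1] =~ /\S$/s && $text =~ /^\S/s) {
      $self->{html}{obfuscation}++;
    }
    # Test 1: as above, but \z cannot match before a trailing newline.
    if ($self->{html_text}[-1] =~ /\S\z/s &&
        $text =~ /^\S/s)
    {
      $self->{html}{t_obfuscation1}++;
    }
    # Test 2: the non-space run on each side must contain an ASCII letter.
    if ($self->{html_text}[-1] =~ /\S*[A-Za-z]\S*\z/ &&
        $text =~ /^\S*[A-Za-z]\S*/)
    {
      $self->{html}{t_obfuscation2}++;
    }
    # Test 3: the non-space run on each side must contain a \w character.
    if ($self->{html_text}[-1] =~ /\S*\w\S*\z/ &&
        $text =~ /^\S*\w\S*/)
    {
      $self->{html}{t_obfuscation3}++;
    }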
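
For anyone who wants to poke at the four conditions outside the HTML
parser, here is a rough standalone sketch.  The helper name
boundary_tests and the sample strings are made up for illustration; only
the regular expressions come from the tests above:

```perl
use strict;
use warnings;

# Hypothetical helper: reports which of the four checks fire for the
# text before a tag ($prev) and the text after it ($text).
sub boundary_tests {
    my ($prev, $text) = @_;
    my %fired = (
        current => ($prev =~ /\S$/s  && $text =~ /^\S/s) ? 1 : 0,
        test1   => ($prev =~ /\S\z/s && $text =~ /^\S/s) ? 1 : 0,
        test2   => ($prev =~ /\S*[A-Za-z]\S*\z/
                    && $text =~ /^\S*[A-Za-z]\S*/) ? 1 : 0,
        test3   => ($prev =~ /\S*\w\S*\z/
                    && $text =~ /^\S*\w\S*/) ? 1 : 0,
    );
    return \%fired;
}

# Made-up samples: a split word, punctuation after the tag, a tag that
# starts a line, and digits on both sides of the tag.
my @samples = (
    ["Via",   "gra"],
    ["Hello", ", world"],
    ["see\n", "word"],
    ["100",   "0 off"],
);

for my $s (@samples) {
    my $r = boundary_tests(@$s);
    # Show newlines visibly so each sample stays on one output line.
    (my $shown = "$s->[0]|$s->[1]") =~ s/\n/\\n/g;
    printf "%-16s current=%d test1=%d test2=%d test3=%d\n",
        $shown, @{$r}{qw(current test1 test2 test3)};
}
```

With these samples, the punctuation case fires both \S checks but
neither the [A-Za-z] nor the \w check, and the trailing-newline case
fires only the check that uses $ rather than \z.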

Results (columns: overall %, spam %, ham %, S/O, rank, score, rule name):

  9.504  16.1123   3.1377    0.837   0.55    1.00  HTML_OBFUSCATE_00_10
  1.697   3.4165   0.0401    0.988   0.96    1.00  HTML_OBFUSCATE_10_20
  1.731   3.5204   0.0067    0.998   0.99    1.00  HTML_OBFUSCATE_20_30
  1.952   3.9709   0.0067    0.998   0.99    1.00  HTML_OBFUSCATE_30_40
  1.755   3.5759   0.0000    1.000   1.00    1.00  HTML_OBFUSCATE_40_50
  1.211   2.4671   0.0000    1.000   1.00    1.00  HTML_OBFUSCATE_50_60
  0.677   1.3791   0.0000    1.000   1.00    1.00  HTML_OBFUSCATE_60_70
  0.527   1.0742   0.0000    1.000   0.99    1.00  HTML_OBFUSCATE_70_80
  0.071   0.1455   0.0000    1.000   0.99    1.00  HTML_OBFUSCATE_80_90
  0.037   0.0762   0.0000    1.000   0.99    1.00  HTML_OBFUSCATE_90_100

  8.956  15.0243   3.1110    0.828   0.53    0.01  T_HTML_OBFUSCATE1_00_10
  1.659   3.3611   0.0200    0.994   0.98    0.01  T_HTML_OBFUSCATE1_10_20
  1.727   3.5066   0.0134    0.996   0.99    0.01  T_HTML_OBFUSCATE1_20_30
  1.938   3.9501   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE1_30_40
  1.761   3.5897   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE1_40_50
  1.211   2.4671   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE1_50_60
  0.673   1.3721   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE1_60_70
  0.527   1.0742   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE1_70_80
  0.071   0.1455   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE1_80_90
  0.034   0.0693   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE1_90_100

  2.941   5.1421   0.8211    0.862   0.59    0.01  T_HTML_OBFUSCATE2_00_10
  1.496   3.0492   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE2_10_20
  1.884   3.8392   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE2_20_30
  2.105   4.2897   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE2_30_40
  1.381   2.8136   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE2_40_50
  1.224   2.4948   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE2_50_60
  0.677   1.3791   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE2_60_70
  0.394   0.8039   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE2_70_80
  0.051   0.1040   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE2_80_90
  0.340   0.6930   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE2_90_100

  3.207   5.5925   0.9079    0.860   0.59    0.01  T_HTML_OBFUSCATE3_00_10
  1.517   3.0908   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE3_10_20
  1.789   3.6452   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE3_20_30
  1.982   4.0402   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE3_30_40
  1.605   3.2710   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE3_40_50
  1.238   2.5225   0.0000    1.000   1.00    0.01  T_HTML_OBFUSCATE3_50_60
  0.524   1.0672   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE3_60_70
  0.524   1.0672   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE3_70_80
  0.068   0.1386   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE3_80_90
  0.034   0.0693   0.0000    1.000   0.99    0.01  T_HTML_OBFUSCATE3_90_100

And the results for the 00_05 and 05_10 ranges:

  (this is using the current rule)
  8.202  13.6105   2.9909    0.820   0.51    0.01  T_HTML_OBFUSCATEX00_05
  1.302   2.5017   0.1469    0.945   0.82    0.01  T_HTML_OBFUSCATEX05_10

  7.760  12.7443   2.9575    0.812   0.49    0.01  T_HTML_OBFUSCATE1X00_05
  1.197   2.2800   0.1535    0.937   0.80    0.01  T_HTML_OBFUSCATE1X05_10

  2.268   3.7769   0.8145    0.823   0.50    0.01  T_HTML_OBFUSCATE2X00_05
  0.673   1.3652   0.0067    0.995   0.98    0.01  T_HTML_OBFUSCATE2X05_10

  2.537   4.2412   0.8946    0.826   0.50    0.01  T_HTML_OBFUSCATE3X00_05
  0.670   1.3514   0.0134    0.990   0.96    0.01  T_HTML_OBFUSCATE3X05_10
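
As a sanity check on how to read these rows, the fourth column looks
like spam % / (spam % + ham %), taking the second and third columns as
spam % and ham %.  That reading is an assumption on my part, but it does
reproduce the printed values for the rows below.  A small sketch (the
helper name s_o_ratio is made up):

```perl
use strict;
use warnings;

# Assumed relationship: S/O = spam% / (spam% + ham%).
sub s_o_ratio {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return if $total == 0;    # no hits at all, ratio undefined
    return $spam_pct / $total;
}

# Rows copied from the results above: [rule name, spam %, ham %].
my @rows = (
    ['HTML_OBFUSCATE_00_10',    16.1123, 3.1377],
    ['T_HTML_OBFUSCATE2_00_10',  5.1421, 0.8211],
    ['T_HTML_OBFUSCATE2X05_10',  1.3652, 0.0067],
);

for my $row (@rows) {
    my ($name, $spam, $ham) = @$row;
    printf "%-26s S/O = %.3f\n", $name, s_o_ratio($spam, $ham);
}
```

This prints 0.837, 0.862 and 0.995, matching the S/O values listed for
those three rules.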




