Re: [UPHPU] Breaking Captchas

Mac Newbold Sat, 07 Mar 2009 15:25:01 -0800

Today at 12:45am, Kirk Ouimet said:

Hi List,


My web host allows me to control how much RAM is available on my hosted
Linux VServer and charges me $1 for every 10 MB allocated. I wrote a script
this week that uses information from the Linux command "top" to scale
resources available based on current demand. Running the script ends up
saving me about $40/month. Everything was going great until they put a
Captcha on the page that my script uses to set my allocated resources.

Here's an example of their Captcha image:

http://www.kirkouimet.com/top/captcha.png

I want to defeat it. Using PHP preferably. Anyone have any tips for me?

I don't want to talk a whole lot about it on a publicly archived list ;), but I've done some Optical Character Recognition (OCR) on images similar to that, using PHP and the GD library.

The images I was doing the OCR for used the same font for all their images, so all I had to do was recognize each of the 26 letters, and now it never misses a beat.


Here's a snippet that might get you started:

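  $size_arr = getimagesize($file);
  #print_r($size_arr);

  $sx = $size_arr[0];  // image width in pixels
  $sy = $size_arr[1];  // image height in pixels

  $im = imagecreatefrompng($file);

  // Image size, then a two-space gutter to line up with the row labels.
  $out = "($sx,$sy)\n  ";

  if ($im) {
    // Column header, tens digit (a space for columns 0-9 keeps it aligned).
    foreach (range(0,$sx-1) as $x) { $out .= ($x/10>=1 ? floor($x/10)%10 : " "); }
    $out .= "\n  ";
    // Column header, ones digit.
    foreach (range(0,$sx-1) as $x) { $out .= $x%10; }
    $out .= "\n";
    foreach (range(0,$sy-1) as $y) {
      #echo "Row Y=$y\n";
      // Two-character row label.
      $out .= ($y/10>=1 ? floor($y/10)%10 : " ").$y%10;
      foreach (range(0,$sx-1) as $x) {
        $color = imagecolorat($im,$x,$y);
        // 16777164 is 0xFFFFCC, the background color in those images.
        // (For a palette image, imagecolorat returns a color index instead.)
        $bg = ($color==16777164);
        $fg = !$bg;
        $char = ($fg ? "*" : " ");
        $out .= $char;
        #echo "X=$x   char=$char   fg=$fg, bg=$bg, color=$color\n";
      }
      #echo "**********\n";
      $out .= "\n";
    }
    echo $out;
  }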

Since this one was a consistent font, I just had to recognize the pattern of pixels in the first few columns of pixels for each letter, and I could tell what it was, and would skip over x columns to the start of the next letter. It didn't need to do anything probabilistically by making guesses based on percentages or anything, so it was totally deterministic, which made it really nice and much easier.
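
A rough sketch of that kind of deterministic matching, not the code actually used for those images. It assumes the image has already been turned into a grid where $grid[$y][$x] is true for a foreground pixel. The function names, the $glyphs table layout ('prefix' holds the first few columns of a letter, 'width' is how many columns to skip), and the idea of storing columns as strings are all made up for illustration:

```php
<?php
// Returns column $x of $grid as a string, top to bottom ("*" dark, " " light).
function columnString(array $grid, int $x): string
{
    $s = '';
    foreach ($grid as $row) {
        $s .= $row[$x] ? '*' : ' ';
    }
    return $s;
}

// $glyphs maps a letter to ['prefix' => [column strings], 'width' => int].
// The table layout and its contents are hypothetical.
function readLetters(array $grid, array $glyphs): string
{
    $w = count($grid) > 0 ? count($grid[0]) : 0;
    $text = '';
    $x = 0;
    while ($x < $w) {
        $matched = false;
        foreach ($glyphs as $letter => $g) {
            $n = count($g['prefix']);
            if ($x + $n > $w) {
                continue;  // not enough columns left for this letter
            }
            $ok = true;
            for ($i = 0; $i < $n; $i++) {
                if (columnString($grid, $x + $i) !== $g['prefix'][$i]) {
                    $ok = false;
                    break;
                }
            }
            if ($ok) {
                $text .= $letter;
                $x += max(1, $g['width']);  // jump to the start of the next letter
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $x++;  // blank or unknown column: move one column right
        }
    }
    return $text;
}
```

The prefixes have to be long enough that no two letters share one, otherwise whichever letter comes first in the table wins.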

For yours, you'd want to start by taking off the border, then by running a noise reduction algorithm over it to take out the stray dots. Basically, you'll load the pixels into a two-dimensional array, and go through it row by row, column by column. You'll want to take out any pixel that is dark when everything around it is light. Like this:


  (x-1,y+1)  (x,y+1)  (x+1,y+1)
  (x-1,y  )  (x,y  )  (x+1,y  )
  (x-1,y-1)  (x,y-1)  (x+1,y-1)

If the pixel at (x,y) is black, and the other 8 are white, set (x,y) to white.

Give that a try and see if that cleans up the image enough to just find the letters. A variant on this that is a little more aggressive is to reset it to white if it has only 0 or 1 black neighbors. That can erase fine lines though, so you want to be more careful with it.
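
The neighbor rule above could be sketched roughly like this. It assumes the pixels are already in a two-dimensional array where $grid[$y][$x] is true for black and false for white (for instance, filled from the $fg test in the earlier snippet). The function name despeckle and the $maxNeighbors parameter are invented for this example, and treating pixels outside the image as white is an assumption:

```php
<?php
// Hypothetical helper: clears any black pixel that has at most
// $maxNeighbors black pixels among its 8 neighbors.
// $maxNeighbors = 0 is the basic rule, 1 is the more aggressive variant.
function despeckle(array $grid, int $maxNeighbors = 0): array
{
    $h = count($grid);
    $w = $h > 0 ? count($grid[0]) : 0;
    $clean = $grid;  // write to a copy so removals don't change later counts

    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            if (!$grid[$y][$x]) {
                continue;  // already white
            }
            $black = 0;
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    if ($dx === 0 && $dy === 0) {
                        continue;  // skip the pixel itself
                    }
                    $nx = $x + $dx;
                    $ny = $y + $dy;
                    // Assumption: anything outside the image counts as white.
                    if ($nx >= 0 && $nx < $w && $ny >= 0 && $ny < $h
                        && $grid[$ny][$nx]) {
                        $black++;
                    }
                }
            }
            if ($black <= $maxNeighbors) {
                $clean[$y][$x] = false;
            }
        }
    }
    return $clean;
}
```

Counting against the original grid while writing to a copy keeps the result independent of scan order. With $maxNeighbors = 1, the two ends of a one-pixel-wide line get erased on every pass, which is the "erase fine lines" problem.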

Anyway, it's a topic I am very interested in, and I think your problem can be solved. This month is crazy busy at work, but if by April you don't have it solved but still want to solve it, I'd love to spend a few hours playing with it with you. It might be as little as a couple hours to crack the hardest part of the problem.

One thing you'll need to do though is get a large sample of captchas, so you make sure you have at least a few instances of every letter. The first step, after any image preprocessing, is to make sure you can recognize each letter correctly, then you can tackle the whole image problem much more easily.
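
One way to check whether a sample set is big enough, as a small sketch. It assumes you have typed in the correct text for each saved sample by hand, and that the alphabet is just A-Z (which may not be true for your images). The function name and the $minCount threshold are made up for the example:

```php
<?php
// $labels: the hand-typed text for each sample (hypothetical data layout).
// Returns the letters seen fewer than $minCount times across all samples.
function missingLetters(array $labels, int $minCount = 3): array
{
    $counts = array_fill_keys(range('A', 'Z'), 0);
    foreach ($labels as $text) {
        foreach (str_split(strtoupper($text)) as $ch) {
            if (isset($counts[$ch])) {
                $counts[$ch]++;
            }
        }
    }
    $short = array_filter($counts, function ($n) use ($minCount) {
        return $n < $minCount;
    });
    return array_keys($short);
}
```

When this returns an empty array, every letter has shown up at least $minCount times and you can stop collecting.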


Thanks,
Mac

--
Mac Newbold                     Code Greene, LLC
CTO/Chief Technical Officer     44 Exchange Place
Office: 801-582-0148            Salt Lake City, UT  84111
Cell:   801-694-6334            www.codegreene.com

_______________________________________________

UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net
