I also forgot one other critical speed factor: the imageData uses *four* bytes per pixel, and the first byte of each pixel is always the same. So checking byte-by-byte requires four times as many tests as necessary, compared to checking four bytes (one whole pixel) at a time. [Weirdly, my first byte always shows as 255, not 0, as stated in the docs and as reported working for others.]

I spent a little time putting together a stack with different kinds of images and a script which reflects some of the concepts I discussed. (By no means is it a formal "O(ND)" difference algorithm! It's not even truly recursive.) On my Core i7 920 system, it takes 10 milliseconds (versus 289) to process a 200x300 image with minimal delta from the source, and 842 milliseconds (versus 7,312) for a 1600x1200 screenshot comparison. The stack (with various image types) can be found here:

http://bill.on-rev.com/linked/Compare.rev

Here is the core of the script:

-- a and b hold the imageData of the two images
-- L is the length of the imageData
-- r is initially set to L; n and c are initialized to 0
repeat while c < L
   repeat while char c+1 to c+r of a <> char c+1 to c+r of b
      add 1 to n
      if n mod 1000 = 0 then set the thumbPosition of scrollbar "Progress" to c
      if r >= 8 then
         put r div 2 into r -- halve the window and test again
      else
         put 1 into hAll[c div 4 * 4 + 1] -- report delta at this pixel
         add 4 to c -- advance one whole (4-byte) pixel
      end if
   end repeat
   add r to c -- this window matched; skip past it
   put L - c into r -- widen the window back out to the remainder
end repeat
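For anyone who wants to experiment with the narrow/widen loop outside of Rev, here is a rough Python sketch of the same logic. The function name (find_deltas) and the PIXEL constant are my own, and I've left out the progress-bar update; treat it as an illustration of the loop, not as the stack's actual code.

```python
PIXEL = 4  # imageData stores four bytes per pixel

def find_deltas(a, b):
    """Return 1-based, pixel-aligned byte offsets where a and b differ.

    Mirrors the Transcript loop above: compare a window of size r; if it
    differs, halve the window down to pixel size; if it matches, skip it
    and widen back out to the remainder of the data.
    """
    assert len(a) == len(b)
    L = len(a)
    deltas = set()
    c = 0   # current offset into the data (0-based)
    r = L   # current comparison window size
    while c < L:
        while a[c:c+r] != b[c:c+r]:
            if r >= 8:
                r //= 2                              # halve the window, retest
            else:
                deltas.add(c // PIXEL * PIXEL + 1)   # report this pixel (1-based)
                c += PIXEL                           # step one whole pixel
        c += r        # this window matched; skip past it
        r = L - c     # widen the window back out to the remainder
    return deltas
```

Like the original, the window offset can drift off pixel alignment after a run of deltas, which may account for the slightly different change counts mentioned below.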

I suspect there is a subtle math error in here, as it generates a slightly different count of changed pixels than the byte-by-byte method does, but it does reflect the "divide and conquer" approach, and it tests a minimum of four bytes at a time rather than one. I also realized that in the edge case of an image being 100% different from the source, the original method can't be beaten. But for screenshots, where only a tiny portion of the screen may change, there is much room for improvement over the original approach.

This is a very interesting challenge and I hope others pick it up and further refine the algorithm.

- Bill

p.s.: Richmond: Thanks for your stack/images. Remember, you can just replace spaces with %20 in URLs to get them to behave :)
http://mathewson.110mb.com/FILEZ/IMAGE%20COMPARE.rev.zip




"Bill Marriott" <[email protected]> wrote in message news:[email protected]...
Bert,

Others have pointed out the delay introduced by updating a progress bar on every pixel and suggested updating it only every 100 or 500 pixels or so. Similarly, comparing byte-by-byte is going to be slow.

An immediate, simple improvement is to compare *groups* of bytes between the two images. For example, if your image is 10,000 bytes in size, comparing 500 bytes at a time results in 20 comparisons instead of 10,000. When you find a difference in a 500-byte block, you can then down-shift and locate the differences within that block with finer granularity.
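To make the block idea concrete, here is a small Python sketch (the function name changed_offsets and the block size parameter are my own inventions): compare fixed-size blocks with one cheap comparison each, and only drop down to byte granularity inside a block that differs.

```python
def changed_offsets(a, b, block=500):
    """Return 0-based offsets of differing bytes, testing block-wise first.

    Assumes a and b are byte strings of equal length.
    """
    diffs = []
    for start in range(0, len(a), block):
        # One comparison covers the whole block; identical blocks are skipped.
        if a[start:start+block] != b[start:start+block]:
            # Down-shift: scan this block byte-by-byte for the exact deltas.
            for i in range(start, min(start + block, len(a))):
                if a[i] != b[i]:
                    diffs.append(i)
    return diffs
```

When the images are mostly identical, nearly all the work is the cheap block comparisons.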

A refinement of this approach is to "divide and conquer": repeatedly split the image in half and recursively test each half for differences. If the differences between the two images are small, the comparison can be near-instant.
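A hedged sketch of that recursive halving in Python (diff_regions and PIXEL are my own names, not anyone's published code): if a region matches wholesale, stop with one comparison; otherwise split it on a pixel boundary and recurse until regions are a single pixel wide.

```python
PIXEL = 4  # imageData uses four bytes per pixel

def diff_regions(a, b, lo=0, hi=None):
    """Return 0-based byte offsets of differing pixels in a[lo:hi] vs b[lo:hi].

    Assumes len(a) == len(b) and both are multiples of PIXEL.
    """
    if hi is None:
        hi = len(a)
    if a[lo:hi] == b[lo:hi]:
        return []                 # whole region matches: one comparison, done
    if hi - lo <= PIXEL:
        return [lo]               # down to a single pixel: report it
    # Split roughly in half, rounded down to a pixel boundary.
    mid = lo + (hi - lo) // 2 // PIXEL * PIXEL
    return diff_regions(a, b, lo, mid) + diff_regions(a, b, mid, hi)
```

With few deltas, most of the image is discarded in a handful of large comparisons, which is why the small-difference case is so fast.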

One of the classic papers on checking for differences between data sets can be found here:

http://xmailserver.org/diff2.pdf

Of course, the language in that paper is way beyond my comprehension ;)

I'll putter around with expressing these concepts elegantly in Rev, but hopefully this gives you or someone else on the list a starting point for an algorithm that is dramatically faster than byte-by-byte testing. (I'd love to see the "O(ND)" difference algorithm properly implemented in Rev code.)

- Bill


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
