I also forgot one other critical speed factor: the imageData uses *four* bytes per pixel, and the first byte of each pixel is always the same. So checking byte-by-byte requires four times as many tests as necessary, compared to checking four bytes (one whole pixel) at a time. [Weirdly, my first byte always shows as 255, not 0, as stated in the docs and as reported working for others.]

I spent a little time putting together a stack with different kinds of images and a script which reflects some of the concepts I discussed. (By no means is it a formal "O(ND)" difference algorithm! It's not even truly recursive.) On my Core i7 920 system, it takes 10 milliseconds (versus 289) to process a 200x300 image with minimal delta from the source, and 842 milliseconds (versus 7,312) for a 1600x1200 screenshot comparison. The stack (with various image types) can be found here:

http://bill.on-rev.com/linked/Compare.rev

Here is the core of the script:

-- a and b hold the imageData of the two images
-- L is the length of the imageData
-- r is initially set to L; n and c are initialized to 0
repeat while c < L
   repeat while char c+1 to c+r of a <> char c+1 to c+r of b
      add 1 to n
      if n mod 1000 = 0 then set the thumbPosition of scrollbar "Progress" to c
      if r >= 8 then
         put r div 2 into r -- halve the window and test again
      else
         put 1 into hAll[c div 4 * 4 + 1] -- report delta at this pixel
         add 4 to c -- advance one whole (4-byte) pixel
      end if
   end repeat
   add r to c -- this window matched; skip past it
   put L - c into r -- widen the window back out to the remainder
end repeat
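For anyone who wants to experiment with the narrow/widen loop outside of Rev, here is a rough Python sketch of the same logic. The function name (find_deltas) and the PIXEL constant are my own, and I've left out the progress-bar update; treat it as an illustration of the loop, not as the stack's actual code.

```python
PIXEL = 4  # imageData stores four bytes per pixel

def find_deltas(a, b):
    """Return 1-based, pixel-aligned byte offsets where a and b differ.

    Mirrors the Transcript loop above: compare a window of size r; if it
    differs, halve the window down to pixel size; if it matches, skip it
    and widen back out to the remainder of the data.
    """
    assert len(a) == len(b)
    L = len(a)
    deltas = set()
    c = 0   # current offset into the data (0-based)
    r = L   # current comparison window size
    while c < L:
        while a[c:c+r] != b[c:c+r]:
            if r >= 8:
                r //= 2                              # halve the window, retest
            else:
                deltas.add(c // PIXEL * PIXEL + 1)   # report this pixel (1-based)
                c += PIXEL                           # step one whole pixel
        c += r        # this window matched; skip past it
        r = L - c     # widen the window back out to the remainder
    return deltas
```

Like the original, the window offset can drift off pixel alignment after a run of deltas, which may account for the slightly different change counts mentioned below.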

I suspect there is a subtle math error in here, as it generates a slightly different count of changed pixels than the byte-by-byte method does, but it does reflect the "divide and conquer" approach, and it tests a minimum of four bytes at a time rather than one. I also realized that in the edge case of an image being 100% different from the source, the original method can't be beaten. But for screenshots, where only a tiny portion of the screen may change, there is much room for improvement over the original approach.

This is a very interesting challenge and I hope others pick it up and further refine the algorithm.

- Bill

p.s.: Richmond: Thanks for your stack/images. Remember, you can just replace spaces with %20 in URLs to get them to behave :)
http://mathewson.110mb.com/FILEZ/IMAGE%20COMPARE.rev.zip




"Bill Marriott" <[email protected]> wrote in message news:[email protected]...
Bert,

Others have pointed out the delay introduced by updating a progress bar on every pixel and suggested updating it only every 100 or 500 pixels or so. Similarly, comparing byte-by-byte is going to be slow.

An immediate, simple improvement is to compare *groups* of bytes between the two images. For example, if your image is 10,000 bytes in size, comparing 500 bytes at a time results in 20 comparisons instead of 10,000. When you find a difference in a 500-byte block, you can then down-shift and locate the differences within that block with finer granularity.
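To make the block idea concrete, here is a small Python sketch (the function name changed_offsets and the block size parameter are my own inventions): compare fixed-size blocks with one cheap comparison each, and only drop down to byte granularity inside a block that differs.

```python
def changed_offsets(a, b, block=500):
    """Return 0-based offsets of differing bytes, testing block-wise first.

    Assumes a and b are byte strings of equal length.
    """
    diffs = []
    for start in range(0, len(a), block):
        # One comparison covers the whole block; identical blocks are skipped.
        if a[start:start+block] != b[start:start+block]:
            # Down-shift: scan this block byte-by-byte for the exact deltas.
            for i in range(start, min(start + block, len(a))):
                if a[i] != b[i]:
                    diffs.append(i)
    return diffs
```

When the images are mostly identical, nearly all the work is the cheap block comparisons.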

A refinement of this approach is to "divide and conquer": repeatedly split the image in half and recursively test each half for differences. If the differences between the two images are small, the comparison can be near-instant.
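A hedged sketch of that recursive halving in Python (diff_regions and PIXEL are my own names, not anyone's published code): if a region matches wholesale, stop with one comparison; otherwise split it on a pixel boundary and recurse until regions are a single pixel wide.

```python
PIXEL = 4  # imageData uses four bytes per pixel

def diff_regions(a, b, lo=0, hi=None):
    """Return 0-based byte offsets of differing pixels in a[lo:hi] vs b[lo:hi].

    Assumes len(a) == len(b) and both are multiples of PIXEL.
    """
    if hi is None:
        hi = len(a)
    if a[lo:hi] == b[lo:hi]:
        return []                 # whole region matches: one comparison, done
    if hi - lo <= PIXEL:
        return [lo]               # down to a single pixel: report it
    # Split roughly in half, rounded down to a pixel boundary.
    mid = lo + (hi - lo) // 2 // PIXEL * PIXEL
    return diff_regions(a, b, lo, mid) + diff_regions(a, b, mid, hi)
```

With few deltas, most of the image is discarded in a handful of large comparisons, which is why the small-difference case is so fast.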

One of the classic papers on checking for differences between data sets can be found here:

http://xmailserver.org/diff2.pdf

Of course, the language in that paper is way beyond my comprehension ;)

I'll putter around with expressing these concepts elegantly in Rev, but hopefully this gives you or someone else on the list a starting point for an algorithm that is dramatically faster than byte-by-byte testing. (I'd love to see the "O(ND)" difference algorithm properly implemented in Rev code.)

- Bill


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
