This should be entertaining. Please, elucidate.
OK. This is going to be bloody impossible to do entirely in text without attachments, but I'll give it a try. :-)
The first thing that JPEG compression does is divide your image up into 8-pixel by 8-pixel blocks. These blocks are then compressed separately. This is the reason that highly-compressed JPEGs look blocky. :-)
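If you like to see things in code, here's a minimal sketch of that first step in Python with NumPy (my choice of language, nothing JPEG mandates), carving a made-up greyscale image into 8x8 blocks:

```python
import numpy as np

# A made-up 16x24-pixel greyscale "image", brightness values 0-255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 24))

# Carve it into non-overlapping 8x8 blocks; JPEG compresses each one
# separately, which is why heavy compression shows 8x8 block edges.
h, w = image.shape
blocks = image.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)

print(blocks.shape)  # (2, 3, 8, 8): a 2x3 grid of 8x8 blocks
```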
You're going to have to think like a mathematician now. What you think of as an 8-pixel by 8-pixel crop of your photo, a mathematician thinks of as a "function". They would write something like B(x,y) - brightness as a function of position in two dimensions. Your 8x8 block contains the values of B(x,y) at 64 points in the image.
The second thing that JPEG compression does is, for each block in the image, take the data and run it through a "Discrete Cosine Transformation", or DCT. This is just a mathematical equation - a "black box" that takes in a function B(x,y) and outputs another function A(u,v). Now, what are A, u and v, I hear you cry? That's where it starts getting tricky. A(u,v) is a new 8x8 block that contains the *spatial frequencies* of the brightness information in the original 8x8 block. I reckon that needs a bit more explanation.
Forget the two-dimensional aspect for a minute; think about a single 8x1-pixel crop of your image. If I draw a graph of the brightness of each of its eight pixels, it might look something like this:
B ^
  |
  |       |  |
  |    |  |  |  |        |
  | |  |  |  |  |  |  |  |
  +-----------------------> x
    1  2  3  4  5  6  7  8
          pixel number

Now, that looks a bit like a sine wave to me. If I draw a sine wave with a suitable frequency, I could get something that looks like this:
B ^
  |       *  *
  |    *        *
  |
  | *              *     *
  +-------------------*---> x
    1  2  3  4  5  6  7  8
          pixel number

Now that's not quite the same shape as the original data, but it's a start. If we picked a second wave with a different frequency and added it to the first one, we could get closer. If we added a third, we'd be closer still, and so on. As it happens, there's a quirk of maths that says that if we use eight very specific cosine waves that are the same every time (but with different amplitudes) and add them up, we can get back *exactly* the shape we started with. (A cosine is just a sine wave shifted sideways, so the picture above still applies.) The formula that says what those amplitudes should be is the DCT - that's where the "Cosine" in the name comes from.
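To make the "add the waves, get the signal back exactly" claim concrete, here's a rough sketch of the one-dimensional version - the textbook "DCT-II" formula and its inverse - in Python/NumPy. This is my illustration, not code from any real JPEG library:

```python
import numpy as np

N = 8

def dct_1d(b):
    """DCT-II: turn 8 brightness values into 8 cosine-wave amplitudes A(u)."""
    x = np.arange(N)
    a = np.empty(N)
    for u in range(N):
        c = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        a[u] = c * np.sum(b * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return a

def idct_1d(a):
    """Inverse DCT: add the eight cosine waves back together to get B(x)."""
    x = np.arange(N)
    b = np.zeros(N)
    for u in range(N):
        c = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        b += c * a[u] * np.cos((2 * x + 1) * u * np.pi / (2 * N))
    return b

# The eight brightness values from the bar chart above.
brightness = np.array([1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 1.0, 2.0])
amplitudes = dct_1d(brightness)
print(np.allclose(idct_1d(amplitudes), brightness))  # True: exact round trip
```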
So, still thinking in *one* dimension, a DCT takes B(x) (the brightness at each of the eight positions represented by x) and outputs A(u), where A is the amplitude of each of the eight cosine waves represented by u. I hope it's not too hard to imagine that a *two*-dimensional DCT takes B(x,y) and outputs A(u,v). Because if it is, I've lost you. :-)
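The two-dimensional version really is just the one-dimensional trick applied twice - once down the columns, once along the rows. A sketch (again mine, in NumPy), writing the 8-point DCT as a matrix:

```python
import numpy as np

N = 8
u, x = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
# Row u of C samples the u-th cosine wave at the eight pixel positions.
C = np.sqrt(2.0 / N) * np.cos((2 * x + 1) * u * np.pi / (2 * N))
C[0, :] = np.sqrt(1.0 / N)  # the u=0 ("flat") wave gets a smaller weight

def dct_2d(block):    # B(x,y) -> A(u,v): DCT down the columns, then along rows
    return C @ block @ C.T

def idct_2d(coeffs):  # A(u,v) -> B(x,y), exactly (C is an orthogonal matrix)
    return C.T @ coeffs @ C

block = np.arange(64, dtype=float).reshape(8, 8)  # any 8x8 block of pixels
print(np.allclose(idct_2d(dct_2d(block)), block))  # True
```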
Right.
So far, we haven't done any lossy compression. If we take A(u,v) and add all those cosine waves together, we get B(x,y) back *exactly* as it was originally. The lossy bit happens next, in a process called "quantisation". You see, the values of A aren't whole numbers (0, 1, 2, 3 etc.) like the original brightness values were; they're "real" numbers - they can take *any* value between some minimum and maximum that you don't need to know about. Because they can take an infinite number of possible values, they'd need an infinite amount of disk space to store exactly, which isn't much good for a compression scheme. So what we do is take each of those numbers, divide it by a "step size" and round to the nearest whole number - in other words, we work out the closest value we can get to it with a limited amount of information. And we don't use the same step size for each value of u and v; we take advantage of the fact that the human eye is less sensitive to certain spatial frequencies than others to "weight" the process - frequencies that the eye sees well get small steps (encoded more accurately), and frequencies it barely sees get big steps. You can vary the overall amount of compression by scaling all the step sizes up or down.
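In code, quantisation is just a divide-and-round against a table of step sizes. The table below is the example luminance table from the JPEG standard (Annex K); the A(u,v) values here are made up, just to show the mechanics:

```python
import numpy as np

# Example luminance quantisation table from the JPEG standard (Annex K):
# small steps top-left, where the eye sees frequencies well; big steps
# bottom-right, where it doesn't.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

# A stand-in A(u,v) block of made-up real-valued amplitudes.
rng = np.random.default_rng(2)
coeffs = rng.uniform(-100.0, 100.0, size=(8, 8))

quantised = np.round(coeffs / Q).astype(int)  # the lossy step
approx = quantised * Q                        # the best the decoder can recover

# Each value is now off by at most half a step - invisible where steps are small.
print(np.abs(approx - coeffs).max() <= Q.max() / 2)  # True
```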
And that's it! You can take all the bits that represent all the blocks, save them to a file and you have a new representation of the image; one that takes a bit of work to decode again, but that can take up much less storage space than the original while being perceptually identical.
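Pulling the pieces together, here's a toy end-to-end sketch for one block. I've simplified: a single flat step size instead of JPEG's weighted table, and none of the clever packing of the stored numbers:

```python
import numpy as np

N = 8
u, x = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos((2 * x + 1) * u * np.pi / (2 * N))
C[0, :] = np.sqrt(1.0 / N)  # the 8-point DCT as an orthogonal matrix

STEP = 32  # one flat step size for every frequency (a simplification)

def encode(block):  # 8x8 pixels -> 8x8 small integers to write to a file
    return np.round((C @ block @ C.T) / STEP).astype(int)

def decode(coded):  # 8x8 small integers -> 8x8 pixels, approximately
    return C.T @ (coded * STEP) @ C

rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
roundtrip = decode(encode(block))
# Not bit-for-bit identical, but close on the 0-255 brightness scale:
print(np.abs(roundtrip - block).max())
```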
Well, I hope that made *some* kind of sense... you may have noticed that I only mentioned stuff relevant to greyscale images. Colour ones are more complicated (but use the same principles).
S

