Thanks Steve! (Anybody seen my Excedrin?) :-(

Don


> -----Original Message-----
> From: Steve Jolly [mailto:[EMAIL PROTECTED]
> Sent: Saturday, November 13, 2004 4:07 PM
> To: [EMAIL PROTECTED]
> Subject: JPEG Compression Made Slightly-Less-Complicated (was Re:
> Reducing File Size with Photoshop)
> 
> 
> William Robb wrote:
> > This should be entertaining.
> > Please, elucidate.
> 
> OK.  This is going to be bloody impossible to do entirely in text 
> without attachments, but I'll give it a try. :-)
> 
> The first thing that JPEG compression does is divide your image up into 
> 8-pixel by 8-pixel blocks.  These blocks are then compressed separately. 
> This is the reason that highly-compressed JPEGs look blocky. :-)
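That blocking step is easy to sketch in code. Here's an illustrative Python version (the function name and the toy image are my own inventions, not anything from a real JPEG library):

```python
# Illustrative sketch: split a greyscale image (a list of rows of pixel
# values) into 8x8 blocks, the units JPEG compresses separately.
def split_into_blocks(image, n=8):
    """Return the n-by-n blocks covering the image (size assumed divisible by n)."""
    blocks = []
    for by in range(0, len(image), n):          # step down the rows, 8 at a time
        for bx in range(0, len(image[0]), n):   # step across the columns
            block = [row[bx:bx + n] for row in image[by:by + n]]
            blocks.append(block)
    return blocks

# A 16x16 dummy image yields four 8x8 blocks.
image = [[x + y for x in range(16)] for y in range(16)]
blocks = split_into_blocks(image)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```

Each block is then handed to the next stage on its own, which is why block edges can become visible at high compression.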
> 
> You're going to have to think like a mathematician now.  What you think 
> of as an 8-pixel by 8-pixel crop of your photo, a mathematician thinks 
> of as a "function".  They would write something like B(x,y) - brightness 
> as a function of position in two dimensions.  Your 8x8 block contains 
> the values of B(x,y) at 64 points in the image.
> 
> The second thing that JPEG compression does is, for each block in the 
> image, take the data and run it through a "Discrete Cosine 
> Transformation", or DCT.  This is just a mathematical equation - a "black 
> box" that takes in a function B(x,y) and outputs another function 
> A(u,v).  Now, what are A, u and v, I hear you cry?  That's where it 
> starts getting tricky.  A(u,v) is a new 8x8 block that contains *spatial 
> frequencies* of the brightness information in the original 8x8 block.  I 
> reckon that needs a bit more explanation.
> 
> Forget the two-dimensions aspect for a minute; think about a single 8x1 
> pixel crop of your image.  If I draw a graph that shows their 
> brightnesses, it might look something like this:
> 
> B^
>   |
>   |                |  |
>   | |  |        |  |  |
>   | |  |  |     |  |  |  |
>   +------------------------> x
>     1  2  3  4  5  6  7  8
>          pixel number
> 
> Now, that looks a bit like a sine wave to me.  If I draw a sine wave 
> with a suitable frequency, I could get something that looks like this:
> 
> B^
>   |                    *
>   | *               *     *
>   |   *           *
>   |    *         *
>   |      *     *
>   +---------*--------------> x
>     1  2  3  4  5  6  7  8
>          pixel number
> 
> Now that's not quite the same shape as the original data, but it's a 
> start.  If we picked a second sine wave with a different frequency and 
> added it to the first one, we could get closer.  If we added a third 
> one, we'd be closer still, and so on.  As it happens, there's a quirk of 
> maths that says that if we use eight very specific sine waves that are 
> the same every time (but with different amplitudes and horizontal 
> offsets) and add them, we can get *exactly* the same shape as we started 
> out again.  The formula that says what the amplitudes and offsets should 
> be is the DCT.
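For the curious, those "very specific waves" have a standard formula - the DCT-II, which is written with cosines (hence the name "Discrete Cosine Transformation"). Here's a plain-Python sketch, just to show that adding the waves back together really does reconstruct the original eight values; the function names are mine:

```python
import math

N = 8  # eight samples in, eight amplitudes out

def alpha(u):
    # Normalisation factor that makes the transform exactly invertible.
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct(B):
    """Forward 1-D DCT-II: eight brightness samples -> eight wave amplitudes."""
    return [alpha(u) * sum(B[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * N))
                           for x in range(N))
            for u in range(N)]

def idct(A):
    """Inverse DCT: add the eight cosine waves back together."""
    return [sum(alpha(u) * A[u] * math.cos(math.pi * (2 * x + 1) * u / (2 * N))
                for u in range(N))
            for x in range(N)]

B = [52, 55, 61, 66, 70, 61, 64, 73]   # eight example brightness values
A = dct(B)
restored = idct(A)
# Reconstruction is exact, up to floating-point rounding.
print(max(abs(b - r) for b, r in zip(B, restored)))
```

No information has been lost yet: the round trip gives back exactly what went in.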
> 
> So, still thinking in *one* dimension, a DCT takes B(x) (The brightness 
> of each of the eight positions represented by x) and outputs A(u), where 
> A is the amplitude and horizontal offset of each of the eight sine waves 
> represented by u.  I hope it's not too hard to imagine that a *two* 
> dimensional DCT takes B(x,y) and outputs A(u,v).  Because if it is, I've 
> lost you. :-)
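To make the two-dimensional version a bit more concrete: it can be built out of the one-dimensional one by running the 1-D transform along every row, then along every column of the result. An illustrative sketch (helper names are mine; I feed in a perfectly flat block to show all the energy landing in A(0,0), the "zero frequency" term):

```python
import math

N = 8

def alpha(u):
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct_1d(samples):
    """1-D DCT-II of eight samples."""
    return [alpha(u) * sum(samples[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * N))
                           for x in range(N))
            for u in range(N)]

def dct_2d(block):
    """2-D DCT of an 8x8 block: 1-D DCT on every row, then on every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]
    # cols[u][v] is A(u, v); transpose back to row-major order.
    return [[cols[u][v] for u in range(N)] for v in range(N)]

# A perfectly flat block (constant brightness) has no variation at all,
# so every amplitude except A(0, 0) comes out as zero.
flat = [[100] * N for _ in range(N)]
A = dct_2d(flat)
print(round(A[0][0]))
```

The same separability trick is what real encoders use, since two passes of an 8-point transform are much cheaper than one 64-point one.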
> 
> Right.
> 
> So far, we haven't done any lossy compression.  If we take A(u,v) and 
> add all those sine waves together, we get B(x,y) back *exactly* as it 
> was originally.  The lossy bit happens next, in a process called 
> "quantisation".  You see, the values of A aren't whole numbers (0,1,2,3 
> etc) like the original brightness values were; they're "real" numbers - 
> they can take *any* value within some range whose limits you 
> don't need to know about.  Because they can take an infinite number of 
> values, they'd need an infinite amount of disk space to store them in, 
> which isn't much good for a compression scheme.  So, what we do is we 
> take each of those numbers, and we work out what the closest we can get 
> to it using a *fixed* number of bits is.  And we don't use the same 
> number of bits for each value of u and v: we take advantage of the fact 
> that the human eye is less sensitive to certain spatial frequencies 
> under certain circumstances to "weight" the process - frequencies that 
> the eye sees better get encoded more accurately, and vice versa.  You 
> can vary the overall amount of compression by varying the total number 
> of bits used to compress each block.
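Quantisation can be sketched as rounding each amplitude to the nearest multiple of a step size, with coarse steps (few distinct values, few bits) for the frequencies the eye sees worst. The amplitudes and step sizes below are invented for illustration - a real JPEG encoder uses full 8x8 quantisation tables:

```python
# Illustrative only: quantise DCT amplitudes by rounding to a step size.
def quantise(coeffs, steps):
    """Round each amplitude to the nearest multiple of its step - the lossy part."""
    return [round(a / q) for a, q in zip(coeffs, steps)]

def dequantise(levels, steps):
    """What the decoder recovers: the nearest multiple, not the original value."""
    return [k * q for k, q in zip(levels, steps)]

coeffs = [231.9, -18.6, 11.2, -4.7, 2.3, -1.1, 0.6, -0.2]  # made-up amplitudes
steps  = [4, 8, 8, 16, 16, 32, 32, 32]   # coarser steps for higher frequencies

levels = quantise(coeffs, steps)
restored = dequantise(levels, steps)
print(levels)     # small integers - cheap to store
print(restored)   # close to, but not exactly, the originals
```

Notice that the small high-frequency amplitudes quantise to zero and vanish entirely - that's where most of the space saving comes from, and also where the loss happens.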
> 
> And that's it!  You can take all the bits that represent all the blocks, 
> save them to a file and you have a new representation of the image; one 
> that takes a bit of work to decode again, but that can take up much less 
> storage space than the original while being perceptually identical.
> 
> Well, I hope that made *some* kind of sense... you may have noticed that 
> I only mentioned stuff relevant to greyscale images.  Colour ones are 
> more complicated (but use the same principles).
> 
> S
> 
