Here you go.  It’s only 125k or so.  I was mainly interested in wavelet compression, so those are the routines that are explicitly bound. 
But it’s a pretty short task to add the lines to get at the wavelet transform itself.  Vollmer’s C code is included in the .pm file.

Enjoy!


package PDL::Wavelet;

=head1 PDL::Wavelet - quick-and-dirty module to introduce wavelet compression to PDL

PDL::Wavelet is an interface between Daniel Vollmer's wavelet
compression library and PDL.  It is currently extremely
quick-and-dirty: it uses Inline::Pdlpp to handle the Perl interaction,
and incldues Vollmer's library in its entirety in the generated C
code.

The library defines two methods of interest: PDL::wvcomp to compress a
single image (a 2-D short PDL) using the wavelet transform, and
PDL::wvexp to expand a compressed image.

At the moment, PDL::Wavelet is not threadable.  It should be: the
library handles up to 16 bitplanes, so it should be possible to
wavelet-compress a 3-D PDL with size up to 16 in dim 2.

The library is a quick hack put together to demonstrate hybrid
Fourier/wavelet compression of spectra for the SPICE proposal.  Future
work should update it.  Perhaps it should be included in the PDL
source tree.

=head1 Author

Wavelet library Copyright (C) 2001-2007 Daniel Vollmer; version 3.4.0
included.  The library has been modified: the header files and source
files have all been glommed together, and now-unnecessary #include
lines have been removed.

Awful glue code Copyright (C) 2008 Craig DeForest; licensed for
distribution under the same terms as the wavelet library itself

=cut

use PDL;

use Inline Pdlpp=><<'EOF';

no PDL::NiceSlice;

pp_addhdr(<<'EOHDR');
/** \file bit.h Bit-wise file or memory access. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _BIT_H_
#define _BIT_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

#include <stdio.h>

/** Handle to a bit-stream file. **/
typedef struct {
	FILE* file; /**< Real file-handle when working with on-disk files **/
	unsigned char* buffer; /**< Buffer (or the input pointer) with data **/
	int idx; /**< Current position in buffer **/
	int bufsize; /**< Size of the buffer **/
	int bufleft; /**< How many valid bytes are currently in the buffer **/
	int bits_left; /**< Number of bits that are still available for us to read or have been written **/
	int initial_bits;
	unsigned char mask; /**< Used as mask for bit_read(), count for bit_write() **/
	unsigned char current_byte; /**< The current byte being read from / written to **/
	char rwmode; /**< Copy of the 1st entry of the mode-string from bit_open() **/
	char own_buffer; /**< Whether we own (as in "need to free") the buffer **/
} t_bit_file;

t_bit_file* bit_open(char* name, const char *mode, int num_bits);
int bit_close(t_bit_file* f, unsigned char** mem);
void bit_free(unsigned char* mem);
unsigned int bit_read(int num, t_bit_file* f);
unsigned int bit_read_single(t_bit_file* f);
int bit_write(const unsigned int bits, int num, t_bit_file* f);

#ifdef __cplusplus // C++
}
#endif

#endif /* _BIT_H_ */

/** \file config.h Compile time configuration options. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _CONFIG_H_
#define _CONFIG_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

/** This is the basic (signed) "pixel" type.
 * Minimum size for 8 bit input channels is ~12 bits (so usually at least 16
 * are used).
 * \Warning Can have at most 64bits (due to wv_PEL_STORE_BITS)!
**/
typedef short int wv_pel;

/** Signed integer type used to represent error improvements.
 * Should be at least 32 bits, and larger than that when compressing more than
 * 8bpp data; for decompression 32 bits should always be enough.
 */
typedef int wv_fixed_point;

/** Indirectly defines the # of channels that can be stored in a file.
 * This limit is fairly arbitrary, although some arrays depending on this size
 * are created on the stack.
 * \Warning Changing it will make the bit-stream format incompatible.
**/
#define wv_MAX_CHANNEL_STORE 4
/** Maximum amount of channels that can be stored in a compressed file. **/
#define wv_MAX_CHANNELS (1 << wv_MAX_CHANNEL_STORE)

/** Define to show assertions (useful during development) **/
/*#define wv_ENABLE_ASSERTS*/


#ifdef wv_ENABLE_ASSERTS
#include <assert.h> /* so we don't have to include at the top of each file */
#define wv_ASSERT(c,r) assert((c))
#else
#define wv_ASSERT(c,r) ((void)0)
#endif

#ifdef __cplusplus // C++
}
#endif

#endif /* _CONFIG_H_ */

/** \file reorder.h Data-reordering and table initialization. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _REORDER_H_
#define _REORDER_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

/** Maximum amount of reorder-levels we can possibly need (i.e. for a 32768x32768 image). **/
#define wv_NUM_REORDER_LEVELS 13

/** Reorder table which stories information needed to shuffle the data from each
 * level of the wavelet decomposition into 1D array.
**/
typedef struct {
	int num_levels; /**< Number of individual reorder tables **/
	unsigned int *level[wv_NUM_REORDER_LEVELS];
} t_reorder_table;

/** Reorder function (determines the direction). **/
typedef unsigned int (*t_reorder_function) (wv_pel *dst, wv_pel* src, const unsigned int *idx, const unsigned int num, const int width2, const unsigned int coefficient_pitch, const unsigned int xlimit, const unsigned int ylimit);

t_reorder_table* wv_create_reorder_table(const int Width, const int Height);
void wv_free_reorder_table(t_reorder_table *Table);
void get_block_dimensions(int width2, int width, int height2, int height, int shift, int *block_width, int *block_height);
void reorder(wv_pel *dst, wv_pel* src, const t_reorder_table* table, const int width, const int height, const int unreorder);

#ifdef __cplusplus // C++
}
#endif

#endif /* _REORDER_H_ */

/** \file channel.h Single-channel processing (prepare for en-/decoding). **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _CHANNEL_H_
#define _CHANNEL_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

/** Maximum amount of blocks we can have (which is for a 32768x32768 image). **/
#define wv_MAX_BLOCKS (1 + 12*3)
/** Amount of bits needed to store wv_MAX_BLOCKS (as we know it is 1 + multiple of 3) **/
#define wv_MAX_BLOCK_STORE 4

/** Information about a single (or multiple agglomerated) bit-planes, needed for scheduling. **/
typedef struct {
	double improvement; /**< Improvement in mean-square error for this bitplane **/
	double improvement_per_bit; /**< Improvement per bit **/
	int num_bits; /**< # Bits needed for this bitplane (including block index) **/
	unsigned char count; /**< Agglomeration count of how many bitplanes we've combined **/
} t_wv_bitplane;

/** Information about a block of wavelet coefficients.
 * In this context, a \em block refers to the detail coefficients of a
 * particular wavelet decomposition level. For example, a 512x512 image is first
 * decomposed into a 256x256 average (this is then again decomposed recursively)
 * and 3 256x256 detail coefficients. These 3 256x256 detail coefficients make
 * up a so-called \em block.
 */
typedef struct {
	t_wv_bitplane* plane; /**< (Temporary) Array of bitplanes (0 ^= highest bitplane!) **/
	int	offset; /**< Where our coefficients start in t_wv_channel.reordered_channel **/
	int	size; /**< Number of relevant wavelet coefficients in this block **/
	unsigned char num_planes; /**< Number of bit-planes needed to encode the block **/
} t_wv_block;

/** Entry in the scheduler about the order in which we write blocks. **/
typedef struct {
	wv_fixed_point improvement; /**< Improvement in mean-square error for this entry **/
	int num_bits; /**< # of bits needed for this block (including block index) **/
	unsigned char count; /**< Agglomeration count of how many bitplanes we've combined **/
	unsigned char b; /**< Index of the block to write next **/
	unsigned char p; /**< Bit-plane to be written next (0 ^= highest) **/
} t_wv_schedule_entry;

/** Channel of wavelet coefficients to be compressed.
 * This data structure stores its own private (transformed & reordered)
 * version of the image data.
**/
typedef struct {
	int width; /**< Width of the input-image (not necessarily pow2) **/
	int height; /**< Height of the input-image (not necessarily pow2) **/
	int num_schedule_entries; /**< # of entries in the schedule **/
	t_wv_schedule_entry* schedule; /**< The actual schedule **/
	wv_pel* reordered_channel; /**< Reordered wavelet coefficients **/
	t_wv_block* block; /**< Block-Array **/
	unsigned char num_blocks; /**< # of coefficient blocks **/
	unsigned char schedule_fractional_bits; /**< Number of fractional bits in the improvement values **/
	unsigned char data_format; /**< User-data stored in the compressed file **/
	unsigned char min_pel_store; /**< Indicates minimum # of bytes needed in wv_pel (1 << min_pel_stor) **/
} t_wv_channel;

/** Progress callback for wv_channel_compact(). **/
typedef void (*wv_progress_function) (int Current, int End, void* UserData);


t_wv_channel* wv_channel_compact(const int Width, const int Height, const unsigned char DataFormat, wv_pel* Data,
	const int Approximate, const t_reorder_table* ReorderTable, wv_progress_function Progress, void* UserData);
void wv_channel_uncompact(const t_wv_channel *CC, const t_reorder_table* ReorderTable, wv_pel *Data);
void wv_channel_free(t_wv_channel* CC);

unsigned char wv_channel_get_data_format(const t_wv_channel *CC);

int wv_channel_get_width(const t_wv_channel *CC);
int wv_channel_get_height(const t_wv_channel *CC);

#ifdef __cplusplus // C++
}
#endif

#endif /* _CHANNEL_H_ */


/** \file io.h Input/output of compressed wavelet coefficients. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _IO_H_
#define _IO_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

/** Version number as string. **/
#define wv_VERSION "3.4.0"
/** Version number as integer. **/
#define wv_VERNUM 0x0340

typedef struct {
	int num_channels; /**< # of channels stored in the file **/
	int width; /**< Width of all the channels stored **/
	int height; /**< Height of all the channels stored **/
	unsigned char num_blocks; /**< # of coefficient blocks in the file **/
	unsigned char min_pel_store; /**< # of bytes needed in wv_pel (1 << n) **/
} t_wv_header;

int wv_query_header(const int NumChannels, t_wv_channel* Channel[]);
int wv_query_scheduler(const int NumChannels, t_wv_channel* Channel[], const double TargetMSE[], const int MaxBits, const int MinBits[], double EstimatedMSE[], int EstimatedBits[]);
int wv_encode(const int NumChannels, t_wv_channel* Channel[], const double TargetMSE[], const int MaxBits, const int MinBits[], t_bit_file* BF);
t_wv_header* wv_read_header(t_wv_header* Header, t_bit_file* BF);
int wv_decode(const t_wv_header* Header, int Reduction, t_wv_channel* Channel[wv_MAX_CHANNELS + 1], t_bit_file* BF);

#ifdef __cplusplus // C++
}
#endif

#endif /* _IO_H_ */

/** \file reorder.h Data-reordering and table initialization. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _REORDER_H_
#define _REORDER_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

/** Maximum amount of reorder-levels we can possibly need (i.e. for a 32768x32768 image). **/
#define wv_NUM_REORDER_LEVELS 13

/** Reorder table which stories information needed to shuffle the data from each
 * level of the wavelet decomposition into 1D array.
**/
typedef struct {
	int num_levels; /**< Number of individual reorder tables **/
	unsigned int *level[wv_NUM_REORDER_LEVELS];
} t_reorder_table;

/** Reorder function (determines the direction). **/
typedef unsigned int (*t_reorder_function) (wv_pel *dst, wv_pel* src, const unsigned int *idx, const unsigned int num, const int width2, const unsigned int coefficient_pitch, const unsigned int xlimit, const unsigned int ylimit);

t_reorder_table* wv_create_reorder_table(const int Width, const int Height);
void wv_free_reorder_table(t_reorder_table *Table);
void get_block_dimensions(int width2, int width, int height2, int height, int shift, int *block_width, int *block_height);
void reorder(wv_pel *dst, wv_pel* src, const t_reorder_table* table, const int width, const int height, const int unreorder);

#ifdef __cplusplus // C++
}
#endif

#endif /* _REORDER_H_ */

/** \file utility.h Common utility functions. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _UTILITY_H_
#define _UTILITY_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#endif

int wv_log2i(int max_val);
double wv_mse_to_psnr(const double mse);
double wv_psnr_to_mse(const double psnr);
double wv_calc_psnr(const wv_pel *a, const wv_pel *b, const int width, const int height, const int pitch, double *pmse);

int wv_rgb_to_ycocg(const int Num, wv_pel* R, wv_pel* G, wv_pel* B);
int wv_ycocg_to_rgb(const int Num, wv_pel* Y, wv_pel* Cb, wv_pel* Cr);

int wv_rgb_to_ycocgr(const int Num, wv_pel* R, wv_pel* G, wv_pel* B);
int wv_ycocgr_to_rgb(const int Num, wv_pel* Y, wv_pel* Cb, wv_pel* Cr);

#ifdef __cplusplus // C++
}
#endif

#endif /* _UTILITY_H_ */
/* bit.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#include <stdio.h>
#include <stdlib.h>


/** The default size of the per-file bit-buffer (in bytes). An array
 * of this size is used as an in-memory buffer for bit-files that are read /
 * written from / to disk.
**/
#define BIT_BUFFER 1024


/** Opens a file as a bit-stream.
 * Reminiscient of \c fopen(), this function allows for reading and writing to
 * files on a bit-level basis from either disk or memory.
 * @param[in] name filename of the file to be opened. Alternatively, for memory-
 *   based read-access (i.e. <tt>mode == "rm"</tt>) pointer to the region to be
 *   read. When writing to memory and num_bits > 0, used as user-provided
 *   target buffer..
 * @param[in] mode string that specifies the operation. mode[0] = 'r' or 'w'
 *   (read or write), mode[1] = 'b' or 'm' (file or memory).
 * @param[in] num_bits is the number of bits available (or to be used from a
 *   file) when reading, or the maximum number of bits to write. Reads in
 *   excess of num_bits produces 0s, further writes are simply ignored.
 * @return the bit-file handle, or \c NULL on failure.
**/
t_bit_file* bit_open(char* name, const char *mode, int num_bits)
{
	t_bit_file* f = NULL;
	int num_bytes;

	num_bits = (num_bits <= 0) ? 0 : num_bits;
	num_bytes = (num_bits + 7) / 8;

	if (mode[1] == 'm')
	{ /* from / to mem */
		if ((mode[0] == 'r' && name && num_bits > 0) || mode[0] == 'w')
		{
			f = malloc(sizeof *f);
			f->file = NULL;
			f->own_buffer = (name == NULL) || (num_bits <= 0);

			if (num_bits > 0)
			{
				f->bufsize = f->bufleft = num_bytes;
				f->bits_left = num_bits;
			}
			else
			{
				f->bufsize = f->bufleft = BIT_BUFFER;
				f->bits_left = 1 << 30;
			}
			f->buffer = f->own_buffer ? malloc(f->bufsize * sizeof *f->buffer) : (unsigned char *)name;
		}
	}
	else
	{ /* file-based */
		FILE* file = fopen(name, mode);

		if (file)
		{
			f = malloc(sizeof *f);

			f->file = file;
			f->bits_left = (num_bits > 0) ? num_bits : (1 << 30);
			f->bufsize = f->bufleft = BIT_BUFFER;
			f->buffer = malloc(f->bufsize * sizeof *f->buffer);
			if (mode[0] == 'r')
			{
				f->bufleft = fread(f->buffer, sizeof *f->buffer, f->bufsize, f->file);
				if (f->bufleft < f->bufsize)
					f->bits_left = (f->bits_left < f->bufleft * 8) ? f->bits_left : f->bufleft * 8;
			}
		}
	}
	if (f)
	{
		f->idx = 0;
		f->initial_bits = f->bits_left;
		f->rwmode = mode[0];
		if (mode[0] == 'w')
		{
			f->mask = 8;
			f->current_byte = 0;
		}
		else
		{
			f->mask = 0x80;
			f->current_byte = f->buffer[f->idx];
		}
	}
	return f;
}


/** Flushes the contents of the memory-buffer to disk.
 * A partly filled byte is filled with 0-bits before the file-write occurs.
 * @param[in] f bit-stream file whose buffer is to be flushed.
**/
static void bit_flush(t_bit_file* f)
{
	if (f)
	{
		int num = f->idx;

		if (f->mask != 8 && f->mask != 0)
		{
			while (f->mask > 0)
			{
				f->current_byte <<= 1;
				f->mask--;
			}
			f->buffer[f->idx] = f->current_byte;
			num++;
		}
		if (f->file && num > 0)
		{
			fwrite(f->buffer, sizeof *f->buffer, num, f->file);
			f->idx = 0;
			f->mask = 8;
			f->bufleft = f->bufsize;
			f->current_byte = 0;
		}

	}
}


/** Close a bit-stream file opened with bit_open().
 * This function flushes all writes to disk / memory (if applicable) and then
 * deallocates the handle.
 * @param[in] f bit-stream file that is to be closed.
 * @param[out] mem when not \c NULL, it receives the pointer to the buffer
 *   (which is allocated automatically when writing to memory and \c NULL
 *   passed in originally).
 * @return number of bits read or written
**/
int bit_close(t_bit_file* f, unsigned char** mem)
{
	int ret = 0;

	if (f)
	{
		if (f->rwmode == 'w')
			bit_flush(f);

		if (f->file)
		{
			free(f->buffer);
			fclose(f->file);
		}
		else
		{
			if (mem)
				*mem = f->buffer;
			else if (f->rwmode != 'r' && f->own_buffer)
				free(f->buffer);
		}
		ret = f->initial_bits - f->bits_left;
		free(f);
	}
	return ret;
}


/** Frees an internally allocated memory buffer returned by bit_close().
 * @param[in] mem Pointer returned by bit_close() that was not originally
 *   provided by the user.
**/
void bit_free(unsigned char* mem)
{
	if (mem)
		free(mem);
}


/** Reads the given number of bits from a bit-stream file.
 * The file has to be openened in read-mode for this to work.
 * @param[in] num is the number of bits to be read. This value can be larger
 *   than 32, but then earlier bits are lost as they are shifted out of the
 *   variable.
 * @param[in] f bit-stream file from which to read (read-mode).
 * @return the bits as an unsigned integer (bits read first are in higher
 *   bit-positions).
**/
unsigned int bit_read(int num, t_bit_file* f)
{
	unsigned int out = 0;

	while (num > 0)
	{
		out <<= 1;
		num--;
		if (f->bits_left > 0 && f->idx < f->bufleft)
		{
			f->bits_left--;
			out |= (f->current_byte & f->mask) != 0x00;
			f->mask >>= 1;
			if (f->mask == 0x00)
			{ // read a byte
				f->mask = 0x80;
				if (++f->idx >= f->bufleft)
				{
					f->idx = 0;
					f->bufleft = 0;
					if (f->file)
					{
						f->bufleft = fread(f->buffer, sizeof *f->buffer, f->bufsize, f->file);
						if (f->bufleft < f->bufsize)
							f->bits_left = (f->bits_left < f->bufleft * 8) ? f->bits_left : f->bufleft * 8;
					}
				}
				f->current_byte = f->buffer[f->idx];
			}
		}
	}
	return out;
}


/** Reads a single bit from a bit-stream file.
 * The file has to be openened in read-mode for this to work. This is slightly
 * faster than using bit_read() for a single bit.
 * @param[in] f bit-stream file from which to read (read-mode).
 * @return the next bit from f as an unsigned integer.
**/
unsigned int bit_read_single(t_bit_file* f)
{
	unsigned int out = 0;

	if (f->bits_left > 0 && f->idx < f->bufleft)
	{
		f->bits_left--;
		out = (f->current_byte & f->mask) != 0x00;
		f->mask >>= 1;
		if (f->mask == 0x00)
		{ // read a byte
			f->mask = 0x80;
			if (++f->idx >= f->bufleft)
			{
				f->idx = 0;
				f->bufleft = 0;
				if (f->file)
				{
					f->bufleft = fread(f->buffer, sizeof *f->buffer, f->bufsize, f->file);
					if (f->bufleft < f->bufsize)
						f->bits_left = (f->bits_left < f->bufleft * 8) ? f->bits_left : f->bufleft * 8;
				}
			}
			f->current_byte = f->buffer[f->idx];
		}
	}
	return out;
}


/** Writes the given number of bits to a bit-stream file.
 * The file has to be openened in write-mode for this to work.
 * @param[in] bits are the bits to write. More significant bits are written
 *   before less significant ones.
 * @param[in] num is the number of bits to write. This value can be larger than
 *   32, but the respective bits are obviously written as 0-bits.
 * @param[in] f bit-stream file to which to write (write-mode).
 * @return the number of bits actually written.
**/
int bit_write(const unsigned int bits, int num, t_bit_file* f)
{
	int initial_left = f->bits_left;

	while (f->bits_left > 0 && num > 0)
	{
		unsigned int in_mask = 1 << (num - 1);

		f->current_byte = (f->current_byte << 1) | ((bits & in_mask) != 0);
		f->bits_left--;
		num--;
		if (--f->mask == 0x00)
		{ // filled a byte
			f->buffer[f->idx] = f->current_byte;
			f->current_byte = 0;
			f->mask = 8;
			if (++f->idx >= f->bufsize - ((num + 7) >> 3))
			{ // so that we have enough space for num bits (will be zero bits if > 32)
				if (f->file)
					bit_flush(f);
				else if (f->own_buffer)
				{
					f->bufsize *= 2;
					f->buffer = realloc(f->buffer, f->bufsize);
				}
			}
		}
	}
	return initial_left - f->bits_left;
}

/** \internal \file coding.h Rice run-length (de-)coding of integers. **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _CODING_H_
#define _CODING_H_

#ifdef __cplusplus // C++
extern "C" {
#endif

int rice_encode_size(const wv_pel* src, const int num, int k, int max_bp, int* bit_stat);
void rice_encode_plane(const wv_pel* src, const int num, int initial_k, const int cur_bp, unsigned int *refinement_cache, t_bit_file* bf);
int rice_decode_plane(wv_pel* dst, const int num, int k, int cur_bp, int rlr_bits_left, t_bit_file* bf);
void rice_decode_finalize(wv_pel* dst, const int num, const int last_bp);

#ifdef __cplusplus // C++
}
#endif

#endif /* _CODING_H_ */
/** \internal \file transform.h Wavelet transform based on CDF(2,2). **/
/* version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#ifndef _TRANSFORM_H_INCLUDED
#define _TRANSFORM_H_INCLUDED

#ifdef __cplusplus // C++
extern "C" {
#endif

void wavelet_transform(wv_pel *dst, const int width, const int height, const int levels);
void inverse_wavelet_transform(wv_pel *dst, const int width, const int height, const int levels);
void estimate_error(const wv_pel *wavelet, const int width, const int owidth, const int height, const int oheight, int level, const int quant, double err[3]);

#ifdef __cplusplus // C++
}
#endif

#endif /* _TRANSFORM_H_ */

/* coding.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/


/** \page coding Coding Process

\section codewhat What is coded?
We have a one-dimensional array of data (usually consisting of quantized wavelet
coefficients) that we want to store efficiently, but also in a progressive
manner. This is achieved by coding the bit-planes from the top down, e.g. first
we code bit 8 of all coefficients, then bit 7 of all coefficients, and so on
until we have coded all the information.

\section codehow How is it coded?
Due to the wavelet transform  we hope to have reduced most coefficients to small
values (or 0 in the ideal case), thus we only run-length code runs of
consecutive 0 bits. Each 0 bit in the coded stream represents 2^\em k zeros,
whereas a 1 in the coded stream is always followed by \em k bits (denoting some
value \em n) and a sign bit. This is interpreted as \em n 0 bits followed by a
single 1 bit (and the value to which the 1 bit belongs as having that particular
sign).

Now, two additional things are done:
- The value of \em k does not have to be fixed. It can also be adaptive, as long
  as the encoder and decoder always make the same choice of \em k. We increase
  \em k after emitting a 0 bit and decrease it after emitting a 1 bit.
- Secondly, we make a distinction between significant bits and so-called
  refinement bits. Significant bits are those whose most significant bit has
  \em not already been sent / coded. Consequently, refinement bits are the ones
  for which more significant 1 bits have already been coded. The significant
  bits have a high probability of being 0, the refinement bits in contrast only
  have a probability close to 0.5 of being 0. Thus, we ignore the refinement
  bits when run-length coding zero-runs of the significant bits (as the decoder
  is aware for which coefficients this is the case) and embed the "ignored"
  refinement bits into the bit-stream without any coding after writing the next
  1 bit.


\section coderef References
- http://en.wikipedia.org/wiki/Rice_coding
- Henrique Malvar, "Progressive Wavelet Coding of Images", 1999,
  IEEE Data Compression Conference March 1999
**/

#include <stdlib.h>
#include <string.h>


/** The smallest allowed value for the adaptive parameter k. During the
 * run-length coding process, any run of 0s smaller than 2^RICE_MIN_K will have
 * to be escaped.
**/
#define RICE_MIN_K 0
/** The largest allowed value for the adaptive parameter k. This means the
 * longest run of 0s we can represent is 2^RICE_MAX_K.
 * \Warning This cannot be larger than 30 as we write up to 2 + k bits in a
 * single bit_write(), and the argument to that may not exceed 32 bits.
**/
#define RICE_MAX_K 30


/** \internal Encode a bitplane of a block of coefficients using adaptive Rice
 * coding. Rice coding is a variant of Golomb coding. For more information see
 * \ref coding.
 * @param[in] src coefficients (or simply integers) to be coded.
 * @param[in] num number of coefficients.
 * @param[in] k initial value of \em k for the adaptive Rice coding; has to be
 *   consistent between encoding and decoding!
 * @param[in] cur_bp Which bitplane to code (0 ^= least significant ), which
 * assumes that all the preceding (i.e. higher) planes have already been coded.
 * @param[in] refinement_cache A (temporary) store refinement bits which we
 *   flush (uncoded) when a 0 run finishes; has to provide (num + 31) / 32 ints.
 * @param[in] bf file to write coded data to.
**/
void rice_encode_plane(const wv_pel* src, const int num, int k, const int cur_bp, unsigned int *refinement_cache, t_bit_file* bf)
{
	int num_zeros = 0, num_refinement = 0;
	int idx, i;
	wv_pel mask = (wv_pel)1 << cur_bp; /* extract the bit for the current bitplane */
	wv_pel non_ref = ((wv_pel)2 << cur_bp) - 1; /* largest possible non-refinement value */

	k = MAX(RICE_MIN_K, MIN(RICE_MAX_K, k)); /* make sure k is in valid range */

	for (idx = 0; idx < num; idx++)
	{ /* each coefficient */
		wv_pel cur = abs(src[idx]);

		if (cur <= non_ref)
		{ /* not refinement */
			if ((cur & mask) == 0)
				num_zeros++;
			else
			{ /* a (non-refinement) 1 bit */
				i = 0; /* amount of 0 bits we are about to write */
				while (num_zeros >= (1 << k))
				{ /* count the # of accumulated zeros bits */
					i++;
					num_zeros -= (1 << k);
					k = MIN(RICE_MAX_K, k + 1);
				}
				bit_write(0, i, bf);

				/* write 1 + left over 0s + sign */
				bit_write((1 << (k + 1)) | (num_zeros << 1) | (src[idx] < 0), 1 + k + 1, bf);
				num_zeros = 0;
				k = MAX(RICE_MIN_K, k - 1);

				if (num_refinement > 0)
				{ /* and flush the accumulated refinement bits */
					for (i = 0; i < num_refinement >> 5; i++)
						bit_write(refinement_cache[i], 32, bf);
					bit_write(refinement_cache[num_refinement >> 5], num_refinement & 31, bf);
					num_refinement = 0;
				}
			}
		}
		else
		{ /* refinement bit (i.e. an entry whose MSB has already been sent) */
			refinement_cache[num_refinement >> 5] = (refinement_cache[num_refinement >> 5] << 1) | ((cur & mask) != 0);
			num_refinement++;
		}
	} /* each coefficient */

	i = 0; /* amount of 0 bits we are about to write */
	while (num_zeros > 0)
	{ /* we may indicate too many zeros, but in the decoder we know when a bitplane is done */
		i++;
		num_zeros -= (1 << k);
		k = MIN(RICE_MAX_K, k + 1);
	}
	bit_write(0, i, bf); /* and finally write the 0s */

	for (i = 0; i < num_refinement >> 5; i++)
		bit_write(refinement_cache[i], 32, bf);
	bit_write(refinement_cache[num_refinement >> 5], num_refinement & 31, bf);
}


/** \internal Count how many bits would have been used to encode a block of
 * coefficients using adaptive Rice coding. This routine is a modified version
 * of rice_encode_plane() that does not do any actual writing, but returns the
 * number of bits that would have been used to encode each bitplane, which is
 * useful in creating the schedule and faster than writing to a temporary file.
 * @param[in] src coefficients (or simply integers) to be coded.
 * @param[in] num number of coefficients.
 * @param[in] initial_k initial value of \em k for the adaptive Rice coding.
 *   Has to be consistent between encoding and decoding!
 * @param[in] num_bp number of bitplanes in src.
 * @param[out] bit_stat receives the number of bits needed to encode the data
 *   thus far for each bitplane. Has to have space for at least max_bp
 *   entries.
 * @return total number of bits "written".
**/
int rice_encode_size(const wv_pel* src, const int num, int initial_k, int num_bp, int* bit_stat)
{
	int bits_written = 0;

	if (num == 0)
		return 0;

	if (num_bp > 0)
	{
		int cur_bp;

		initial_k = MAX(RICE_MIN_K, MIN(RICE_MAX_K, initial_k)); /* make sure k is in valid range */
		for (cur_bp = num_bp - 1; cur_bp >= 0; cur_bp--)
		{ /* each coded bitplane */
			int num_zeros = 0, num_refinement = 0;
			int k = initial_k;
			int idx;
			wv_pel mask = (wv_pel)1 << cur_bp; /* extract the bit for the current bitplane */
			wv_pel ref = ((wv_pel)2 << cur_bp) - 1; /* largest possible non-refinement value */

			for (idx = 0; idx < num; idx++)
			{ /* each coefficient */
				wv_pel cur = abs(src[idx]);

				if (cur <= ref)
				{ /* not refinement */
					if ((cur & mask) == 0)
						num_zeros++;
					else
					{ /* a (non-refinement) 1 bit */
						while (num_zeros >= (1 << k))
						{ /* count the # of accumulated zeros bits */
							bits_written++;
							num_zeros -= (1 << k);
							k = MIN(RICE_MAX_K, k + 1);
						}
						bits_written += 1 + k + 1; /* '1' + # of '0's + sign */
						num_zeros = 0;
						k = MAX(RICE_MIN_K, k - 1);

						bits_written += num_refinement; /* flush the accumulated refinement bits */
						num_refinement = 0;
					}
				}
				else
				{ /* refinement bit (i.e. an entry whose MSB has already been sent) */
					num_refinement++;
				}
			} /* each coefficient */

			while (num_zeros > 0)
			{ /* we may indicate too many zeros, but in the decoder we know when a bitplane is done */
				bits_written++;
				num_zeros -= (1 << k);
				k = MIN(RICE_MAX_K, k + 1);
			}

			bits_written += num_refinement;
			if (bit_stat)
				bit_stat[num_bp - 1 - cur_bp] = bits_written;
		} /* each coded bitplane */
	}
	else
	{
		if (bit_stat)
			bit_stat[0] = bits_written;
	}

	return bits_written;
}


/** \internal Decodes a bitplane of a block of coefficients using adaptive Rice
 * coding. Rice coding is a variant of Golomb coding. For more information see
 * \ref coding.
 * @param[in,out] dst Receives another bitplane of decoded coefficients. Needs
 *   space for at least num entries. Has to contain previous bit-planes.
 * @param[in] num number of coefficients.
 * @param[in] k initial value of \em k for the adaptive Rice coding. Has to be
 *   consistent between encoding and decoding!
 * @param[in] cur_bp Which bitplane we are currently decoding (only needed for
 *   figuring out where to store the sign bit).
 * @param[in] rlr_bits_left Number of run-length coded bits left from the
 *   higher bitplanes (aka non-refinement bits). Initially (for the top-most
 *   plane) this is num.
 * @param[in] bf file to read coded data from.
 * @return Number of run-length coded bits left from the higher bitplanes (aka
 *   non-refinement bits).
**/
int rice_decode_plane(wv_pel* dst, const int num, int k, int cur_bp, int rlr_bits_left, t_bit_file* bf)
{
	if (num > 0 && dst && bf)
	{
		wv_pel sign_flag = 0;
		int num_zeros = 0;
		int	idx = 0;
		int cur_rlr_bits_left = rlr_bits_left; /* how many coded bits are left in this bitplane */

		k = MAX(RICE_MIN_K, MIN(RICE_MAX_K, k)); /* make sure k is in valid range */

		/* arranged so that after all the bitplanes are read,
		  this flag ends up in the sign-bit position (MSB) */
		sign_flag = (wv_pel)1 << ((8 * sizeof (wv_pel)) - (1 + cur_bp));
		while (cur_rlr_bits_left > 0)
		{
			if (bit_read_single(bf) == 0)
			{ /* more zeros */
				num_zeros += 1 << k;
				cur_rlr_bits_left -= 1 << k;
				k = MIN(RICE_MAX_K, k + 1);
			}
			else
			{ /* a one, which means work */
				int added_zeros;
				unsigned char encoded_sign;

				/* read sign bit as well and increase # of 0s by 1 (2 as it's shifted >> 1),
				  the extra zero will become the new '1' bit */
				added_zeros = bit_read(k + 1, bf) + 2;
				k = MAX(RICE_MIN_K, k - 1);
				encoded_sign = added_zeros & 1; /* remember the sign */
				added_zeros = MIN(added_zeros >> 1, cur_rlr_bits_left); /* safety */
				cur_rlr_bits_left -= added_zeros;
				rlr_bits_left--; /* one less "significant" bit */
				num_zeros += added_zeros;

				for (; num_zeros > 0; idx++)
				{ /* store the accumulated 0 bits and at the same time read
				  in the refinement bits that occured during this run of 0s */
					if (dst[idx] != 0)
						dst[idx] = (dst[idx] << 1) | bit_read_single(bf); /* refinement */
					else
						num_zeros--; /* 0 << 1 | 0 is still 0 :D */
				}
				/* as we were looking for an extra zero which ended the loop */
				dst[idx - 1] = (sign_flag * encoded_sign) | 1;
			}
		}
		/* and read the last refinement bits (no need to write zeros ;)) */
		for (; idx < num; idx++)
			if (dst[idx] != 0)
				dst[idx] = (dst[idx] << 1) | bit_read_single(bf);

		return rlr_bits_left;
	}
	return 0;
}


/** \internal Finalizes a block with decoded bit-planes, this includes
 *   dequantization and properly setting the sign bit.
 * @param[in,out] dst Decoded bitplanes.
 * @param[in] num number of coefficients.
 * @param[in] last_bp The last bit-plane that was decoded, which lets us know
 *   by how much to dequantize the data and where the sign-bit ended up.
**/
void rice_decode_finalize(wv_pel* dst, const int num, const int last_bp)
{
	if (num > 0 && dst)
	{
		wv_pel sign_flag;
		int i;

		/* arranged so that after all the bitplanes are read,
		  this flag ends up in the sign-bit position (MSB) */
		sign_flag = (wv_pel)1 << ((8 * sizeof (wv_pel)) - (1 + last_bp));
		sign_flag--; /* set all the bits up to the sign bit */
		if (last_bp > 0)
		{ /* have to dequantize as well as set the sign properly */
			for (i = 0; i < num; i++)
			{
				wv_pel val = dst[i] & sign_flag; /* remove sign bit */

				val = (val > 0) ? (((val << 1) + 1) << last_bp) >> 1 : 0; /* dequant */
				if (dst[i] > sign_flag)
					dst[i] = -val; /* properly set sign (i.e. 2s complement) */
				else
					dst[i] = val;
			}
		}
		else
		{ /* only need to "properly" set the sign (we have only set the actual
		  sign bit, the other bits still represent a +ve number) */
			for (i = 0; i < num; i++)
				if (dst[i] < 0)
					dst[i] = -(dst[i] & sign_flag); /* properly set sign */
		}
	}
}


/* transform.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

/** \page transform Wavelet Transform
We start by computing a complete wavelet-transform using the Cohen-Daubechies-
Fevereau (2,2) biorthogonal wavelet (which incidentally seems to be the same as
used for lossless JPEG2000, but they call it a (5,3)?). For this, the image
needs dimensions which are a power of 2. To achieve this, I simply repeated the
last column and row of the image as necessary.

Doing the wavelet transform once, splits the image into average components as
well as horizontal (HD), vertical (VD) and diagonal detail (DD) coefficients.
Then we apply the transform to the resulting averages again and again, until the
smallest average block has either a width or height that is equal to 2.

The coefficients resulting from one round of the transform are in the same
so-called octave band (and are later on combined together via the \ref reorder
"reordering-process").

\section transformref References
- Peter Schröder, Wim Sweldens, "Course Notes: Wavelets in Computer Science",
  1996,  SIGGRAPH 96
- Wim Sweldens, "The Lifting Scheme: A New Philosophy in Biorthogoal Wavelet
  Construction", 1995, Wavelet Applications in Signal and Image Processing
- S. G. Mallard, "A Theory for Multiresolution Signal Decomposition: The Wavelet
  Representation", 1989, IEEE Transactions on Pattern Analysis and Machine
  Intelligence
**/


/** Detail coefficient for the CDF(2,2)-transform (lifting) **/
#define DETAIL(ptr, pitch) ((ptr)[0] -= ((ptr)[-(pitch)] + (ptr)[(pitch)]) / 2)
/** Last detail coefficient for the CDF(2,2)-transform (lifting) **/
#define DETAIL_LAST(ptr, pitch) ((ptr)[0] -= (ptr)[-(pitch)])
/** Average coefficient for the CDF(2,2)-transform (lifting). \Warning We store this at -pitch! **/
#define AVERAGE(ptr, pitch) ((ptr)[-(pitch)] += ((ptr)[0] + (ptr)[-2 * (pitch)]) / 4)
/** First average coefficient for the CDF(2,2)-transform (lifting). \Warning We store this at -pitch! **/
#define AVERAGE_FIRST(ptr, pitch) ((ptr)[-(pitch)] += (ptr)[0] / 2)

static void wavelet_row(wv_pel *dst, const int pitch, const int count)
{
	wv_pel *ptr;
	int i;

	ptr = dst + pitch; // odd

	DETAIL(ptr, pitch);
	AVERAGE_FIRST(ptr, pitch);
	for (i = count - 2; i > 0; i--)
	{
		ptr += 2 * pitch;
		DETAIL(ptr, pitch);
		AVERAGE(ptr, pitch);
	}
	ptr += 2 * pitch;
	DETAIL_LAST(ptr, pitch);
	AVERAGE(ptr, pitch);
}

static void wavelet_even_column_first(wv_pel *dst, const int pitch, const int rowpitch, const int count)
{
	wv_pel *rowptr;
	int i;

	rowptr = dst - rowpitch;

	for (i = 2 * count; i > 0; i--)
	{ // column (first)
		DETAIL(rowptr, rowpitch);
		AVERAGE_FIRST(rowptr, rowpitch);
		rowptr += pitch;
	}
}

static void wavelet_even_column(wv_pel *dst, const int pitch, const int rowpitch, const int count)
{
	wv_pel *rowptr;
	int i;

	rowptr = dst - rowpitch;

	for (i = 2 * count; i > 0; i--)
	{ // column (generic)
		DETAIL(rowptr, rowpitch);
		AVERAGE(rowptr, rowpitch);
		rowptr += pitch;
	}
}

static void wavelet_column_last(wv_pel *dst, const int pitch, const int rowpitch, const int count)
{
	wv_pel *rowptr;
	int i;

	rowptr = dst;

	for (i = 2 * count; i > 0; i--)
	{ // column (last)
		DETAIL_LAST(rowptr, rowpitch);
		AVERAGE(rowptr, rowpitch);
		rowptr += pitch;
	}
}


/** Computes the discrete wavelet decomposition of a 2D image.
 * This implementation uses a lifting implementation of the CDF(2,2) wavelet,
 * which is lossless for integer data. The transform is seperable in x and y,
 * but for performance reasons (cache), both are applied at essentially the same
 * time. The 2D transform splits the data into average and horizontal (HD),
 * vertical (VD) and diagonal (DD) detail coefficients, and can then be applied
 * again to the resulting average coefficients.
 * Inverse of inverse_wavelet_transform().
 * @param[in,out] dst Input data which is transformed into wavelet coefficients
 *   (as it is computed in place).
 * @param[in] width Width of data in dst, has to be divisble by 2^(levels+1).
 * @param[in] height Height of data in dst, has to be divisble by 2^(levels+1).
 * @param[in] levels How often the transform is applied to the resulting average
 *   coefficients. Largest valid value is wv_log2i(MIN(width, height)) - 2.
**/
void wavelet_transform(wv_pel *dst, const int width, const int height, const int levels)
{
	int cl;

	for (cl = 0; cl < levels; cl++)
	{
		int wmid = width >> (cl + 1);
		int pitch = 1 << cl, rowpitch = width << cl;
		int i;

		/* precompute the first three rows */
		wavelet_row(dst, pitch, wmid);
		wavelet_row(dst + pitch * width, pitch, wmid);
		wavelet_row(dst + 2 * pitch * width, pitch, wmid);
		/* do the column transform on 1st two rows */
		wavelet_even_column_first(dst + 2 * pitch * width, pitch, rowpitch, wmid);
		for (i = 3 * pitch; i < height - pitch; i += 2 * pitch)
		{
			wavelet_row(dst + i * width, pitch, wmid); // odd row
			wavelet_row(dst + (i + pitch) * width, pitch, wmid); // even row
			wavelet_even_column(dst + (i + pitch) * width, pitch, rowpitch, wmid); // previous two columns
		}
		wavelet_row(dst + i * width, pitch, wmid); // last (odd) row
		wavelet_column_last(dst + i * width, pitch, rowpitch, wmid); // and the last two columns
	}
}


/** Inverse first average coefficient for the CDF(2,2)-transform (lifting) **/
#define INV_AVERAGE_FIRST(ptr, pitch) ((ptr)[0] -= (ptr)[(pitch)] / 2)
/** Inverse average coefficient for the CDF(2,2)-transform (lifting).  **/
#define INV_AVERAGE(ptr, pitch) ((ptr)[0] -= ((ptr)[-(pitch)] + (ptr)[(pitch)]) / 4)
/** Inverse detail coefficient for the CDF(2,2)-transform (lifting). \Warning We store this at -pitch! **/
#define INV_DETAIL(ptr, pitch) ((ptr)[-(pitch)] += ((ptr)[-2 * (pitch)] + (ptr)[0]) / 2)
/** Inverse last detail coefficient for the CDF(2,2)-transform (lifting) **/
#define INV_DETAIL_LAST(ptr, pitch) ((ptr)[(pitch)] += (ptr)[0])

static void inverse_wavelet_row(wv_pel *dst, const int pitch, const int count)
{
	wv_pel *ptr;
	int i;

	ptr = dst; // even (average)

	INV_AVERAGE_FIRST(ptr, pitch);
	for (i = count - 1; i > 0; i--)
	{
		ptr += 2 * pitch;
		INV_AVERAGE(ptr, pitch);
		INV_DETAIL(ptr, pitch);
	}
	INV_DETAIL_LAST(ptr, pitch);
}

static void inverse_wavelet_column_first(wv_pel *dst, const int pitch, const int rowpitch, const int count)
{
	wv_pel *rowptr;
	int i;

	rowptr = dst;

	for (i = 2 * count; i > 0; i--)
	{ // column (first)
		INV_AVERAGE_FIRST(rowptr, rowpitch);
		rowptr += pitch;
	}
}

static void inverse_wavelet_column_even(wv_pel *dst, const int pitch, const int rowpitch, const int count)
{
	wv_pel *rowptr;
	int i;

	// column
	rowptr = dst;

	for (i = 2 * count; i > 0; i--)
	{ // column
		INV_AVERAGE(rowptr, rowpitch);
		INV_DETAIL(rowptr, rowpitch);
		rowptr += pitch;
	}
}

static void inverse_wavelet_column_last(wv_pel *dst, const int pitch, const int rowpitch, const int count)
{
	wv_pel *rowptr;
	int i;

	// column
	rowptr = dst;

	for (i = 2 * count; i > 0; i--)
	{ // column
		INV_DETAIL_LAST(rowptr, rowpitch);
		rowptr += pitch;
	}
}


/** Computes the inverse discrete wavelet decomposition of a 2D image.
 * This implementation uses a lifting implementation of the CDF(2,2) wavelet,
 * which is lossless for integer data. The transform is seperable in x and y,
 * but for performance reasons (cache), both are applied at essentially the same
 * time. The 2D transform splits the data into average and horizontal (HD),
 * vertical (VD) and diagonal (DD) detail coefficients, and can then be applied
 * again to the resulting average coefficients.
 * Inverse of wavelet_transform().
 * @param[in,out] dst Input wavelet coefficients which are transformed back into
 *   the original data (as it is computed in place).
 * @param[in] width Width of data in dst, has to be divisble by 2^(levels+1).
 * @param[in] height Height of data in dst, has to be divisble by 2^(levels+1).
 * @param[in] levels How often the transform was applied to the resulting average
 *   coefficients. Largest valid value is wv_log2i(MIN(width, height)) - 2.
**/
void inverse_wavelet_transform(wv_pel *dst, const int width, const int height, const int levels)
{
	int cl;

	for (cl = levels - 1; cl >= 0; cl--)
	{
		int wmid = width >> (cl + 1);
		int pitch = 1 << cl, rowpitch = width << cl;
		int i;

		// preculate first row (w/ vertical transform)
		inverse_wavelet_column_first(dst, pitch, rowpitch, wmid);
		for (i = 2 * pitch; i < height; i += 2 * pitch)
		{
			inverse_wavelet_column_even(dst + i * width, pitch, rowpitch, wmid);
			inverse_wavelet_row(dst + (i - 2 * pitch) * width, pitch, wmid);
			inverse_wavelet_row(dst + (i - pitch) * width, pitch, wmid);
		}
		inverse_wavelet_column_last(dst + (i - 2 * pitch) * width, pitch, rowpitch, wmid);
		inverse_wavelet_row(dst + (i - 2 * pitch) * width, pitch, wmid);
		inverse_wavelet_row(dst + (i - pitch) * width, pitch, wmid);
	}
}


/*static void quantize(wv_pel* dst, const int num, const int quant)
{
	if (quant > 1)
	{
		int i;

		for (i = 0; i < num; i++)
			*dst++ /= quant;
	}
}*/


/*int dead_zone_dequant(const int val, const int fac)
{ // has been multiplied by 10 to avoid floating point calculations entirely
	if (val > 0)
		return ((val * 10 + 5) * fac) / 10; // (val + 0.5) * fac
	else if (val < 0)
		return ((val * 10 - 5) * fac) / 10; // (val - 0.5) * fac
	else
		return 0;
}

static void dequantize(wv_pel* dst, const int num, const int quant)
{
	if (quant > 1)
	{
		int i;

		// start dequantizing w/ T1
		for (i = 0; i < num; i++)
			*dst++ = (wv_pel)dead_zone_dequant(*dst, quant);
	}
}*/

#define dead_zone_dequant(v, f) (((v) > 0) ? ((((v) << 1) + 1) * (f)) >> 1 : ((v) < 0) ? -((((-(v) << 1) + 1) * (f)) >> 1) : 0)
#define OFS(x, y) ((y) * pitch * width + (x) * pitch)

/** Estimates the error incurred by a particular quantization of wavelet
 * coefficients in the the original (untransformed) data. For the reasoning
 * behind the actual computations performed here, take a look the included
 * "wavelet_analysis.nb" Mathematica notebook.
 * @param[in] a Input wavelet coefficients. If this is NULL, then the error is
 *   computed based on the assumptions that the original data is evenly
 *   distributed and that quantizing the wavelet coefficients results in the
 *   same error as quantizing the original data.
 * @param[in] width Width of data in a, has to be divisble by 2^(levels+1).
 * @param[in] owidth Original width of data in a (not neccessarily pow2).
 * @param[in] height Height of data in a, has to be divisble by 2^(levels+1).
 * @param[in] oheight Original height of data in a (not neccessarily pow2).
 * @param[in] level Which level of the decomposition to work on.
 * @param[in] quant Quantizer (i.e. value by which to divide the data in a).
 * @param[out] err Estimated mean square errors (for average or HD, VD, DD).
**/
void estimate_error(const wv_pel *wavelet, const int width, const int owidth, const int height, const int oheight, int level, const int quant, double err[3])
{
	double diff;
	/* how big is a block ("level") of coefficients in x and y */
	int block_size_x = width >> level;
	int block_size_y = height >> level;
	int unused_size_x, unused_size_y, pitch;
	int y, x, ofs;

	err[0] = err[1] = err[2] = 0.0;

	if (quant <= 1)
		return;

	if (!wavelet)
	{ /* estimate errors independently of actual data */
		/* For uniform data and quantizer q, we have errors of the form
		  [-q/2, -q/2 + 1, ..., 0, ..., q/2 - 1] repeating endlessly. Also,
		  we report the mean-squared error, which means we have to square
		  each indivudal term. Because they repeat, it is enough to sum one
		  "sub-series" and divide by the number of terms in it to get the
		  average. */
		double numq = quant / 2.0 - 1.0; /* # of terms in +ive half of series */
		double est_error;

		/* First term squared and divided by the number of terms (which is quant) */
		est_error = quant / 4.0;

		/* Use the fact that Sum[i^2, {i, n}] = 1/6 n (1 + n) (1 + 2n), but
		  only divide by 3 because of -ive and +ive halves of the series.
		  Also divide by quant (# of total terms) to get the average. */
		est_error += (numq * (1 + numq) * (1 + 2 * numq)) / (3.0 * quant);

		err[0] = err[1] = err[2] = est_error;
	}
	else
	{
		#define ERR(v) ((v) - dead_zone_dequant((v) / quant, quant))

		if (block_size_x < 2 || block_size_y < 2)
		{ /* initial average block */
			level--;
			pitch = 1 << level;
			block_size_x = width >> level;
			block_size_y = height >> level;
			unused_size_x = (width - owidth) >> level;
			unused_size_y = (height - oheight) >> level;
			for (y = 0; y < block_size_y - unused_size_y; y++)
			{
				for (x = 0; x < block_size_x - unused_size_x; x++)
				{
					ofs = OFS(x, y);
					diff = ERR(wavelet[ofs]);
					err[0] += diff * diff;
				}
			}
			err[0] /= (block_size_x - unused_size_x) * (block_size_y - unused_size_y);
		}
		else
		{ /* detail block */
			int HDofs, VDofs;

			unused_size_x = (width - owidth) >> level;
			unused_size_y = (height - oheight) >> level;
			pitch = 1 << level;
			HDofs = (pitch >> 1);
			VDofs = (pitch >> 1) * width;

			#define HD(ofs) wavelet[(ofs) + HDofs]
			#define VD(ofs) wavelet[(ofs) + VDofs]
			#define DD(ofs) wavelet[(ofs) + HDofs + VDofs]

			for (y = 0; y < block_size_y - unused_size_y; y++)
			{
				/* horizontal detail index (HD) */
				#define H 0
				/* vertical detail index (VD) */
				#define V 1
				/* diagonal detail index (DD) */
				#define D 2
				/* left index (when used for y ^= top) */
				#define L 0
				/* center index */
				#define C 1
				/* right index (when used for y ^= bottom) */
				#define R 2

				/* y index left, right (actually top & bottom) */
				int yli = MAX(y - 1, 0), yri = MIN(y + 1, block_size_y - 1);
				/* store errors incurred (for each HD, VD, DD a 3x3 stencil) */
				float e[D + 1][R + 1][R + 1];

				/* We copy the error in the coefficients into this array and then
				only update the ones we need (after shifting them). This way
				the inner loop only references fixed offsets into errs. */
				ofs = OFS(0, yli);
				e[H][L][C] = e[H][L][R] = ERR(HD(ofs));
				e[V][L][C] = e[V][L][R] = ERR(VD(ofs));
				e[D][L][C] = e[D][L][R] = ERR(DD(ofs));
				ofs = OFS(0, y);
				e[H][C][C] = e[H][C][R] = ERR(HD(ofs));
				e[V][C][C] = e[V][C][R] = ERR(VD(ofs));
				e[D][C][C] = e[D][C][R] = ERR(DD(ofs));
				ofs = OFS(0, yri);
				e[H][R][C] = e[H][R][R] = ERR(HD(ofs));
				e[V][R][C] = e[V][R][R] = ERR(VD(ofs));
				e[D][R][C] = e[D][R][R] = ERR(DD(ofs));

				for (x = 0; x < block_size_x - unused_size_x; x++)
				{
					int xri = MIN(x + 1, block_size_x - 1); /* right */

					/* shift horizontal detail error left */
					e[H][L][L] = e[H][L][C]; e[H][C][L] = e[H][C][C]; e[H][R][L] = e[H][R][C];
					e[H][L][C] = e[H][L][R]; e[H][C][C] = e[H][C][R]; e[H][R][C] = e[H][R][R];
					/* shift vertical detail error left */
					e[V][L][L] = e[V][L][C]; e[V][C][L] = e[V][C][C]; e[V][R][L] = e[V][R][C];
					e[V][L][C] = e[V][L][R]; e[V][C][C] = e[V][C][R]; e[V][R][C] = e[V][R][R];
					/* shift diagonal detail error left */
					e[D][L][L] = e[D][L][C]; e[D][C][L] = e[D][C][C]; e[D][R][L] = e[D][R][C];
					e[D][L][C] = e[D][L][R]; e[D][C][C] = e[D][C][R]; e[D][R][C] = e[D][R][R];

					/* fill in the new errors on the right side */
					ofs = OFS(xri, yli);
					e[H][L][R] = ERR(HD(ofs)); e[V][L][R] = ERR(VD(ofs)); e[D][L][R] = ERR(DD(ofs));
					ofs = OFS(xri, y);
					e[H][C][R] = ERR(HD(ofs)); e[V][C][R] = ERR(VD(ofs)); e[D][C][R] = ERR(DD(ofs));
					ofs = OFS(xri, yri);
					e[H][R][R] = ERR(HD(ofs)); e[V][R][R] = ERR(VD(ofs)); e[D][R][R] = ERR(DD(ofs));

					/* HD contributions */
					diff = (e[H][C][L] + e[H][C][C]) / 4.0;
					err[0] += diff * diff; /* x even, y even */
					diff = (6 * e[H][C][C] - e[H][C][L] - e[H][C][R]) / 8.0;
					err[0] += diff * diff; /* x odd, y even */
					diff = (e[H][C][L] + e[H][C][C] + e[H][R][L] + e[H][R][C]) / 8.0;
					err[0] += diff * diff; /* x even, y odd */
					diff = (6 * (e[H][C][C] + e[H][R][C]) - e[H][C][L] - e[H][C][R] - e[H][R][L] - e[H][R][R]) / 16.0;
					err[0] += diff * diff; /* x odd, y odd */
					/* VD contributions */
					diff = (e[V][L][C] + e[V][C][C]) / 4.0;
					err[1] += diff * diff; /* x even, y even */
					diff = (e[V][L][C] + e[V][L][R] + e[V][C][C] + e[V][C][R]) / 8.0;
					err[1] += diff * diff; /* x odd, y even */
					diff = (6 * e[V][C][C] - e[V][L][C] - e[V][R][C]) / 8.0;
					err[1] += diff * diff; /* x even, y odd */
					diff = (6 * (e[V][C][C] + e[V][C][R]) - e[V][L][C] - e[V][L][R] - e[V][R][C] - e[V][R][R]) / 16.0;
					err[1] += diff * diff; /* x odd, y odd */
					/* DD contributions */
					diff = (e[D][L][L] + e[D][L][C] + e[D][C][L] + e[D][C][C]) / 16.0;
					err[2] += diff * diff; /* x even, y even */
					diff = (e[D][L][L] + e[D][L][R] + e[D][C][L] + e[D][C][R] - 6 * (e[D][L][C] + e[D][C][C])) / 32.0;
					err[2] += diff * diff; /* x odd, y even */
					diff = (e[D][L][L] + e[D][L][C] + e[D][R][L] + e[D][R][C] - 6 * (e[D][C][L] + e[D][C][C])) / 32.0;
					err[2] += diff * diff; /* x even, y odd */
					diff = (e[D][L][L] + e[D][L][R] + e[D][R][L] + e[D][R][R] - 6 * (e[D][L][C] + e[D][C][L] + e[D][C][R] + e[D][R][C]) + 36 * e[D][C][C]) / 64.0;
					err[2] += diff * diff; /* x odd, y odd */
				}
				#undef H
				#undef V
				#undef D
				#undef L
				#undef C
				#undef R
			}
			#undef HD
			#undef VD
			#undef DD
			/* as we have computed errors for 4 * as many pixels */
			diff = (block_size_x - unused_size_x) * (block_size_y - unused_size_y) * 4;
			err[0] /= diff; err[1] /= diff; err[2] /= diff;
		}
		#undef ERR
	}
}

#undef OFS
#undef dead_zone_dequant


/* channel.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#include <stdlib.h>
#include <string.h>
#include <math.h>

#ifndef M_LOG2E
#define M_LOG2E 1.4426950408889634074 /* log_2 e */
#endif

/** \internal Agglomerate bitplanes so that we don't get high bps blocking
 * lower ones with a better improvement per bit.
 * Usually the improvement per bit of a higher bitplane is better than the
 * ones following it because higher planes contain more zeros (i.e. more
 * efficient run-length coding) and thus use less bits. In addition, they also
 * improve more than lower bitplanes (as lower bitplanes only impart a smaller
 * change). Unfortunately this is not always the case (most likely due to the
 * error estimation). So what we do is we agglomerate (combine) bitplanes
 * until the above is true.
**/
static int agglomerate_bitplanes(int num_planes, t_wv_bitplane* plane)
{
	int agg_count = 0;
	int current_plane = 0;
	int next_plane = 1;

	while (next_plane < num_planes)
	{
		if (plane[current_plane].improvement_per_bit < plane[next_plane].improvement_per_bit)
		{ /* this plane improve less per bit than the next (less significant!) one */
			/* add their improvement and number of bits and recompute improvement per bit */
			plane[current_plane].improvement += plane[next_plane].improvement;
			plane[current_plane].num_bits += plane[next_plane].num_bits;
			plane[current_plane].improvement_per_bit = plane[current_plane].improvement / plane[current_plane].num_bits;
			plane[current_plane].count++;
			/* and the next plane is gone */
			plane[next_plane].improvement = 0.0;
			plane[next_plane].count = 0;
			next_plane++;
			agg_count++;
		}
		else
			current_plane = next_plane++; /* look at the next 2 planes */
	}
	return agg_count;
}


/** \internal Fills a schedule from the entries of the bitplanes.
 * The schedule contains the order in which bitplanes from block are written.
 * The ordering criterium is the improvement per bit, under the constraint that
 * the bitplanes are written from the top down (i.e. most significant bits
 * first) and that no bit-planes can be skipped. This means we only have to
 * choose from which block the next bitplanes comes.
 * Also computes the amount of space for fractional bits in wv_fixed_point.
**/
static void convert_planes_to_schedule(t_wv_channel* cc)
{
	double multiplier;
	t_wv_schedule_entry *se;
	int i, j;
	int fractional_bits;
	unsigned char next_plane[wv_MAX_BLOCKS + 1];

	cc->num_schedule_entries = 0;
	multiplier = 1.0; /* (Ceiling[Log[2, -->1<-- + 2^Range[0, 127]]] == Range[1, 128]) -> True */
	for (i = 0; i < cc->num_blocks; i++)
	{
		next_plane[i] = 0; /* initialize next_plane, so we start at the top */
		for (j = 0; j < cc->block[i].num_planes; j++)
		{
			multiplier += MAX(0.0, cc->block[i].plane[j].improvement); /* agglomerated entries have improvement 0.0 */
			cc->num_schedule_entries += cc->block[i].plane[j].count > 0;
		}
	}
	/* leave 1 bit for sign and 1 bit for overflows as we force the minimum improvement to be 1 */
	fractional_bits = sizeof(wv_fixed_point) * 8 - 2 - ceil(log(multiplier)*M_LOG2E);
	wv_ASSERT(fractional_bits >= 2, "Not enough fractional precision bits / possible overflow: Increase size of wv_fixed_point");
	cc->schedule_fractional_bits = MAX(0, fractional_bits);
	multiplier = pow(2.0, cc->schedule_fractional_bits);

	cc->schedule = malloc(cc->num_schedule_entries * sizeof *cc->schedule);

	/* now add the bitplanes in the right order */
	for (i = 0, se = cc->schedule; i < cc->num_schedule_entries; i++, se++)
	{
		double best_improvement = -1e64;
		t_wv_bitplane *bp;
		unsigned char best_block = 0;

		/* find the next entry with the best improvement per size */
		for (j = 0; j < cc->num_blocks; j++)
			if (next_plane[j] < cc->block[j].num_planes && cc->block[j].plane[next_plane[j]].improvement_per_bit > best_improvement)
			{
				best_improvement = cc->block[j].plane[next_plane[j]].improvement_per_bit;
				best_block = j;
			}

		bp = &cc->block[best_block].plane[next_plane[best_block]];

		se->b = best_block;
		se->p = next_plane[best_block];
		se->num_bits = bp->num_bits;
		se->improvement = MAX(1, bp->improvement * multiplier);
		se->count = bp->count;
		next_plane[best_block] += bp->count; /* update next_plane */
//		printf("block: %i plane: %i imp: %f size: %i imp/b: %f (%i)\n", se->b, se->p, se->improvement / multiplier, se->num_bits, (se->improvement / multiplier) / se->num_bits, se->count);
	}
	/* now we can already free the planes (as all the information is in the schedule) */
	for (i = cc->num_blocks - 1; i >= 0; i--)
		if (cc->block[i].plane)
		{
			free(cc->block[i].plane);
			cc->block[i].plane = NULL;
		}
}


/** Initializes a channel for subsequent encoding.
 * (Lossless) Inverse of wv_channel_uncompact().
 * \Warning The actual width and height of the data in Data has to be the
 * next larger power of 2!
 * \Warning The data in Data is modified!
 * @param[in] Width Width of the \em used data in Data.
 * @param[in] Height Height of the \em used data in Data.
 * @param[in] DataFormat Format of the data stored, or any other 8-bit value
 *   you want to store in an encoded channel. This information can be queried
 *   with wv_channel_get_data_format().
 * @param[in,out] Data Pixel data of the channel on input; receives the
 *   wavelet transformed coefficients. This \em has to be stored with a width
 *   and height that are the next-bigger power of 2 to Width and Height given
 *   above. A suitable extension should be done on the boundary, e.g. source
 *   data is 720x512, then Width = 720 and Height = 512, but the data in
 *   Data has to be 1024x512!
 * @param[in] Approximate Flag which can be set > 0 to indicate the error
 *   computation should be more approximate. This results in less than ideal
 *   compression estimates.
 * @param[in] ReorderTable Permutation table used to reorder the wavelet
 *   coefficients. Has to have been initialized at least the dimensions of the
 *   given image. Larger tables can be reused for smaller images. Can be freed
 *   after wv_channel_compact() has returned; should be shared between
 *   multiple channels with the same (or smaller) dimensions.
 * @param[in] Progress Callback to a user-function with the progress of the
 *   compaction procedure. Can be \c NULL.
 * @param[in] UserData Userdata passed to Progress.
 * @return The compacted channel to be used in conjunction with wv_encode().
**/
t_wv_channel* wv_channel_compact(const int Width, const int Height, const unsigned char DataFormat, wv_pel* Data,
	const int Approximate, const t_reorder_table* ReorderTable, wv_progress_function Progress, void* UserData)
{
	t_wv_channel* cc = NULL;

	if (Width >= 1 && Height >= 1 && Data && ReorderTable)
	{
		wv_pel *wavelet = (Approximate == 0) ? Data : NULL;
		int width2 = 1 << wv_log2i(Width - 1); /* next bigger pow2 */
		int height2 = 1 << wv_log2i(Height - 1); /* next bigger pow2 */
		int num_levels = MAX(0, wv_log2i(MIN(width2, height2)) - 2);
		int cur, end; /* current and maximum value for the progress counter */
		int i, size;
		unsigned char block_bits, max_planes;

		cc = malloc(sizeof *cc);

		cc->width = Width;
		cc->height = Height;

		cc->num_blocks = 1 + 3 * num_levels; /* initial average + N * (HD + VD + DD) */
		cc->data_format = DataFormat;

		cc->block = malloc(cc->num_blocks * sizeof *cc->block);
		size = 0;
		for (i = 0; i < cc->num_blocks; i++)
		{ /* compute offsets and sizes of blocks as well as total size (for allocating reordered) */
			int shift = (cc->num_blocks - 1) / 3 - MAX(0, i - 1) / 3; /* shift is the same for i == 0 and 1-3 */
			int used_width, used_height;

			if (MIN(width2, height2) <= 2)
				shift = 0; /* only the untransformed data written as average */
			get_block_dimensions(width2, cc->width, height2, cc->height, shift, &used_width, &used_height);
			cc->block[i].offset = (i == 0) ? 0 : cc->block[i - 1].offset + cc->block[i - 1].size;
			cc->block[i].size = used_width * used_height;
			size += cc->block[i].size;
		}

		wavelet_transform(Data, width2, height2, num_levels);
		cc->reordered_channel = malloc(size * sizeof *cc->reordered_channel);
		reorder(cc->reordered_channel, Data, ReorderTable, cc->width, cc->height, 0);

		/* First quick pass to compute the number of planes and know how much work we have to do for error estimation */
		block_bits = wv_log2i(cc->num_blocks - 1); /* bits need for coding a block-# */
		max_planes = 0;
		end = 0;
		for (i = 0; i < cc->num_blocks; i++)
		{
			t_wv_block* bl = &cc->block[i];
			int j;
			wv_pel max_value;

			bl->plane = NULL;

			/* find the maximum absolute value in the block to find # of bpp */
			max_value = 0;
			for (j = bl->offset; j < bl->offset + bl->size; j++)
				max_value = MAX(max_value, abs(cc->reordered_channel[j]));
			bl->num_planes = wv_log2i(max_value);
			if (bl->num_planes > 0)
			{ /* compute size of encoded bitplanes */
				int bit_stat[sizeof (wv_pel) * 8 - 1]; /* maximum # of bitplanes (minus sign) */
				int j;

				max_planes = MAX(max_planes, bl->num_planes);
				end += (bl->num_planes + 1) * (bl->size / cc->block[0].size);
				bl->plane = malloc(bl->num_planes * sizeof *bl->plane);
				rice_encode_size(cc->reordered_channel + bl->offset, bl->size, 0, bl->num_planes, bit_stat);
				for (j = 0; j < bl->num_planes; j++)
				{
					t_wv_bitplane* bp = &bl->plane[j];

					bp->count = 1;
					bp->num_bits = (j == 0) ? bit_stat[0] : bit_stat[j] - bit_stat[j - 1];
					bp->num_bits += block_bits + (i == 0); /* add the bits needed to specify the block index (+ still-writing flag) */
				}
			}
		}
		end--; /* make it inclusive */
		cc->min_pel_store = MAX(0, wv_log2i(max_planes + 2 + 1) - 3); /* planes + 2 bits transform headroom + sign */

		/* now estimate incurred error independently for each block */
		cur = 0;
		for (i = 0; i < num_levels + 1; i++)
		{
			int bli = MAX(0, 3 * i - 2); /* block index */
			int max_planes = cc->block[bli].num_planes;
			int num = (i == 0) ? 1 : 3; /* either initial average block, or 3 blocks with HD, VD, DD */
			int work_factor = cc->block[bli].size / cc->block[0].size; /* to make progress semi-linear */
			int j, k;

			for (k = 1; k < num; k++)
				max_planes = MAX(max_planes, cc->block[bli + k].num_planes);
			if (max_planes > 0)
			{
				double mse_stat[sizeof (wv_pel) * 8][3];

				/* compute errors for those planes */
				for (j = 0; j <= max_planes; j++)
				{
					estimate_error(wavelet, width2, cc->width, height2, cc->height, num_levels + 1 - i, 1 << j, mse_stat[j]);
					if (Progress)
					{
						for (k = 0; k < num; k++)
							if (j <= cc->block[bli + k].num_planes)
								cur += work_factor;
						Progress(cur, end, UserData);
					}
				}
				/* and now "distribute" it to each of the subbands */
				for (k = 0; k < num; k++)
				{
					t_wv_block* bl = &cc->block[bli + k];

					for (j = 0; j < bl->num_planes; j++)
					{
						t_wv_bitplane* bp = &bl->plane[j];

						bp->improvement = mse_stat[bl->num_planes - j][k] - mse_stat[bl->num_planes - j - 1][k];
						bp->improvement_per_bit = bp->improvement / bp->num_bits;
					}
					agglomerate_bitplanes(bl->num_planes, bl->plane);
				}
			}
		}
		convert_planes_to_schedule(cc);
	}
	return cc;
}


/** Converts compacted channel data back to its original representation.
 * (Lossless) Inverse of wv_channel_compact().
 * \Warning The actual width and height of the free space at Data has to be the
 * next larger power of 2 of CC->width and CC->height!
 * @param[in] CC Compacted channel data (usually gotten from wv_decode()).
 * @param[in] ReorderTable Permutation table used to reorder the wavelet
 *   coefficients. Has to have been initialized with exact same width and height
 *   as given here (or an identical integer multiple of both dimensions), e.g.
 *   for our 720x512 data, we can also use a table created for 1440x1024, but
 *   not 1440x512. Should be shared between multiple channels with the same
 *   dimensions.
 * @param[out] Data Receives the original pixel data of the compacted channel.
 *   This \em has to provide space for a width and height that are the
 *   next-bigger power of 2 to CC->width and CC->height, e.g. CC is 720x512,
 *   then Data has to allocated for 1024x512!
**/
void wv_channel_uncompact(const t_wv_channel *CC, const t_reorder_table* ReorderTable, wv_pel *Data)
{
	if (CC && ReorderTable && Data)
	{
		int width2 = 1 << wv_log2i(CC->width - 1); /* next bigger pow2 */
		int height2 = 1 << wv_log2i(CC->height - 1); /* next bigger pow2 */

		/* when the dimensions are not pow2, unreorder does not write to every
		  entry in Data. This is not a problem (as the undefined coefficients do
		  not influence the area originally compressed). The wavelet transform
		  always operates on all of the data, though, so to make Valgrind happy,
		  we zero it out. */
		if (CC->width != width2 || CC->height != height2)
			memset(Data, 0, width2 * height2 * sizeof *Data);
		reorder(Data, CC->reordered_channel, ReorderTable, CC->width, CC->height, 1);
		inverse_wavelet_transform(Data, width2, height2, (CC->num_blocks - 1) / 3);
	}
}


/** Frees a compacted channel and all its associated data.
 * @param[in] CC Channel to be freed.
**/
void wv_channel_free(t_wv_channel* CC)
{
	if (CC)
	{
		free(CC->schedule);
		free(CC->block);
		free(CC->reordered_channel);
		free(CC);
	}
}


/** Gets the data format (retained data) of a channel.
 * This information was kept when encoding the channel and is set when
 * encoding a channel.
 * @param[in] CC Channel whose retained data you want to know.
 * @return User-data that was stored (8 bits).
**/
unsigned char wv_channel_get_data_format(const t_wv_channel *CC)
{
	if (CC)
		return CC->data_format;
	return 0;
}


/** Gets the width of a (usually decoded) channel.
 * @param[in] CC Channel whose width you are interested in.
 * @return Width of the channel, 0 on error
**/
int wv_channel_get_width(const t_wv_channel *CC)
{
	if (CC)
		return CC->width;
	return 0;
}


/** Gets the height of a (usually decoded) channel.
 * @param[in] CC Channel whose height you are interested in.
 * @return Height of the channel, 0 on error
**/
int wv_channel_get_height(const t_wv_channel *CC)
{
	if (CC)
		return CC->height;
	return 0;
}


/* io.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#include <stdlib.h>
#include <string.h>
#include <math.h>


/** Current header-version. **/
#define wv_CUR_HEADER_VERSION 4
/** Current bitstream-version. **/
#define wv_CUR_BITSTREAM_VERSION 5

const char wv_copyright[] = " wavelet 3.4.0 Copyright 2001-2007 Daniel Vollmer ";
static const unsigned int wv_magic = 0x5b765d00 | (wv_CUR_HEADER_VERSION << 4) | wv_CUR_BITSTREAM_VERSION;

/** \Internal In how many bits we store the required size (in bytes) of wv_pel (1 << n).
 * 2 bits allow for a max value of 3, which allows for up to 64bit wv_pels.
**/
#define wv_PEL_STORE_BITS 2

/** Callback function for the scheduler. **/
typedef void (*t_scheduler_callback)(int ci, const t_wv_channel *cc, const t_wv_schedule_entry *se, double mse, int bits_allocated, void* userdata);

/** \internal Interleaves a number of schedules so that the currently lowest-
 * fidelity channel is written next. This takes a channel's given target mse
 * as well as their requested relative final errors into account. The process
 * is stopped once all constraints are satisfied or we have no more bits in
 * our budget.
 * @param[in] NumChannels # of channels in Channel.
 * @param[in] Channel NumChannels pointers to t_wv_channel, which actually
 *   treated as const (so they are used read-only). Not declared as const due
 *    http://c-faq.com/ansi/constmismatch.html. Need identical dimensions.
 * @param[in] TargetMSE NumChannels doubles that prescribe the target mean-
 *   square for each given channel. If the value is non-negative then it is
 *   an absolute value which is strictly required (i.e. the method will return
 *   with an error-code if it cannot be achieved), if it is negative then it
 *   is a relative value that is resolved (if possible) with its closest
 *   absolute neighbour, first checked on the right, then on the left side.
 * @param[in] MaxBits Bits allocated to compressed data, 0 ^= infinite bits.
 * @param[in] MinBits Minimum number of bits spent on each channel. These is
 *   interpreted as a hard constraint, and failure to satisfy them results in
 *   failure, unless the channel had fewer bits to start with. This number
 *   will be rounded up depending on the granularity of the blocks.
 * @param[in] cb Callback function to be executed for each bitplane of each
 *   block.
 * @param[in] userdata Userdata passed onto the cb callback-function.
 * @return Number of bits written (or -1 on failure to satisfy constraints).
**/
static int interleave_schedules(const int NumChannels, t_wv_channel* Channel[], const double TargetMSE[], int MaxBits, const int MinBits[], t_scheduler_callback cb, void* userdata)
{
	double fp_factor;
	wv_fixed_point happy_mse[wv_MAX_CHANNELS + 1]; /* for achieving happiness */
	wv_fixed_point terminate_mse[wv_MAX_CHANNELS + 1]; /* for terminating the loop */
	wv_fixed_point ofs_mse[wv_MAX_CHANNELS + 1];
	wv_fixed_point cur_mse[wv_MAX_CHANNELS + 1];
	wv_fixed_point ofs_min, ofs_max;
	int channel_bits = wv_log2i(NumChannels - 1);
	int i, j, entries_left, happy_count, initial_bits, left_bits;
	int next_entry[wv_MAX_CHANNELS + 1];
	int min_bits[wv_MAX_CHANNELS + 1], min_bits_needed;
	unsigned char next_plane[wv_MAX_CHANNELS + 1][wv_MAX_BLOCKS + 1];
	unsigned char right_shift[wv_MAX_CHANNELS + 1];
	unsigned char am_i_happy[wv_MAX_CHANNELS + 1];
	unsigned char min_frac;

	/* find the smalllest shift value, as we'll use that for all channels */
	min_frac = 255;
	for (i = 0; i < NumChannels; i++)
		min_frac = MIN(min_frac, Channel[i]->schedule_fractional_bits);
	fp_factor = pow(2.0, min_frac);

	/* initialise everything */
	entries_left = 0;
	ofs_max = -1;
	ofs_min = 0;
	min_bits_needed = 0;
	for (i = 0; i < NumChannels; i++)
	{
		double target;

		next_entry[i] = 0;
		for (j = 0; j < Channel[i]->num_blocks; j++)
			next_plane[i][j] = 0;

		entries_left += Channel[i]->num_schedule_entries;
		right_shift[i] = Channel[i]->schedule_fractional_bits - min_frac;

		cur_mse[i] = ofs_mse[i] = 0;
		min_bits[i] = 0;
		for (j = 0; j < Channel[i]->num_schedule_entries; j++)
		{
			const t_wv_schedule_entry *se = &Channel[i]->schedule[j];

			cur_mse[i] += ((se->improvement - 1) >> right_shift[i]) + 1;
			if (min_bits[i] < MinBits[i])
			{
				min_bits[i] += se->num_bits;
				min_bits_needed += channel_bits * se->count; /* need to include overhead in total */
			}
		}
		min_bits_needed += min_bits[i];

		/* limit to within [-cur_mse, cur_mse] for each channel to prevent overflows */
		target = TargetMSE[i] * fp_factor;
		if (target >= 0.0)
			terminate_mse[i] = happy_mse[i] = MIN(target, cur_mse[i]);
		else
		{
			ofs_mse[i] = MAX(target, -cur_mse[i]);
			if (ofs_mse[i] == 0)
				ofs_mse[i] = -1; /* make sure we didn't round a negative target mse to 0 */
		}

		ofs_max = MAX(ofs_max, ofs_mse[i]);
		ofs_min = MIN(ofs_min, ofs_mse[i]);
	}

	if (MaxBits > 0 && MaxBits < min_bits_needed)
		return -1; /* cannot satisfy MinBits criterion */

	if (ofs_min < 0)
	{ /* we have relative values */
		if (ofs_max >= 0)
		{ /* and we can fix them to absolute values */
			for (i = 0; i < NumChannels; i++)
			{
				if (ofs_mse[i] < 0)
				{
					for (j = 1; j <= MAX(i, NumChannels - i - 1); j++)
					{
						int ref = -1;

						if (i + j < NumChannels && ofs_mse[i + j] >= 0)
							ref = i + j; /* first look on the right */
						else if (i - j >= 0 && ofs_mse[i - j] >= 0)
							ref = i - j; /* next look on the left */
						if (ref >= 0)
						{
							happy_mse[i] = MAX(0, happy_mse[ref] + ofs_mse[i]);
							terminate_mse[i] = MAX(0, terminate_mse[ref] + ofs_mse[i]);
							ofs_mse[i] += ofs_mse[ref];
							break;
						}
					}
				}
			}
		}
		else
		{ /* everything is relative */
			for (i = 0; i < NumChannels; i++)
			{
				happy_mse[i] = cur_mse[i];
				terminate_mse[i] = 0; /* and ofs stays as it is */
			}
		}
	}

	/* first go through all the channels to see whether they are already happy */
	happy_count = 0;
	for (i = 0; i < NumChannels; i++)
	{
		am_i_happy[i] = cur_mse[i] <= terminate_mse[i];
		happy_count += am_i_happy[i];
//		printf("%i. terminate_mse: %i (%f) happy_mse: %i (%f) ofs_mse: %i (%f)\n", i, terminate_mse[i], terminate_mse[i] / fp_factor, happy_mse[i], happy_mse[i] / fp_factor, ofs_mse[i], ofs_mse[i] / fp_factor);
	}

	left_bits = initial_bits = (MaxBits > 0) ? MaxBits : 1 << 30;
	while (happy_count < NumChannels && left_bits > channel_bits && entries_left > 0)
	{
		wv_fixed_point worst_mse = (wv_fixed_point)-1 << (sizeof worst_mse * 8 - 1);
		int wi = -1; /* worst index */

		/* find the currently worst channel */
		for (i = 0; i < NumChannels; i++)
			if (!am_i_happy[i] && cur_mse[i] - ofs_mse[i] >= worst_mse && next_entry[i] < Channel[i]->num_schedule_entries && (min_bits_needed == 0 || min_bits[i] > 0 || Channel[i]->schedule[next_entry[i]].count * channel_bits + Channel[i]->schedule[next_entry[i]].num_bits + min_bits_needed <= left_bits))
			{ /* make sure we can still satisfy min_bits_needed if this one doesn't reduce it */
				worst_mse = cur_mse[i] - ofs_mse[i];
				wi = i;
			}

		if (wi >= 0)
		{
			wv_fixed_point improvement;
			const t_wv_schedule_entry *se = &Channel[wi]->schedule[next_entry[wi]];
			int entry_overhead = channel_bits * se->count; /* that many channel indices */
			int bits_allocated;

			improvement = ((se->improvement - 1) >> right_shift[wi]) + 1; /* convert to common scale */

			if (left_bits >= se->num_bits + entry_overhead)
			{ /* we can fit the next block */
				bits_allocated = se->num_bits + entry_overhead;
				left_bits -= bits_allocated;
				if (min_bits[wi] > 0)
				{ /* we made sure min_bits always contain whole blocks, so we always have enough space for the whole block */
					min_bits[wi] -= se->num_bits;
					min_bits_needed -= bits_allocated;
				}
			}
			else
			{ /* we write a partial block (and there must be bits left) */
				improvement = ((improvement / fp_factor) * ((double)(left_bits - entry_overhead) / se->num_bits)) * fp_factor;
				bits_allocated = left_bits;
				left_bits = 0;
			}
			cur_mse[wi] -= improvement;

			if (cb)
				cb(wi, Channel[wi], se, cur_mse[wi] / fp_factor, bits_allocated, userdata);

			next_entry[wi]++;
			entries_left--;

			/* update whether this channel it is happy now */
			am_i_happy[wi] = cur_mse[wi] <= terminate_mse[wi];
			happy_count += am_i_happy[wi];
		}
	}

	for (i = 0; i < NumChannels; i++)
		if (!am_i_happy[i] && cur_mse[i] <= happy_mse[i])
			happy_count++; /* we would have been happy no matter what */

	wv_ASSERT(min_bits_needed == 0, "Could not satisfy MinBits criterion");
	return (happy_count == NumChannels) ? initial_bits - left_bits : -1;
}


typedef struct
{
	double mse[wv_MAX_CHANNELS + 1];
	int bits[wv_MAX_CHANNELS + 1];
	t_bit_file* bf;
	unsigned int *refinement_cache; /* needs to provide space for the (biggest block >> 5) + 1 entries */
	unsigned char channel_bits;
	unsigned char block_bits;
} t_write_userdata;

static void scheduler_write_callback(int ci, const t_wv_channel *cc, const t_wv_schedule_entry* se, double mse, int bits_allocated, void* userdata)
{
	t_write_userdata *ud = (t_write_userdata *)userdata;

	ud->mse[ci] = mse; /* store mse */
	ud->bits[ci] += bits_allocated;
	if (ud->bf)
	{
		const t_wv_block *bl = &cc->block[se->b];
		unsigned char j;

		for (j = 0; j < se->count; j++)
		{ /* go through agglomerated bit-planes */
			if (ud->channel_bits > 0)
				bit_write(ci, ud->channel_bits, ud->bf);
			if (ud->block_bits > 0)
				bit_write(se->b, ud->block_bits, ud->bf);
			if (se->b == 0)
				bit_write(1, 1, ud->bf); /* indicate we're still writing */
			rice_encode_plane(cc->reordered_channel + bl->offset, bl->size, 0,
				bl->num_planes - (se->p + j) - 1, ud->refinement_cache, ud->bf);
		}
	}
//	printf("channel: %i block: %i plane: %i count: %i size: %i mse: %f\n", ci, se->b, se->p, se->count, se->num_bits, mse);
}


/** Returns the size of the header in bits (for the given channels).
 * @param[in] NumChannels # of channels in Channel.
 * @param[in] Channel NumChannels pointers to t_wv_channel, which actually
 *   treated as const (so they are used read-only). Not declared as const due
 *    http://c-faq.com/ansi/constmismatch.html. Need identical dimensions.
 * @return Number of bits used for the header (or -1 on failure).
**/
int wv_query_header(const int NumChannels, t_wv_channel* Channel[])
{
	int bits = -1;

	if (NumChannels > 0 && NumChannels <= wv_MAX_CHANNELS && Channel)
	{
		int i;
		unsigned char min_pel_store;

		/* check that the channels are the same size & # of blocks */
		min_pel_store = Channel[0]->min_pel_store;
		for (i = 1; i < NumChannels; i++)
		{
			min_pel_store = MAX(min_pel_store, Channel[i]->min_pel_store);
			if (Channel[0]->width != Channel[i]->width || Channel[0]->height != Channel[i]->height)
				return -1;
		}

		bits = 32 + 16 + 16 + wv_MAX_CHANNEL_STORE + wv_MAX_BLOCK_STORE + wv_PEL_STORE_BITS; /* magic + width + height + channels + blocks + pel */
		bits += NumChannels * (8 + Channel[0]->num_blocks * (min_pel_store + 3));
	}
	return bits;
}


/** Queries the scheduler what sort of mean-square error would be achieved.
 * @param[in] NumChannels # of channels in Channel.
 * @param[in] Channel NumChannels pointers to t_wv_channel, which actually
 *   treated as const (so they are used read-only). Not declared as const due
 *    http://c-faq.com/ansi/constmismatch.html. Need identical dimensions.
 * @param[in] TargetMSE NumChannels doubles that prescribe the target mean-
 *   square for each given channel. If the value is non-negative then it is
 *   an absolute value which is strictly required (i.e. the method will return
 *   with an error-code if it cannot be achieved), if it is negative then it
 *   is a relative value that is resolved (if possible) with its closest
 *   absolute neighbour, first checked on the right, then on the left side.
 * @param[in] MaxBits Bits allocated to compressed data, 0 ^= infinite bits.
 * @param[in] MinBits Minimum number of bits spent on each channel. These is
 *   interpreted as a hard constraint, and failure to satisfy them results in
 *   failure, unless the channel had fewer bits to start with. This number
 *   will be rounded up depending on the granularity of the blocks.
 * @param[out] EstimatedMSE Estimated Mean-square error achieved for each channel.
 * @param[out] EstimatedBits Estimated bits used for each channel.
 * @return Number of bits used (or -1 on failure to satisfy constraints).
**/
int wv_query_scheduler(const int NumChannels, t_wv_channel* Channel[], const double TargetMSE[], const int MaxBits, const int MinBits[], double EstimatedMSE[], int EstimatedBits[])
{
	int bits = -1;
	int i, headersize, bits_available = MaxBits;
	t_write_userdata ud;

	/* verify parameters */
	if (NumChannels <= 0 || !Channel)
		return -1;

	headersize = wv_query_header(NumChannels, Channel);
	if (headersize <= 0)
		return headersize;

	if (bits_available > 0)
	{
		bits_available -= headersize;
		if (bits_available <= 0)
			return -1; /* couldn't even write header */
	}

	/* make an (over-estimated) guess about the initial error of each channel */
	for (i = 0; i < NumChannels; i++)
	{
		ud.mse[i] = pow(2.0, sizeof (wv_fixed_point) * 8 - 2 - Channel[i]->schedule_fractional_bits) - 1.0;
		ud.bits[i] = 0;
	}

	ud.bf = NULL;
	ud.refinement_cache = NULL;
	ud.channel_bits = wv_log2i(NumChannels - 1);
	ud.block_bits = wv_log2i(Channel[0]->num_blocks - 1);
	bits = interleave_schedules(NumChannels, Channel, TargetMSE, bits_available, MinBits, scheduler_write_callback, &ud);
	for (i = 0; i < NumChannels; i++)
	{
		EstimatedMSE[i] = ud.mse[i];
		EstimatedBits[i] = ud.bits[i];
	}
	if (bits > 0)
		bits += headersize;
	return bits;
}


/** Encodes a number of channels progressively interleaved into a given file
 * which can later be read by wv_decode(). As each channel has to have the same
 * dimensions, they also have the same amount of blocks. The blocks are ordered
 * by the scheduler to always give the best quality improvement per bit spent.
 * @param[in] NumChannels # of channels in Channel.
 * @param[in] Channel NumChannels pointers to t_wv_channel, which actually
 *   treated as const (so they are used read-only). Not declared as const due
 *    http://c-faq.com/ansi/constmismatch.html. Need identical dimensions.
 * @param[in] TargetMSE NumChannels doubles that prescribe the target mean-
 *   square for each given channel. If the value is non-negative then it is
 *   an absolute value which is strictly required (i.e. the method will return
 *   with an error-code if it cannot be achieved), if it is negative then it
 *   is a relative value that is resolved (if possible) with its closest
 *   absolute neighbour, first checked on the right, then on the left side.
 * @param[in] MaxBits Bits allocated for data (including the header!),
 *   0 ^= infinite bits.
 * @param[in] MinBits Minimum number of bits spent on each channel. These is
 *   interpreted as a hard constraint, and failure to satisfy them results in
 *   failure, unless the channel had fewer bits to start with. This number
 *   will be rounded up depending on the granularity of the blocks.
 * @param[in] BF Bit-file handle to be written to, must have been opened with
 *   the correct number of bits (otherwise the file will be too big).
 * @return Number of bits written (or -1 on failure to satisfy constraints).
**/
int wv_encode(const int NumChannels, t_wv_channel* Channel[], const double TargetMSE[], const int MaxBits, const int MinBits[], t_bit_file* BF)
{
	int bits = -1;

	if (NumChannels > 0 && NumChannels <= wv_MAX_CHANNELS && Channel && BF)
	{
		int i, j;
		t_write_userdata ud;
		unsigned char min_pel_store;

		/* check that the channels are the same size & # of blocks */
		min_pel_store = Channel[0]->min_pel_store;
		for (i = 1; i < NumChannels; i++)
		{
			min_pel_store = MAX(min_pel_store, Channel[i]->min_pel_store);
			if (Channel[0]->width != Channel[i]->width || Channel[0]->height != Channel[i]->height)
				return -1;
		}

		bits = 0;
		bits += bit_write(wv_magic, 32, BF);
		bits += bit_write(Channel[0]->width - 1, 16, BF);
		bits += bit_write(Channel[0]->height - 1, 16, BF);
		bits += bit_write(NumChannels - 1, wv_MAX_CHANNEL_STORE, BF);
		bits += bit_write((Channel[0]->num_blocks - 1) / 3, wv_MAX_BLOCK_STORE, BF);
		bits += bit_write(min_pel_store, wv_PEL_STORE_BITS, BF);

		/* write retained user-data for all channels as well as # of bitplanes */
		for (j = 0; j < NumChannels; j++)
		{
			bits += bit_write(Channel[j]->data_format, 8, BF);
			for (i = 0; i < Channel[0]->num_blocks; i++)
				bits += bit_write(Channel[j]->block[i].num_planes, min_pel_store + 3, BF);
		}

		if (MaxBits > 0 && MaxBits - bits <= 0)
			return -1; /* not enough space for data */

		ud.bf = BF;
		/* allocate refinement cache that is big enough for the largest block */
		ud.refinement_cache = malloc(((Channel[0]->block[Channel[0]->num_blocks - 1].size >> 5) + 1) * sizeof *ud.refinement_cache);
		ud.channel_bits = wv_log2i(NumChannels - 1);
		ud.block_bits = wv_log2i(Channel[0]->num_blocks - 1);
		j = interleave_schedules(NumChannels, Channel, TargetMSE, MaxBits - bits, MinBits, scheduler_write_callback, &ud);
		if (j < 0)
			bits = j; /* gotten an error */
		else
			bits += j;
		free(ud.refinement_cache);
	}
	return bits;
}


/** Reads the header written by wv_encode().
 * @param[in] Header Store the header information here (only valid if non-NULL
 *   was returned).
 * @param[in] BF Bit-file handle to try to read header from.
 * @return Header on success or NULL on failure to find a valid header.
**/
t_wv_header* wv_read_header(t_wv_header* Header, t_bit_file* BF)
{
	if (!Header)
		return NULL;

	memset(Header, 0, sizeof *Header);

	if (!BF)
		return NULL;

	if (bit_read(32, BF) != wv_magic)
		return NULL;

	Header->width = 1 + bit_read(16, BF);
	Header->height = 1 + bit_read(16, BF);
	Header->num_channels = 1 + bit_read(wv_MAX_CHANNEL_STORE, BF);
	Header->num_blocks = 1 + 3 * bit_read(wv_MAX_BLOCK_STORE, BF);
	Header->min_pel_store = bit_read(wv_PEL_STORE_BITS, BF);
	wv_ASSERT(sizeof(wv_pel) >= 1U << Header->min_pel_store, "Too many bitplanes in file: Increase range of wv_pel");

	return Header;
}


/** Decodes channels written by wv_encode() to a file back to a t_wv_channel.
 * The channels returned by this function can be written again by wv_encode(),
 * but the bit-estimate will be incorrect and actual output will differ. Also,
 * no estimated errors are available. Thus using wv_channel_set_max_mse() or
 * the relative equivalent on these channels is not recommended.
 * @param[in] Header If the header was already read manually, a pointer to it,
 *   if NULL, the header is read from the file BF.
 * @param[in] Reduction Indicates how often to halve the dimensions of the
 *   stored channels (until the smaller of width and height is 2). Although
 *   this is a very natural thing to do with wavelet coding, the completely
 *   embedded structure of the files does not allow this to be of the highest
 *   quality in the file as we have to stop as soon as the first bitplane of a
 *   higher-resolution block is encountered.
 * @param[out] Channel Array receiving the returned number of t_wv_channel*
 *   pointers.
 * @param[in] BF Bit-file handle to read from. If Header != NULL, it has to be
 *   aligned directly after the header, otherwise on the header.
 * @return Number of channels decoded, 0 on failure.
**/
int wv_decode(const t_wv_header* Header, int Reduction, t_wv_channel* Channel[wv_MAX_CHANNELS + 1], t_bit_file* BF)
{
	t_wv_header hdr;

	if (!Header)
		Header = wv_read_header(&hdr, BF); /* not given, so try to read it */

	if (Channel && Header)
	{ /* looks like we're good */
		t_wv_channel *cc = NULL;
		t_wv_schedule_entry* last_block[wv_MAX_CHANNELS + 1]; /* the last time we wrote a block of that channel */
		int width2 = 1 << wv_log2i(Header->width - 1); /* next bigger pow2 */
		int height2 = 1 << wv_log2i(Header->height - 1); /* next bigger pow2 */
		int num_levels = MAX(0, wv_log2i(MIN(width2, height2)) - 2);
		int size, total_planes;
		int i, j;
		int rlr_bits_left[wv_MAX_CHANNELS + 1][wv_MAX_BLOCKS + 1];
		unsigned char planes_left[wv_MAX_CHANNELS + 1][wv_MAX_BLOCKS + 1];
		unsigned char block_bits, channel_bits;

		if (num_levels != (Header->num_blocks - 1) / 3)
			return 0;

		Reduction = MAX(0, MIN(Reduction, num_levels));
		total_planes = 0;
		for (j = 0; j < Header->num_channels; j++)
		{ /* initialize all the channels to something sensible */
			Channel[j] = malloc(sizeof *Channel[j]);
			cc = Channel[j];
			cc->width = (Header->width + (1 << Reduction) - 1) >> Reduction;
			cc->height = (Header->height + (1 << Reduction) - 1) >> Reduction;
			cc->num_schedule_entries = 0;
			cc->num_blocks = Header->num_blocks - Reduction * 3;
			cc->schedule_fractional_bits = 0;
			cc->data_format = bit_read(8, BF);
			cc->min_pel_store = Header->min_pel_store;
			cc->block = malloc(cc->num_blocks * sizeof *cc->block);

			size = 0;
			for (i = 0; i < cc->num_blocks; i++)
			{
				int shift = Reduction + (cc->num_blocks - 1) / 3 - MAX(0, i - 1) / 3; /* shift is the same for i == 0 and 1-3 */
				int used_width, used_height;

				if (MIN(width2, height2) <= 2)
					shift = 0; /* only the untransformed data written as average */
				get_block_dimensions(width2, Header->width, height2, Header->height, shift, &used_width, &used_height);
				cc->block[i].offset = (i == 0) ? 0 : cc->block[i - 1].offset + cc->block[i - 1].size;
				cc->block[i].size = used_width * used_height;
				size += cc->block[i].size;
				rlr_bits_left[j][i] = cc->block[i].size; /* initially all coefficients are run-length coded */

				cc->block[i].num_planes = bit_read(Header->min_pel_store + 3, BF);
				cc->block[i].plane = NULL;
				planes_left[j][i] = cc->block[i].num_planes;
				cc->num_schedule_entries += cc->block[i].num_planes; /* not agglomerated */
			}
			bit_read((Header->min_pel_store + 3) * 3 * Reduction, BF); /* skip # bit-planes for blocks that were "reduced away" */

			total_planes += cc->num_schedule_entries;
			cc->schedule = malloc(cc->num_schedule_entries * sizeof *cc->schedule);
			cc->num_schedule_entries = 0; /* fill this out more accurately when reading */
			cc->reordered_channel = malloc(size * sizeof *cc->reordered_channel);
			memset(cc->reordered_channel, 0, size * sizeof *cc->reordered_channel);
			last_block[j] = NULL; /* for computing improvements */
		}

		num_levels -= Reduction;
		width2 >>= Reduction;
		height2 >>= Reduction;

		block_bits = wv_log2i(Header->num_blocks - 1); /* bits need for coding a block-# */
		channel_bits = wv_log2i(Header->num_channels - 1); /* bits need for coding a channel-# */

		while (total_planes > 0)
		{
			t_wv_schedule_entry *se;
			int nc;
			unsigned char nb;

			nc = bit_read(channel_bits, BF);
			nb = bit_read(block_bits, BF);
			/* stop if at least one of the indices is incorrent (either corrupted or absent data) */
			if (nc >= Header->num_channels || nb >= Channel[0]->num_blocks || planes_left[nc][nb] == 0)
				break;

			/* for block 0, we've written a single 1-bit to indicate valid data */
			if (nb == 0 && bit_read_single(BF) == 0)
				break;

			/* now we need a bitplane for nc's nb */
			cc = Channel[nc];

			/* first, fill out the schedule entry */
			se = cc->schedule + cc->num_schedule_entries;
			cc->num_schedule_entries++;
			se->num_bits = block_bits + 1; /* some value */
			se->improvement = total_planes;
			se->count = 1;
			se->b = nb;
			se->p = cc->block[nb].num_planes - planes_left[nc][nb]; /* as 0 ^= highest plane */
			if (last_block[nc])
				last_block[nc]->improvement -= total_planes; /* update the last block's improvements so that this block happens in the right place */
			last_block[nc] = se;

			planes_left[nc][nb]--;
			rlr_bits_left[nc][nb] = rice_decode_plane(cc->reordered_channel + cc->block[nb].offset,
				cc->block[nb].size, 0, planes_left[nc][nb], rlr_bits_left[nc][nb], BF);
			total_planes--;
		}

		/* now finalize the bitplanes (i.e. dequant and properly set sign) */
		for (j = 0; j < Header->num_channels; j++)
		{
			cc = Channel[j];
			for (i = 0; i < cc->num_blocks; i++)
				rice_decode_finalize(cc->reordered_channel + cc->block[i].offset,
					cc->block[i].size, planes_left[j][i]);
		}

		return Header->num_channels;
	}
	return 0;
}

/* reorder.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/


/** \page reorder Reordering Process

\section reorderwhy Why would you?
In order to actually compress an image, we need to take advantage of redundancy
in the data. A very simple way to do this without a complicated prediction
mechanism, is run-length coding. In fact, we only run-length code subsequent 0s.
We still want to take advantage of the original 2D nature of the data, which
implies that coefficients close to each other have a higher probability of being
similar, but without the complication of a 2D context-model like quad- or zero-
trees.

Another reason for this reordering is that we can preserve the progressive
nature of wavelet coding by first writing the important coefficients and then
the less and less important ones (which we call the levels of the
decomposition).

\section reorderhow How do we reorder?
Then the coefficients are "reordered" into a one-dimensional array by \ref
transform "octave-bands" (and using a Hilbert space-filling curve within each
octave, first is a block of coefficients with HD (horizontal detail), followed
by a block of VD (vertical detail) coefficients, and finally a block the DD
(diagonal detail) coefficients.

Now, we are able to write the coefficients of each block to disk via
\ref coding.

\section reorderref References
- http://en.wikipedia.org/wiki/Space-filling_curve
- http://swissnet.ai.mit.edu/~jaffer/Geometry/HSFC.html
- John J. Bartholdi, III and Paul Goldsman, "Vertex-Labeling Algorithms for the
  Hilbert Spacefilling Curve", Software - Practice and Experience, 2001, Volume
  31-5
- Dr. Volker Markl, Frank Ramsak, "Universalschlüssel - Datenbankindexe in
  mehreren Dimensionen", 2001, c't 01/2001 - Magazin für Computertechnik
- Henrique Malvar, "Progressive Wavelet Coding of Images", 1999,
  IEEE Data Compression Conference March 1999

**/


#include <stdlib.h>


/** Converts a Hilbert Curve derived key into the coordinates of a point.
 * This implementation is based on the paper "Vertex-Labeling Algorithms for the
 * Hilbert Spacefilling Curve" by John J. Bartholdi, III and Paul Goldsman. I
 * have combined the x and y coordinate into a single unsigned int. Also, due to
 * fixed point arithmetic, the coordinates of the points grow with each
 * iteration so that sufficient accuracy is maintained.
 * @param[in] key Hilbert derived key to be converted. In essence, we interpret
 *   every two bits of the key as a certain spatial subdivision of the
 *   unit-square, as described in the paper.
 * @param[in] numbits number of bits in the key (which has to be even). Only
 *   valid for even values smaller than or equal to 30 (as otherwise the point
 *   coordinates overflow or the key does not provide enough bits).
 * @param[in] a, b, c, d coordinates (combined x and y) for vertex labelling.
 * @return fixed point coordinates with a precision of numbits / 2 (combined).
**/
static unsigned int recurse_hilbert(unsigned int key, unsigned int numbits,
	unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
	if (numbits > 1)
	{
		unsigned char subcell;

		numbits -= 2; /* each subdivision decision takes 2 bits */
		subcell = (key >> numbits) & 3; /* namely these two (taken from the top) */

		switch (subcell)
		{ /* divide into subcells */
			case 0:
				return recurse_hilbert(key, numbits, a << 1, a + d, a + c, a + b);
			case 1:
				return recurse_hilbert(key, numbits, b + a, b << 1, b + c, b + d);
			case 2:
				return recurse_hilbert(key, numbits, c + a, c + b, c << 1, c + d);
			case 3:
				return recurse_hilbert(key, numbits, d + c, d + b, d + a, d << 1);
		}
	}
	else
	{ /* final result is the midpoint of the cell, i.e. (a + b + c + d) / 4 */
		unsigned int outx, outy;

		/* extract x from the lower 16 bits */
		outx = ((a & 0xffff) + (b & 0xffff) + (c & 0xffff) + (d & 0xffff) + 1) >> 2;
		/* extract y from the upper 16 bits */
		outy = ((a >> 16) + (b >> 16) + (c >> 16) + (d >> 16) + 1) >> 2;
		return (outy << 16) | outx;
	}
	return 0; /* make gcc happy */
}


/** \internal Computes the extents of actually used wavelet coefficients in a
 * block.
 * @param[in] width2 Next-bigger power of 2 to width.
 * @param[in] width Width of the input-image.
 * @param[in] height2 Next-bigger power of 2 to height.
 * @param[in] height Height of the input-image.
 * @param[in] shift Decomposition level.
 * @param[out] block_width Width of the used coefficients at requested level.
 * @param[out] block_height Height of the used coefficients at requested level.
**/
void get_block_dimensions(int width2, int width, int height2, int height, int shift, int *block_width, int *block_height)
{
	int w, h; /* width & height of the block / level */
	int uw, uh; /* unused width & height in block / level */

	/* size of the decomposition level */
	w = width2 >> shift;
	h = height2 >> shift;
	uw = (width2 - width) >> shift;
	uh = (height2 - height) >> shift;

	*block_width = MIN(w, w - uw + 1);
	*block_height = MIN(h, h - uh + 1);
}


static void construct_reorder_table(unsigned int *dst, const int size)
{
	int bits = wv_log2i(size) << 1; /* times 2 is squared in non-log */
	int i;

	for (i = 0; i < size * size; i++)
		*dst++ = recurse_hilbert(i, bits, 0x00000, 0x00001, 0x10001, 0x10000);
}


/** Creates a reorder-table that is used to permute 2D wavelet coefficients into
 * a 1D array. This reordering is based on the Hilbert space-filling curve.
 * Tables can be reused between images as long as it has been created for the
 * largest dimension. Only the smallest given dimension is actually relevant.
 * See \ref reorder for more information.
 * @param[in] MaxImageWidth Largest width of an image to be processed with by
 *   table.
 * @param[in] MaxImageHeight Largest height of an image to be processed with by
 *   table.
 * @return Reorder table, freed by wv_free_reorder_table().
**/
t_reorder_table* wv_create_reorder_table(const int MaxImageWidth, const int MaxImageHeight)
{
	t_reorder_table* t = NULL;
	unsigned int *all_the_ints = NULL;
	int size, levels, i;

	size = MIN(MaxImageWidth, MaxImageHeight);
	size = (size >= 2 && size < 4) ? 2 : (size + 1) >> 1; /* if no wavelet transform, biggest block has half dimensions of input image */
	size = 1 << wv_log2i(size - 1); /* Next bigger power of 2 */
	levels = wv_log2i(size) - 1; /* # of tables (can be < 0)*/

	t = malloc(sizeof *t);
	t->num_levels = MAX(0, levels);
	if (levels > 0)
	{ /* Sum[(2^i)^2, {i, n}] = 4/3 * (-1 + 2^n) * (1 + 2^n) */
		int num_entries = (4 * (-1 + (1 << levels)) * (1 + (1 << levels))) / 3;

		all_the_ints = malloc(num_entries * sizeof *all_the_ints);
	}
	for (i = 0; i < levels; i++)
	{
		int dimension = 2 << i; /* smallest table is 2 x 2 */

		t->level[i] = all_the_ints;
		construct_reorder_table(t->level[i], dimension);
		all_the_ints += dimension * dimension;
	}
	for (; i < wv_NUM_REORDER_LEVELS; i++)
		t->level[i] = NULL;
	return t;
}


/** Frees a reorder-table allocated with wv_create_reorder_table().
 * @param[in] Table Table to be freed.
**/
void wv_free_reorder_table(t_reorder_table *Table)
{
	if (Table)
	{
		if (Table->num_levels > 0)
			free(Table->level[0]); /* frees all levels, as the come from the same alloc */
		free(Table);
	}
}


static unsigned int reorder_block(wv_pel *dst, wv_pel* src, const unsigned int *idx, const unsigned int num, const int width2, const unsigned int coefficient_pitch, const unsigned int xlimit, const unsigned int ylimit)
{
	unsigned int count = 0;
	unsigned int i;
	unsigned int x, y;

	for (i = 0; i < num; i++)
	{
		x = *idx & 0xffff;
		y = *idx >> 16;
		if (x < xlimit && y < ylimit)
		{
			*dst++ = src[(y * width2 + x) * coefficient_pitch];
			count++;
		}
		idx++;
	}
	return count;
}

static unsigned int unreorder_block(wv_pel *src, wv_pel* dst, const unsigned int *idx, const unsigned int num, const int width2, const unsigned int coefficient_pitch, const unsigned int xlimit, const unsigned int ylimit)
{
	unsigned int count = 0;
	unsigned int i;
	unsigned int x, y;

	for (i = 0; i < num; i++)
	{
		x = *idx & 0xffff;
		y = *idx >> 16;
		if (x < xlimit && y < ylimit)
		{
			dst[(y * width2 + x) * coefficient_pitch] = *src++;
			count++;
		}
		idx++;
	}
	return count;
}


static int reorder_level(t_reorder_function func, wv_pel *dst, wv_pel* src, const t_reorder_table* table, const int width, const int height, const unsigned int coefficient_pitch, const int level)
{
	int width2 = 1 << wv_log2i(width - 1); /* next bigger pow2 */
	int height2 = 1 << wv_log2i(height - 1); /* next bigger pow2 */
	int reorder_level = wv_log2i(MIN(width2, height2) >> level) - 2;
	int block_size = 2 << reorder_level; /* dimension of the used reorder table */
	int count = 0;
	int level_width, level_height;
	int i;

	wv_ASSERT(reorder_level < table->num_levels, "Not enough levels in reorder table: Initialise table for the proper dimensions");
	get_block_dimensions(width2, width, height2, height, level, &level_width, &level_height);

	/* we only have to loop in 1D (because if it were in both x & y, we'd just use a larger table) */
	if (width > height)
	{ /* loop horizontally */
		int bx, num_bx = level_width / block_size;
		unsigned int xlimit = level_width - num_bx * block_size;

		for (bx = 0; bx < num_bx; bx++)
		{
			i = func(dst, src + bx * block_size * coefficient_pitch, table->level[reorder_level], block_size * block_size, width2, coefficient_pitch, block_size, level_height);
			dst += i;
			count += i;
		}
		if (xlimit > 0)
		{
			i = func(dst, src + bx * block_size * coefficient_pitch, table->level[reorder_level], block_size * block_size, width2, coefficient_pitch, xlimit, level_height);
			dst += i;
			count += i;
		}
	}
	else
	{ /* loop vertically */
		int by, num_by = level_height / block_size;
		unsigned int ylimit = level_height - num_by * block_size;

		for (by = 0; by < num_by; by++)
		{
			i = func(dst, src + by * width2 * block_size * coefficient_pitch, table->level[reorder_level], block_size * block_size, width2, coefficient_pitch, level_width, block_size);
			dst += i;
			count += i;
		}
		if (ylimit > 0)
		{
			i = func(dst, src + by * width2 * block_size * coefficient_pitch, table->level[reorder_level], block_size * block_size, width2, coefficient_pitch, level_width, ylimit);
			dst += i;
			count += i;
		}
	}
	return count;
}


void reorder(wv_pel *dst, wv_pel* src, const t_reorder_table* table, const int width, const int height, const int unreorder)
{
	int min_dimension = MIN(width, height);
	int i;

	if (min_dimension < 2)
	{ /* we haven't done a transform, and one of the input dimensions is 1; so we do a straight copy */
		int num = MAX(width, height);

		for (i = 0; i < num; i++)
			*dst++ = src[i];
	}
	else
	{ /* we use the reorder table */
		t_reorder_function func = reorder_block; /* default to reorder */
		int num_levels = MAX(0, wv_log2i(min_dimension - 1) - 1);
		int width2 = 1 << wv_log2i(width - 1); /* next bigger pow2 */
		unsigned int pitch = 1 << num_levels; /* between wavelet coefficients of the same block */

		if (unreorder != 0)
		{ /* swap src and dst */
			wv_pel *tmp = src;

			src = dst;
			dst = tmp;
			func = unreorder_block;
		}

		/* output base average (or the pure data if too small for wavelet transform) */
		dst += reorder_level(func, dst, src, table, width, height, pitch, num_levels);
		for (i = num_levels; i > 0; i--)
		{
			unsigned int halfpitch = pitch >> 1;

			dst += reorder_level(func, dst, src + halfpitch, table, width, height, pitch, i); /* HD */
			dst += reorder_level(func, dst, src + halfpitch * width2, table, width, height, pitch, i); /* VD */
			dst += reorder_level(func, dst, src + halfpitch * (width2 + 1), table, width, height, pitch, i); /* DD */
			pitch = halfpitch; /* halve the pitch for each level we go up */
		}
	}
}
/* utility.c, version 3.4.0

  Copyright (C) 2001-2007 Daniel Vollmer

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Daniel Vollmer <[email protected]>

*/

#include <math.h>

/** Integer logarithm base 2 of a given integer.
 * This is equal to the number of bits needed to encode that number.
 * \warning Returns 0 for negative numbers!
 * @param[in] v number whose log is wanted
 * @return Integer logarithm (base 2) of v (wv_log2i(0) = 0)
**/
int wv_log2i(int v)
{
	int i;

	for (i = 0; v > 0; v >>= 1, i++) ;
	return i;
}

/** Converts MSE (mean square error) to PSNR (peak signal to noise ration (dB)).
 * Inverse of wv_psnr_to_mse().
 * @param[in] mse Mean square error
 * @return PSNR (in dB) of given MSE. The result is 0 (instead of "infinity")
 *   for non-positive values.
**/
double wv_mse_to_psnr(const double mse)
{
	if (mse > 0.0)
		return 20.0 * log10(255.0 / sqrt(mse));
	return 0.0;
}


/** Converts PSNR (peak signal to noise ration (dB)) to MSE (mean square error).
 * Inverse of wv_mse_to_psnr().
 * @param[in] psnr Peak signal to noise ration (in dB)
 * @return Mean square error of given PSNR. The result is 1e64 for non-positive
 *   values
**/
double wv_psnr_to_mse(const double psnr)
{
	if (psnr > 0.0)
		return pow(255.0 / pow(10.0, psnr / 20.0), 2.0);
	return 1e64;
}


/** Computes the PSNR (peak signal to noise ratio (dB)) and optionally the
 * mean-square error from two version of an image.
 * @param[in] a Image A
 * @param[in] b Image B
 * @param[in] width Width of the image in a and b
 * @param[in] height Height of the image in a and b
 * @param[in] pitch Pitch in wv_pel units of the image in a and b (usually this
 *   is a power of 2)
 * @param[out] pmse Address of a double where the mean-square error is written
 *   into (if != NULL)
 * @return PSNR between the two versions of the image
**/
double wv_calc_psnr(const wv_pel *a, const wv_pel *b, const int width, const int height, const int pitch, double *pmse)
{
	double sum2 = 0.0;
	int i, j;

	for (i = 0; i < height; i++)
		for (j = 0; j < width; j++)
		{
			double diff = (double)(a[i * pitch + j] - b[i * pitch + j]);

			sum2 += diff * diff;
		}

	if (sum2 > 0.0)
	{
		double mse = (double)(sum2 / (width * height)); // mean squared error

		if (pmse)
			*pmse = mse;
		return wv_mse_to_psnr(mse);
	}
	else
		*pmse = 0.0;
	return 0.0;
}


/* colour-space conversion from "YCoCg-R: A Color Space with RGB Reversibility
 * and Low Dynamic Range" by Henrique Malvar and Gary Sullivan. We implement
 * both the "normal" and the reversible variant. */


/** Converts an array of pixels (given as seperate bitplanes) from the RGB
 * color-space into the YCoCg color-space. Inverse of wv_ycocg_to_rgb().
 * @param[in] Num Number of pixels to be converted
 * @param[in,out] R Red pixels on input, Y pixels on output
 * @param[in,out] G Green pixels on input, Co pixels on output
 * @param[in,out] B Blue pixels on input, Cg pixels on output
 * @return Number of pixels converted
**/
int wv_rgb_to_ycocg(const int Num, wv_pel* R, wv_pel* G, wv_pel* B)
{
	if (Num > 0 && R && G && B)
	{
		int i;

		for (i = 0; i < Num; i++)
		{
			wv_pel rr = R[i], gg = G[i], bb = B[i];

			R[i] = (wv_pel)((rr + (gg << 1) + bb + 2) >> 2); // Y
			G[i] = (wv_pel)((rr - bb + 1) >> 1); // Co
			B[i] = (wv_pel)((-rr + (gg << 1) - bb + 2) >> 2); // Cg
		}
		return Num;
	}
	return 0;
}


/** Converts an array of pixels (given as seperate bitplanes) from the YCoCg
 * color-space into the RGB color-space. Inverse of wv_rgb_to_ycocg().
 * @param[in] Num Number of pixels to be converted
 * @param[in,out] Y Y pixels on input, red pixels on output
 * @param[in,out] Co Co pixels on input, green pixels on output
 * @param[in,out] Cg Cg pixels on input, blue pixels on output
 * @return Number of pixels converted
**/
int wv_ycocg_to_rgb(const int Num, wv_pel* Y, wv_pel* Co, wv_pel* Cg)
{
	if (Num > 0 && Y && Co && Cg)
	{
		int i;

		for (i = 0; i < Num; i++)
		{
			wv_pel yy = Y[i], co = Co[i], cg = Cg[i];
			wv_pel tmp;

			Co[i] = yy + cg;
			tmp = yy - cg;
			Y[i] = tmp + co;
			Cg[i] = tmp - co;
		}
		return Num;
	}
	return 0;
}


/** Converts an array of pixels (given as seperate bitplanes) from the RGB
 * color-space into the YCoCg-R color-space. Inverse of wv_ycocgr_to_rgb(). This
 * transform is reversible (i.e. lossless).
 * @param[in] Num Number of pixels to be converted
 * @param[in,out] R Red pixels on input, Y pixels on output
 * @param[in,out] G Green pixels on input, Co pixels on output
 * @param[in,out] B Blue pixels on input, Cg pixels on output
 * @return Number of pixels converted
**/
int wv_rgb_to_ycocgr(const int Num, wv_pel* R, wv_pel* G, wv_pel* B)
{
	if (Num > 0 && R && G && B)
	{
		int i;

		for (i = 0; i < Num; i++)
		{
			wv_pel rr = R[i], gg = G[i], bb = B[i];
			wv_pel tmp;

			G[i] = (wv_pel)(rr - bb); // Co
			tmp = bb + (G[i] >> 1);
			B[i] = (wv_pel)(gg - tmp); // Cg
			R[i] = (wv_pel)(tmp + (B[i] >> 1)); // Y

		}
		return Num;
	}
	return 0;
}


/** Converts an array of pixels (given as seperate bitplanes) from the YCoCg-R
 * color-space into the RGB color-space. Inverse of wv_rgb_to_ycocgr(). This
 * transform is reversible (i.e. lossless).
 * @param[in] Num Number of pixels to be converted
 * @param[in,out] Y Y pixels on input, red pixels on output
 * @param[in,out] Co Co pixels on input, green pixels on output
 * @param[in,out] Cg Cg pixels on input, blue pixels on output
 * @return Number of pixels converted
**/
int wv_ycocgr_to_rgb(const int Num, wv_pel* Y, wv_pel* Co, wv_pel* Cg)
{
	if (Num > 0 && Y && Co && Cg)
	{
		int i;

		for (i = 0; i < Num; i++)
		{
			wv_pel yy = Y[i], co = Co[i], cg = Cg[i];
			wv_pel tmp;

			tmp = yy - (cg >> 1);
			Co[i] = cg + tmp;
			Cg[i] = tmp - (co >> 1);
			Y[i] = Cg[i] + co;
		}
		return Num;
	}
	return 0;
}



EOHDR

pp_def('wvcomp',
	Pars=>'in(n,m); [o]opdl(k);',
	OtherPars=>'double bpp;',
	GenericTypes=>[S],
	PMCode=><<'EOFOO',

#foo

EOFOO
	Doc => 
	<< 'EOPMCODE',
# foo

 sub PDL::wvcomp {
	my $im = shift;
	my $bpp = shift;
	my $out = zeroes($im)->flat;
	PDL::_wvcomp_int($im,$out,$bpp);
	$olen = $out->at(0,0) + $out->at(1,0)*65536;
	return $out->slice("2:".($olen-1+2));
	}

EOPMCODE
	Code=> <<'EOCODE'
	t_reorder_table *rt;
	t_wv_channel *ch;
	t_bit_file *f;
	wv_pel *buf;
	char *out;
	int w, h, i,siz;

	w=$SIZE(n)-1;	for(i=0;w;i++) w=w>>1;	w = 1 << i;
	h=$SIZE(m)-1;	for(i=0;h;i++) h=h>>1;	h = 1 << i;

	printf("w=%d (from %d), h=%d (from %d)\n",w,$SIZE(n),h,$SIZE(m));

	buf = malloc(w * h * sizeof(wv_pel));
	for(i=0;i<w*h;i++)  buf[i] = 0;

	loop(n) %{
	  loop(m) %{
	    buf[n + m*w] = $in(); 
	  %}
	%}

	rt = wv_create_reorder_table( $SIZE(n), $SIZE(m) );
	ch = wv_channel_compact( $SIZE(n), $SIZE(m), 0, buf, 0, rt, 0, 0 );
	free(rt);
	
	out = malloc(w * h * sizeof(wv_pel));
	f = bit_open((void *)out, "wm", w * h * sizeof(wv_pel) * 8);
	
	{
	  double MSE = -1;
	  int minbits = 0;
	  siz = wv_encode(1, &ch, &MSE, $COMP(bpp) * $SIZE(n) * $SIZE(m), &minbits, f);
        }
	siz += 8-1;
	siz /= 8;

	$opdl(k=>0) = siz%65536;
	$opdl(k=>1) = siz/65536;
	i=0;
	{
	int i;
	char *ptr = &($opdl(k=>2));
	for(i=0;i<siz;i++) {
	  *ptr++ = out[i];
	 }
	 }
	
	//  loop(k) %{
	//   if( k > 1 && i < siz )
	//   $opdl() = out[i++];
	//%}
	{
		char *foo;
		bit_close(f,&foo);
	}
	wv_channel_free(ch);
	free(buf);

	
EOCODE
	);

pp_def('wv_dims',
	Pars=>'in(k); [o]dim(k);',
	GenericTypes=>[S],
	Code=><<'EOCODE'
	t_bit_file *f;
	t_wv_header hdr, *h;
	wv_pel *buf;
	int i=0;

	buf = &( $in(k=>0) );
printf("opening\n");
	f = bit_open((void *)buf, "rm", $SIZE(k) * sizeof(wv_pel) * 8);
printf("reading header\n");
	h = wv_read_header( &hdr, f );
printf("h is %d\n",h);
	  if(h) {
	   	$dim(k=>0) = h->width;
		$dim(k=>1) = h->height;
	  } else {
		$dim(k=>0) = 0;
		$dim(k=>1) = 0;
	  }
	{
		char *foo;
		printf("closing\n");
		bit_close(f,&foo);
	}
EOCODE
	);

pp_def('wvexp',
	Pars=>'in(k); [o]out(n,m);',
	GenericTypes=>[S],
	Doc=><<'EOD',

=head2 foo

=cut

EOD

	PMCode=><<'EOPMCODE',

sub PDL::wvexp {
    my $compressed = shift;
    my $foo = $compressed->copy;
    PDL::wv_dims($compressed, $foo);
    my $w = $foo->flat->at(0);
    my $h = $foo->flat->at(1);
    print "w=$w; h=$h\n";
    my $out = short zeroes($w, $h);
    PDL::_wvexp_int($compressed, $out);
    return $out;
}

EOPMCODE
	Code=><<'EOCODE'
	t_bit_file *f;
	t_wv_header hdr;
	t_wv_channel *chans[wv_MAX_CHANNELS + 1];;
	t_reorder_table *rt;
	wv_pel *buf;
	int i,j,w,h;
	
	buf = &( $in(k=>0) );

	f = bit_open((void *)buf, "rm", $SIZE(k) * sizeof(wv_pel) * 8);
	j = wv_decode( 0, 0, chans, f );
	
	rt = wv_create_reorder_table($SIZE(n), $SIZE(m));

	w=$SIZE(n)-1;	for(i=0;w;i++) w=w>>1;	w = 1 << i;
	h=$SIZE(m)-1;	for(i=0;h;i++) h=h>>1;	h = 1 << i;
	buf = malloc(w * h * sizeof(wv_pel));

	wv_channel_uncompact(chans[0], rt, buf);
	free(chans[0]);
	free(rt);
	{
		char *foo;
		bit_close(f,&foo);
	}
	
	loop(n) %{
	 loop(m) %{
   	   $out() = buf[ n  +  m * w ];
	  %}
	%}
	free(buf);
	
EOCODE
	);
	
EOF

sub PDL::wvcomp {
    my $im = shift;
    my $bpp = shift;
    $out = zeroes($im)->flat;
    PDL::_wvcomp_int($im,$out,$bpp);
    $olen = $out->at(0,0) + $out->at(1,0)*65536;
    return $out->slice("2:".($olen + 2 - 1))->copy;
}

sub PDL::wvexp {
    my $compressed = shift;
    my $foo = $compressed->copy;
    PDL::wv_dims($compressed, $foo);
    my $w = $foo->flat->at(0);
    my $h = $foo->flat->at(1);
    print "w=$w; h=$h\n";
    my $out = short zeroes($w, $h);
    PDL::_wvexp_int($compressed, $out);
    return $out;
}

1;




On Oct 27, 2015, at 11:40 AM, Craig DeForest <[email protected]> wrote:

The FFTW3 library has DCTs in it, but the current PDL module PDL::FFTW3 does not bind to them.   You could add the bindings yourself (with either my or Dima’s help if you want), or just use the FFT itself.  The latter is simple since you can just Fourier transform a padded version of your data with the correct boundary conditions.

If you want wavelet stuff, I have a binding to Daniel Vollmer’s wavelet compression library that I could send you.  That has a wavelet transform in it, and you can choose
your mother wavelet (I think it implements Haar and Morlet mother wavelets).  I used the library for some stuff, but never got around to making it publication quality.

Cheers,
Craig





http://www.fftw.org/doc/1d-Real_002deven-DFTs-_0028DCTs_0029.html


On Oct 27, 2015, at 10:37 AM, Ingo Schmid <[email protected]> wrote:

Hi,

I'd like to experiment with either discrete cosine transforms or wavelet transforms. Are there any implementations in PDL?

Ingo
------------------------------------------------------------------------------
_______________________________________________
pdl-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pdl-general


------------------------------------------------------------------------------
_______________________________________________
pdl-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pdl-general

Reply via email to