It should be possible to implement Huffman encoding in such a manner that only the Huffman tree is permanently in memory. But you need to read the input file twice: you cannot encode the first character of your input until the entire tree has been constructed. And to construct the tree you need the frequencies of characters, so you need to scan the entire input before you can start encoding. But as you noticed, that only buys you 1 MB, so that must be about the size of the representation of the input list. Therefore, your space eater must be one of 1. the freq_arry or its construction 2. the Huffman tree itself (unlikely: it has only 256 nodes and each entry holds an Int) 3. some problem with the construction of the output list, where you can loose quite a lot, eg. do you produce any output before the entire input has been encoded, and how do you implement the necessary bit-fiddling? You can find out by inserting cost centres (if you use ghc) at the suspicious places. In the documentation you also find hints on improving the space/time behavior of your programs, eg. is any of your functions overloaded? Finally, I would like to mention that the LZW-algorithm is somewhat better suited to be implemented in Haskell, since it produces output immediately. I have recently written a version of it which runs (well: walks) inside of 200k heap to encode a 53k file. It makes, however, use of ghc's imperative features (mutable arrays) in an essential way. There is also a version in standard Haskell (originally due to Sanders and Runciman and tuned by Will Partain) in the nofib benchmark suite, which performs remarkably well. I wasn't going to release my code, but if you are interested you can find it (including a comparison with some other implementations) through my web page at URL http://www-pu.informatik.uni-tuebingen.de/~thiemann/haskell -Peter [mentioning only ghc just means I haven't used another Haskell compiler recently.]