On 02/17/2015 06:22 AM, janhein.vanderb...@gmail.com wrote:
In http://optarbvalintenc.blogspot.nl/ I propose a new way to encode 
arbitrarily valued integers, together with Python code that can be used as a 
reference for practical implementations of codecs.

The encoding algorithm itself is optimized for transmission and storage 
requirements, without any consideration of processor load whatsoever.

The next step is the development of the Python code that minimizes processor 
requirements without compromising the algorithm.

Is this the right forum to ask for help with this step or to obtain references 
to more suitable platforms?


This is a fine forum for such a discussion. I for one would love to participate. However, note that it isn't necessarily true that "the smaller the better" makes for a good algorithm. In context, there are frequently a number of tradeoffs, even ignoring processor time (as you imply).

Many years ago I worked on some code that manipulated the intermediate files for a compiler. I had no say in the format; I just had to deal with it.

They had a field type called a "compressed integer." It could vary between one byte and I think about six. The theory was that it took less space than the equivalent set of fixed-size integers. The catch, from my point of view, was that these integers could appear in the middle of a struct, so access to the later fields of the struct required a dynamic offset calculation. This put a huge onus on my code to read or write the data serially. I ended up solving it by writing code that generated 40 thousand lines of C++ header and body code, so that the rest of the code didn't care.
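
For anyone who hasn't met the idea, one common shape for such a "compressed integer" is a LEB128-style varint: seven value bits per byte, with the high bit flagging that more bytes follow. This is only an illustrative sketch in Python, not that compiler's actual format:

    def encode_uint(value):
        # Emit 7 value bits per byte, low-order group first; the high
        # bit is set on every byte except the last (LEB128-style).
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)   # more bytes follow
            else:
                out.append(byte)          # final byte
                return bytes(out)

    def decode_uint(data, offset=0):
        # Decode starting at `offset`; return (value, next_offset).
        value = shift = 0
        while True:
            byte = data[offset]
            offset += 1
            value |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return value, offset
            shift += 7

The decoder is where the pain shows: you cannot know where the next struct field starts until you have scanned every byte of the varint in front of it, which is exactly the dynamic offset calculation I was stuck with.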

Was it worth it? To reduce the size of some files that only lived a few seconds on disk? I seriously doubt it. But I learned a lot in the process.

On another project, the goal was to be able to recover data from archives in spite of physical damage to some files, so I had to add selective redundancy. In the process, I also compressed the data, but confined the compression to relatively small pieces of data and labeled those pieces independently, so that any single decompression error affects only one unit of data.
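
The details of that format don't matter here, but the chunking idea is easy to sketch. Here is a minimal Python illustration (zlib per chunk, each piece framed with its compressed length and a CRC32; the chunk size and framing are invented for the example, not the project's actual layout):

    import struct
    import zlib

    CHUNK = 64 * 1024  # illustrative chunk size

    def write_chunks(stream, payload):
        # Compress the payload in independent pieces; each piece carries
        # its own length and checksum, so a damaged piece can be skipped
        # without losing the rest of the archive.
        for i in range(0, len(payload), CHUNK):
            piece = zlib.compress(payload[i:i + CHUNK])
            stream.write(struct.pack("<II", len(piece), zlib.crc32(piece)))
            stream.write(piece)

    def read_chunks(stream):
        # Yield each decompressed piece, skipping any whose CRC fails.
        while True:
            header = stream.read(8)
            if len(header) < 8:
                return
            length, crc = struct.unpack("<II", header)
            piece = stream.read(length)
            if zlib.crc32(piece) == crc:
                yield zlib.decompress(piece)
            # else: that one unit is lost, but later units remain readable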

So going back to your problem, and assuming that the other issues are moot, what's your proposal? Are you compressing relative to a straight binary form of storage? Are you assuming anything about the relative likelihood of various integer values? Do you provide anything to allow for the possibility that your predicted probability distribution isn't met?
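
To make those last two questions concrete: a variable-length encoding only wins if small values dominate, and it can lose to a plain fixed-width field when they don't. A throwaway comparison against fixed four-byte integers, using a generic 7-bits-per-byte varint as a stand-in (your encoding may well behave differently):

    import struct

    def varint_len(value):
        # Bytes needed by a generic 7-bits-per-byte varint.
        return max(1, (value.bit_length() + 6) // 7)

    for value in (5, 300, 70000, 2**31):
        fixed = len(struct.pack("<I", value))   # plain 4-byte unsigned int
        print("%12d: fixed %d bytes, varint %d bytes"
              % (value, fixed, varint_len(value)))

Small values come out well ahead, but 2**31 costs five bytes instead of four, which is why the assumed distribution of values matters so much.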

--
DaveA