On 23 October 2017 at 15:29, <danceswithnumb...@gmail.com> wrote: > I'm really not trolling, and even though some are sarcastic i sm learning > from your comments.
I'm willing to believe that, but if you're trying to claim you have "compressed" data (in a way that satisfies the technical, information-theoretic meaning of the word), you need to be *really* careful not to include hidden information in your assumptions. For example, if I have a string made up of only the numbers 0-7, then I can trivially (octal) store that data in 3 bits per digit. But that's not compression, as there's "hidden information" in the knowledge that you're using a restricted character set. Factor in that information and your string only contains 3 bits of information per digit. Using bytes (characters, if you assume a 1-byte encoding) to hold just the digits 0-9 is inefficient (there's 256 bytes and you're only using 10 of them), and "of course" you can hold that data more efficiently. But that's not "compression", that's simply using a better encoding. In the technical sense, "compression" is about looking at redundancies that go beyond the case of how effectively you pack data into the bytes available. > Dec to bin is not bad at removing wasted space Yep, you're not talking about what people usually refer to as compression, but rather about optimising an encoding. >, but there is a better way. Here is an example. How would you compress these >numbers. 10 digits = log2(10) bits of information. So getting down to 4 bits is about encoding. You can go further by using a variable length encoding and "extra knowledge" about which digits come up most commonly to give the common digits shorter representation. That's called Gray coding. You can use the fact that repeated copies of the same digit come up together a lot to replace them by digit + count. That's run-length encoding. There are other more complex approaches. But what *all* of these have in common is that if you provide random input (within the limits of what you support - digit strings here) then you'll *always* get at least one input that encodes to a longer output than your input. > Compress this: > > 4135124325 10 x 10 digits = 10 x log2(10) bits of information = a bit under 34 bits of information > > Bin to dec...still very large > 11110110 > 01111000 > 11111101 > 01100101 4x8 = 32 bits, but there's probably a couple of leading zeros needed if you want to encode all 10-digit numbers. > New compression method: > > 11000101 > 11000111 > 01001111 > > A full byte less than bin. You'll need to explain how to decode this, in a way that can be used to decode the encodings of arbitrary 10-digit numbers, and with any leading zeroes that are needed to cover the whole space of 10-digit numbers, before you can claim you've compressed 10-digit numbers using only 24 bits. And if you do that, you've basically overturned most of information theory, so I'd start by assuming there's a flaw in your argument - sorry about that... ;-) Hope this helps put the subject into context. Compression is a very technical subject, to "do it right". Special cases can be worked out, sure, but the "hidden assumptions" in a method are what make the difference between a "compression algorithm" and a "way of storing my particular data more efficiently". > I know many are skeptical....thats okay.this has taken 6 years, im not going > to insult your intelligence with something so juvenile as dec to bin. I'm > really trying to find an application for this since it only deals with digits > 0-9 or 0-20 or other strange combinations. Wait did you just give it away in > that small example....no, because it also encrypts in a way that has never > been done. The compression is better than log(256)÷log (10)....wait isn't > that impossible, binary is minimalistic. I agree that binary is minimalistic, > but the architecture is not, making all arguments conjecture...not laws. No-one is going to accept a claim that an algorithm you're not willing to publish is valid. This is about maths/science, not "proprietary algorithms" or anything like that. If you don't publish your methods, people will simply point at information theoretic proofs and say "either you're missing something, or your approach doesn't work in cases that I care about, so thanks but no thanks". Paul -- https://mail.python.org/mailman/listinfo/python-list