Re: repack ubyte[] to use only 7 bits

2014-12-13 Thread Manolo via Digitalmars-d-learn
On Saturday, 13 December 2014 at 10:09:27 UTC, Charles Hixson via 
Digitalmars-d-learn wrote:
Is there a standard way to do this?  The code below is
untested, as I haven't yet written the x7to8 routine, and I came
up with a better way to do what this was meant to accomplish, but it
feels as if this should be somewhere in the standard library,
if only I could find it.


/** Repack the data from an array of ubytes into an array of ubytes of
 * which only the low 7 bits are significant.  The high bit will be set
 * only if the byte would otherwise be zero.*/
ubyte[] x8to7 (ubyte[] bin)
{
    ubyte[] bout;
    // bit masks: fBit  this byte          next byte
    //            0     0xfe = 1111_1110   0x00 = 0000_0000
    //            1     0x7f = 0111_1111   0x00 = 0000_0000
    //            2     0x3f = 0011_1111   0x80 = 1000_0000
    //            3     0x1f = 0001_1111   0xc0 = 1100_0000
    //            4     0x0f = 0000_1111   0xe0 = 1110_0000
    //            5     0x07 = 0000_0111   0xf0 = 1111_0000
    //            6     0x03 = 0000_0011   0xf8 = 1111_1000
    //            7     0x01 = 0000_0001   0xfc = 1111_1100
    if (bin.length < 1) return bout;
    int fByte, fBit;    // current byte, and bit offset from its top
    while (fByte < bin.length)
    {   // Read past the end as 0, so the last partial group is
        // zero-padded instead of being dropped.
        ubyte cur = bin[fByte];
        ubyte next = 0;
        if (fByte + 1 < bin.length) next = bin[fByte + 1];
        ubyte b;
        switch (fBit)
        {   case 0:
                b = cur / 2;
                break;
            case 1:
                b = cur & 0x7f;
                break;
            case 2:
                b = cast(ubyte)(((cur & 0x3f) << 1) | ((next & 0x80) >> 7));
                break;
            case 3:
                b = cast(ubyte)(((cur & 0x1f) << 2) | ((next & 0xc0) >> 6));
                break;
            case 4:
                b = cast(ubyte)(((cur & 0x0f) << 3) | ((next & 0xe0) >> 5));
                break;
            case 5:
                b = cast(ubyte)(((cur & 0x07) << 4) | ((next & 0xf0) >> 4));
                break;
            case 6:
                b = cast(ubyte)(((cur & 0x03) << 5) | ((next & 0xf8) >> 3));
                break;
            case 7:
                b = cast(ubyte)(((cur & 0x01) << 6) | ((next & 0xfc) >> 2));
                break;
            default:
                assert (false, "This path should never be taken");
        }   //switch (fBit)
        if (b == 0) bout ~= 0x80;
        else        bout ~= b;
        fBit = fBit + 7;
        if (fBit > 7)       // the group ran into the next byte
        {   fByte++;
            fBit -= 8;
        }
    }
    return bout;
}
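Charles says the inverse, x7to8, was never written. As a hedged sketch (not his code), here are both directions written with a bit accumulator instead of the eight-case switch. It assumes the packed format as the doc comment is read here: seven payload bits per output byte, most significant bit first, the last group zero-padded on the right, and an all-zero group stored as 0x80. The names pack7 and unpack7 are hypothetical. The decoder does not need the original length, because the padding is always fewer than 8 bits and can simply be discarded. Both functions are marked pure @safe nothrow; as far as I know the compiler accepts that here because they only read their input and append to a local array.

```d
/// Hypothetical encoder: pack 8-bit bytes into 7-bit groups, MSB first.
/// The final group is zero-padded on the right, and a group whose value
/// is 0 is written as 0x80, so the output never contains a zero byte.
ubyte[] pack7(in ubyte[] bin) pure @safe nothrow
{
    ubyte[] bout;
    uint acc;    // bits not yet emitted, right-aligned
    uint nBits;  // how many of those bits are valid
    foreach (ubyte x; bin)
    {
        acc = (acc << 8) | x;
        nBits += 8;
        while (nBits >= 7)
        {
            nBits -= 7;
            ubyte g = cast(ubyte)((acc >> nBits) & 0x7f);
            bout ~= (g == 0) ? cast(ubyte) 0x80 : g;
        }
        acc &= (1u << nBits) - 1; // drop the bits just emitted
    }
    if (nBits > 0)
    {
        // Flush the leftover bits, padded with zeros on the right.
        ubyte g = cast(ubyte)((acc << (7 - nBits)) & 0x7f);
        bout ~= (g == 0) ? cast(ubyte) 0x80 : g;
    }
    return bout;
}

/// Hypothetical decoder (the missing "x7to8"): 0x80 reads back as a
/// zero group, and the trailing pad bits are ignored.
ubyte[] unpack7(in ubyte[] packed) pure @safe nothrow
{
    ubyte[] bout;
    uint acc;
    uint nBits;
    foreach (ubyte g; packed)
    {
        acc = (acc << 7) | (g & 0x7f); // 0x80 & 0x7f == 0
        nBits += 7;
        if (nBits >= 8)
        {
            nBits -= 8;
            bout ~= cast(ubyte)(acc >> nBits);
            acc &= (1u << nBits) - 1;
        }
    }
    return bout; // any bits left in acc are padding
}

void main()
{
    immutable ubyte[] data = [0x00, 0xff, 0x00, 0x41, 0x7f, 0x80, 0x01];
    auto packed = pack7(data);
    assert(packed.length == 8);  // 7 bytes = 56 bits = 8 groups
    foreach (b; packed)
        assert(b != 0);          // no zero bytes for C string code
    assert(unpack7(packed) == data);
}
```

The accumulator replaces the per-offset masks in the table: it always holds the unconsumed bits, so each 7-bit group is just a shift and a mask.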


Are you trying to make a kind of variable-length quantity
encoder?


e.g.:
0b10101110: last bit not set, integrate 0b10101110 and stop
reading.
0b10011001: last bit set, integrate 0b10011000 and continue to
the next byte.


http://en.wikipedia.org/wiki/Variable-length_quantity

except that this algorithm limits the length to 24 bits. It was used a
lot in MIDI, at a time when memory was costly (e.g. inside
hardware synthesizers or workstations).
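For comparison, a hedged D sketch of the conventional variable-length quantity from the linked Wikipedia article, as used in Standard MIDI Files. Unlike the example above, it uses the high bit (bit 7) of each byte as the continuation flag and stores the 7-bit groups most significant first. As far as I know, MIDI files also limit such a quantity to four bytes, i.e. a maximum value of 0x0FFFFFFF. The names vlqEncode and vlqDecode are hypothetical, and the decoder does not check for overflow past 32 bits.

```d
/// Hypothetical encoder: 7 bits per byte, most significant group first,
/// bit 7 set on every byte except the last one.
ubyte[] vlqEncode(uint v) pure @safe nothrow
{
    ubyte[] bout = [cast(ubyte)(v & 0x7f)]; // last byte: flag clear
    v >>= 7;
    while (v != 0)
    {
        // Earlier bytes carry the continuation flag (0x80).
        bout = cast(ubyte)((v & 0x7f) | 0x80) ~ bout;
        v >>= 7;
    }
    return bout;
}

/// Hypothetical decoder: read one VLQ from the front of `data`,
/// advancing it. Returns false if the input ends before a byte
/// with bit 7 clear.
bool vlqDecode(ref const(ubyte)[] data, out uint value) pure @safe nothrow
{
    while (data.length > 0)
    {
        immutable ubyte b = data[0];
        data = data[1 .. $];
        value = (value << 7) | (b & 0x7f);
        if ((b & 0x80) == 0)
            return true;  // flag clear: this was the last byte
    }
    return false;         // truncated input
}

void main()
{
    assert(vlqEncode(0x00) == [0x00]);
    assert(vlqEncode(0x7f) == [0x7f]);
    assert(vlqEncode(0x80) == [0x81, 0x00]);
    assert(vlqEncode(0x3fff) == [0xff, 0x7f]);
    assert(vlqEncode(0x0fffffff) == [0xff, 0xff, 0xff, 0x7f]);

    const(ubyte)[] stream = vlqEncode(0x4000) ~ vlqEncode(0x7f);
    uint value;
    assert(vlqDecode(stream, value) && value == 0x4000);
    assert(vlqDecode(stream, value) && value == 0x7f);
    assert(stream.length == 0);
}
```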


Re: repack ubyte[] to use only 7 bits

2014-12-13 Thread Manolo via Digitalmars-d-learn

On Saturday, 13 December 2014 at 11:20:21 UTC, Manolo wrote:
[...]


Sorry, I lacked accuracy: the maximum value represented was a 24-bit
unsigned integer, but the data length was 32 bits for this
value.
The thing is that the format included various fields, but because
memory was expensive the algorithm saved space when values were at
most 0x7F, because then only one byte was needed. Nowadays such a
format would, for example, always use 4 bytes to describe the
data length:


nowadays:
data len L | data
0 1 2 3  | 4 ... L-1

So nowadays, we can afford 4 bytes to say that a field's length
is 1.


olddays:
variable len VL | data
1 to ?  | VL ... VL-1

In the old days, they used only one byte to say that a field's length is
1.
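To put numbers on the nowadays/olddays comparison, this hypothetical sketch frames a payload both ways: a fixed 4-byte big-endian length (via nativeToBigEndian from std.bitmanip) versus a VLQ length produced by a small encoder that uses the high bit as the continuation flag. The frame layout and the names frameFixed, frameVlq and vlqEncode are illustrative assumptions, not a specific file format.

```d
import std.bitmanip : nativeToBigEndian;

/// VLQ encoder: 7 bits per byte, most significant group first,
/// bit 7 set on every byte except the last.
ubyte[] vlqEncode(uint v) pure @safe nothrow
{
    ubyte[] bout = [cast(ubyte)(v & 0x7f)];
    v >>= 7;
    while (v != 0)
    {
        bout = cast(ubyte)((v & 0x7f) | 0x80) ~ bout;
        v >>= 7;
    }
    return bout;
}

/// "Nowadays" framing: always 4 length bytes (big-endian), then data.
ubyte[] frameFixed(in ubyte[] payload)
{
    auto bout = new ubyte[](4 + payload.length);
    bout[0 .. 4] = nativeToBigEndian(cast(uint) payload.length);
    bout[4 .. $] = payload[];
    return bout;
}

/// "Olddays" framing: a VLQ length (one byte up to 127), then data.
ubyte[] frameVlq(in ubyte[] payload)
{
    auto bout = vlqEncode(cast(uint) payload.length);
    bout ~= payload;
    return bout;
}

void main()
{
    immutable ubyte[] one = [0x2a];
    assert(frameFixed(one) == [0, 0, 0, 1, 0x2a]); // 5 bytes in total
    assert(frameVlq(one) == [1, 0x2a]);            // 2 bytes in total

    auto big = new ubyte[](200);
    assert(frameFixed(big).length == 204);
    assert(frameVlq(big).length == 202); // 200 > 127: two length bytes
}
```

For a 1-byte field the fixed prefix costs 4 bytes against 1 for the VLQ; the VLQ only needs a second length byte once the length exceeds 127.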


Re: repack ubyte[] to use only 7 bits

2014-12-13 Thread Manolo via Digitalmars-d-learn
On Saturday, 13 December 2014 at 19:52:33 UTC, Charles Hixson via 
Digitalmars-d-learn wrote:


On 12/13/2014 03:20 AM, Manolo via Digitalmars-d-learn wrote:
[...]


What I was trying to do was pack things into 7 bits so I could
recode 0's as 128.  I finally thought clearly about it and
realized that I only needed to reserve one particular byte value (I
chose 127) as an escape: a string of 0's is replaced by 127
followed by the number of zeros (up to 126), and 127 itself is
emitted twice.  This was to pack binary data into a string that
C routines wouldn't think had ended partway through.  (If I get
more than 126 zeros in a row, I just emit more than one packing
code.)


Sorry, I misunderstood the thing.
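A minimal sketch of the zero-elimination scheme as Charles describes it, assuming 127 as the escape byte, "127 n" for a run of n zeros (1 <= n <= 126, so n is never 0 or 127), and "127 127" for a literal 127. The names zeroEscape and zeroUnescape, and the choice to throw on a malformed escape, are assumptions rather than his code. Because every emitted byte is nonzero, the result can pass through C string routines without being cut short.

```d
enum ubyte esc = 127; // escape byte, as in Charles's description
enum maxRun = 126;    // longest zero run one escape pair can encode

/// Replace runs of zeros with `esc n` and a literal 127 with
/// `esc esc`, so the result contains no zero bytes.
ubyte[] zeroEscape(in ubyte[] bin) pure @safe nothrow
{
    ubyte[] bout;
    size_t i = 0;
    while (i < bin.length)
    {
        if (bin[i] == 0)
        {
            size_t run = 0;
            while (i < bin.length && bin[i] == 0 && run < maxRun)
            {
                ++run;
                ++i;
            }
            bout ~= esc;
            bout ~= cast(ubyte) run;
        }
        else
        {
            if (bin[i] == esc)
                bout ~= esc;  // double the escape byte
            bout ~= bin[i];
            ++i;
        }
    }
    return bout;
}

/// Inverse of zeroEscape; throws on a dangling or zero-count escape.
ubyte[] zeroUnescape(in ubyte[] packed) pure @safe
{
    ubyte[] bout;
    for (size_t i = 0; i < packed.length; ++i)
    {
        if (packed[i] != esc)
        {
            bout ~= packed[i];        // ordinary byte
            continue;
        }
        ++i;                          // look at the byte after the escape
        if (i == packed.length || packed[i] == 0)
            throw new Exception("malformed escape sequence");
        if (packed[i] == esc)
            bout ~= esc;              // `127 127` is a literal 127
        else
            bout.length += packed[i]; // `127 n` is n zero bytes
    }
    return bout;
}

void main()
{
    immutable ubyte[] data = [1, 0, 0, 0, 127, 2];
    auto packed = zeroEscape(data);
    assert(packed == [1, 127, 3, 127, 127, 2]);
    assert(zeroUnescape(packed) == data);

    auto zeros = new ubyte[](300); // longer than one run
    assert(zeroEscape(zeros) == [127, 126, 127, 126, 127, 48]);
    assert(zeroUnescape(zeroEscape(zeros)) == zeros);
}
```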


Re: repack ubyte[] to use only 7 bits

2014-12-06 Thread bearophile via Digitalmars-d-learn

Charles Hixson:


byte[] x8to7 (ubyte[] bin)


It's better to add some annotations, like pure, @safe and nothrow, if you
can, and to annotate bin with "in".




int fByte, fBit;


It's probably better to define them as size_t.




switch (fBit)


I think D doesn't yet allow this switch to be _meaningfully_ a 
final switch.




b = bin[fByte] & 0x7f;


D allows binary number literals as 0b100110010101.



b ~= (b1 | b2);


Perhaps an output range is better?



if (b == 0) bout ~= 0x80;
else bout ~= b;
fBit = fBit + 7;
if (fBit > 7)
{fByte++;
fBit -= 7;


The formatting seems a bit messy.

Bye,
bearophile
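On the output-range suggestion: instead of building an array inside the function, the packer can take any sink that accepts ubyte through std.range.primitives.put, and the caller decides whether that is a growable Appender or a preallocated buffer. This hypothetical pack7To uses a bit accumulator rather than the switch, plus the 0b binary literals mentioned above; it assumes the same packed format as the doc comment (MSB first, zero-padded last group, 0 stored as 0x80). Since it is a template, attributes such as pure, @safe and nothrow are inferred from the sink rather than written out.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

/// Hypothetical: write the 7-bit packing of `bin` to any output
/// range of ubyte.
void pack7To(R)(in ubyte[] bin, ref R sink)
    if (isOutputRange!(R, ubyte))
{
    uint acc;    // pending bits, right-aligned
    uint nBits;  // number of pending bits

    void emit(uint group)
    {
        ubyte g = cast(ubyte)(group & 0b0111_1111);
        put(sink, g == 0 ? cast(ubyte) 0b1000_0000 : g);
    }

    foreach (ubyte x; bin)
    {
        acc = (acc << 8) | x;
        nBits += 8;
        while (nBits >= 7)
        {
            nBits -= 7;
            emit(acc >> nBits);
        }
        acc &= (1u << nBits) - 1;
    }
    if (nBits > 0)
        emit(acc << (7 - nBits)); // zero-pad the final group
}

void main()
{
    immutable ubyte[] data = [0x00, 0xff];

    // Sink 1: a growable Appender.
    auto app = appender!(ubyte[])();
    pack7To(data, app);
    assert(app.data == [0x80, 0x3f, 0x60]);

    // Sink 2: a fixed buffer; put() writes at the front of `rest`
    // and advances it, so nothing is allocated.
    ubyte[8] buf;
    ubyte[] rest = buf[];
    pack7To(data, rest);
    immutable used = buf.length - rest.length;
    assert(buf[0 .. used] == [0x80, 0x3f, 0x60]);
}
```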


Re: repack ubyte[] to use only 7 bits

2014-12-06 Thread Charles Hixson via Digitalmars-d-learn
Your comments would be reasonable if this were destined for a library,
but I haven't even finished checking it (and probably won't, since I've
switched to a simple zero-elimination scheme).  But this is a bit
specialized for a library...a library should probably deal with
arbitrary integer widths from 8 to 64 or 128 bits [but I don't think that
the 128-bit type is standard yet, only reserved].  I just thought that
something like that should be available, possibly along the lines of
Python's pack and unpack, and wondered where it was and what it was called.
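On the Python pack/unpack question: for fixed-width integers, the closest thing in Phobos is probably std.bitmanip, whose append and read templates write and consume big-endian (by default) integers on ranges of ubyte. It does not do 7-bit repacking; this sketch only shows the calls, with rough Python struct equivalents noted in the comments.

```d
import std.array : appender;
import std.bitmanip : append, read;

void main()
{
    // Roughly struct.pack(">HI", 261, 0x01020304) in Python.
    auto buffer = appender!(ubyte[])();
    buffer.append!ushort(261);
    buffer.append!uint(0x01020304);
    assert(buffer.data == [1, 5, 1, 2, 3, 4]);

    // Roughly struct.unpack: read() consumes bytes from the front.
    ubyte[] bytes = buffer.data;
    assert(bytes.read!ushort() == 261);
    assert(bytes.read!uint() == 0x01020304);
    assert(bytes.length == 0);
}
```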


Additionally, I'm clearly not the best person to write the library
version, as I still have LOTS of trouble with D templates.  And I have
not successfully wrapped my mind around D ranges...which is odd, because
neither Ruby nor Python ranges give me much trouble. Perhaps it's the syntax.


As for pure, @safe, and nothrow ... I'd like to understand when I
COULD use those annotations.  (I agree that "in" should be applied.  I
understand that one.)


As for size_t for indexes...here I think we disagree.  It would be a bad
mistake to use an index that size.  I even considered using short or
ushort, but I ran into a comment a while back saying that one should
never use those for local variables.  This *can't* be efficient enough
to be appropriate for a huge array...but that should probably be
documented if it were for a library.  (If I were to use size_t indexing,
I'd want to modify things so that I could feed it a file as input, and
that's taking it well away from what I was building it for:  converting
input to a Redis database so that I could feed it raw serial data
streams without first converting them into human-readable formats.)  I
wanted to make everything ASCII-7 binary data, which, when I thought
about it more, was overkill.  All I need to do is eliminate internal
zeros, since C handles various extended character formats by ignoring
them.


I'm not clear on what you mean by a final switch.  fBit must take
various different values during execution.  If you mean it's the same as
a nest of if...else if ... statements, that's all I was really
expecting, but I thought switch was a bit more readable.
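As far as I know, D's final switch only checks coverage when its operand is an enum: every member must have a case, and a default is not allowed. An int such as fBit gets no such check even though it only takes the values 0 to 7, which is presumably what "not meaningfully" refers to. A hypothetical sketch that recasts the offsets as an enum (the Offset type and headMask function are illustrative names, not from the thread), also using binary literals for the masks:

```d
/// Hypothetical: the eight possible bit offsets as an enum, so that a
/// `final switch` can check at compile time that each one has a case.
enum Offset : ubyte { o0, o1, o2, o3, o4, o5, o6, o7 }

/// Mask selecting, in the current byte, the bits that belong to the
/// 7-bit group starting `off` bits below the top (the first column of
/// the bit-mask comment in x8to7).
ubyte headMask(Offset off) pure @safe nothrow
{
    ubyte mask;
    final switch (off)
    {
        case Offset.o0: mask = 0b1111_1110; break;
        case Offset.o1: mask = 0b0111_1111; break;
        case Offset.o2: mask = 0b0011_1111; break;
        case Offset.o3: mask = 0b0001_1111; break;
        case Offset.o4: mask = 0b0000_1111; break;
        case Offset.o5: mask = 0b0000_0111; break;
        case Offset.o6: mask = 0b0000_0011; break;
        case Offset.o7: mask = 0b0000_0001; break;
        // No default: leaving out any member is a compile-time error.
    }
    return mask;
}

void main()
{
    assert(headMask(Offset.o0) == 0xfe);
    assert(headMask(Offset.o3) == 0x1f);
    assert(headMask(cast(Offset) 7) == 0x01);
}
```

Adding a ninth member to Offset without adding a case makes this fail to compile, which is the guarantee a plain switch with `default: assert(false)` does not give.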


Binary literals would be more self-documenting, but would make the code
harder to read.  If I'd thought of them I might have used them...but
maybe not.


Output range?  Here I'm not sure what you're suggesting, probably 
because I don't understand D ranges.


The formatting got a bit messed up during pasting from the editor into the
mail message.  I should have looked at it more carefully.  My standard
says that unless the entire block is on a single line, the closing brace
should align with the opening brace.  And I use tabs for indentation, which
works quite well in the editor, but I *do* need to remember to convert
them to spaces before doing a cut and paste.


Thanks for your comments.  I guess that means that there *isn't* a 
standard function that does this.

Charles

On 12/06/2014 03:01 PM, bearophile via Digitalmars-d-learn wrote:

[...]