Also, if you have only 4 19 bit values, you can pack them up tighter

_2 (16&#.)\ #. |: ?. 4 19 $ 2 
109 59 161 30 136 80 64 129 56 1

Non multiples of 8 would just generally leave some trailing 0s on decode, which 
could simply be understood as invalid.


----- Original Message -----
From: 'Pascal Jasmin' via Programming <[email protected]>
To: "[email protected]" <[email protected]>
Cc: 
Sent: Wednesday, December 31, 2014 11:40 AM
Subject: Re: [Jprogramming] variable integer coding (challenge)

base64 is essentially taking 3 bytes and storing them as 4.  The inverse can 
take 4 6bit values and pack them into 3 bytes.  The J code for base64 is pretty 
generic to see how to do it for any base.  7 bit values could be packed 8 at a 
time in 7 bytes.

There is a different way to pack 8 x bit values into x bytes.  Each byte would 
store the i'th bit of the 8 values.  This may be even faster than base64 type 
coding.

  #. |: ?. 8 19 $ 2 
104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28 


----- Original Message -----
From: Joe Bogner <[email protected]>
To: [email protected]
Cc: 
Sent: Wednesday, December 31, 2014 10:02 AM
Subject: Re: [Jprogramming] variable integer coding (challenge)

Can 19 bits be stored in memory or a file on modern computers without
taking up 3 bytes (24 bits)? The only idea that came to mind would some bit
packing scheme with multiple items. That seems complicated unless there was
a need to store billions of these numbers

On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote:

> Thanks. I missed the point of the variable breakup point but exploring
> different options below illustrated it for me.
>
> Changing the breakup point can encode the numbers into different bit
> lengths (other than 8) which can then be written to the file/memory.
>
> For example, 1021 encoded in 511 breakups would take 18 bits
>
>    $ #: 511 toVint 1021
> 2 9
>
> Whereas encoding it as 255 would require 40 bits, which is more than a
> 32-bit int. This is where I was failing to see the point of encoding
> larger numbers
>
> NB. 5 bytes, 40 bits
>     $ #: 255 toVint 1021
> 5 8
>
> Without trying the binary representation, I was expecting the breakup
> point to take the same space. For example, below I would have thought
> that I'd need to write a
>
> NB. 6 bytes
>    255 toVint 511 toVint 1021
> 255 255 1 255 255 0
>
>
> If I understand this correctly, I can see how variable ints could
> shave off a few bits if the range of numbers is known in advance,
> which is often the case.
>
>
> Regarding packed strings I've been thinking about it recently as well.
> I would like to see how it could be used to pack a table of strings
> (rows and columns).
>
> The following string could be a packed representation of a 2x2 table
>
> FrankUSATomGermany
>
> The binary format could be:
>
> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length]....
>
> The header for the string could be
> 2 2 5 3 3 7
>
> Or maybe if you wanted to support a variable number of columns (unsure
> of the usecase of this)
>
> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 length]
> 2 2 5 3 2 3 7
>
> I don't know well it'd perform relative to boxed tables or inverted
> tables. I would imagine the memory would be reduced for storage but
> I'd have to test to see how much memory would be used on different
> operations. I'd be interested in the performance of operations like
> searching (e.g. find all the rows with a country of Germany)
>
> It might also be a safer way of storing user-input text without
> worrying about escaping delimiters.
>
> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
> > I also provided some adverbs
> >
> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]'
> > fromVint=: 1 : 'm&~: +/;.2 ]'
> >
> > these let you set the breakup point to what you wish
> >
> >   15 toVint 33 12 17
> > 15 15 3 12 15 2
> >   511 toVint 1233
> > 511 511 211
> >
> >   511 fromVint 511 toVint 1233 444
> > 1233 444
> > It can provide simple compression.  If you have a list of many strings
> where almost all of them will fit into a 6bit alphabet, you can still
> encode a full 7 bit or 8bit alphabet.  On average, you'd save space using
> 6bit alphabet, depending on your context.
> >
> > The variable string functions I showed, still let you do a substring
> search on the packed data.  Part of it is disk and memory compression
> (don't expand until you need to, and a string takes much less memory than a
> list of numbers < 256), but part of it is not being forced in design to
> determine what the impossible cutoff number is.
> >
> > A big advantage related to J is having binary data records.  If you
> encode the length of the binary data, you no longer have to box them if a
> trailing 0 or space would be difficult to know whether it is a fill or
> intentional.
> >
> > Beyond binary data, there is dumb stuff such as storing addresses.  Most
> people have just 1,  Most fit within 255 bytes.  Usually no one cares about
> address info except when they do or already have the client record.  Do you
> really want a separate table with 1 to many join with a possible 10000
> character fixed size per address just because you're unsure if it will ever
> be possible for you to store an address to an African village as a set of
> driving instructions and then a description of the house relative to 3
> other houses and landmarks just in case one of the other houses gets
> painted.
> >
> > Another neat feature about the coding method is that if you have a lot
> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at least
> compresses well.
> >
> > A minor thing is that the encoding still lets you do search by < or >.
> If you are storing the number of orders per customer.  The first byte tells
> you whether it is > 254 or less than any other number below 255 (probably
> enough info to select customers of interest).  Another minor thing is that
> you don't have to use the encoding codes until there is an overflow.  It
> may be easier to just switch functions to decode variable data, then to
> reshape terrabytes of data.
> > ________________________________
> > From: Joe Bogner <[email protected]>
> > To: [email protected]
> > Sent: Tuesday, December 30, 2014 10:45 AM
> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >
> >
> > Pascal,
> >
> > Thanks for sharing. Can you also share a potential use case?
> >
> > I had to remind myself about the size of integers (4 bytes) on 32-bit.
> >
> > Will this only save space on numbers less than 1019?
> >
> > NB. 4 bytes
> > toVint2 1019
> > 255 255 255 254
> >
> > NB. 5 bytes
> > toVint2 1020
> > 255 255 255 255 0
> >
> > Are you looking for the savings in terms of disk space (saving it to a
> > file) or working memory?
> >
> >
> >
> >
> >
> >
> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming
> > <[email protected]> wrote:
> >> A common design decision is how large to make a field.  For many
> numbers, we think that a byte will be large enough to hold practical
> values, but one day we will find out that it is not.  This is essentially
> the Y2k problem, or year 2032 issue.
> >>
> >> A simple variable integer coding (tied to byte ranges)
> >>
> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each
> >>
> >>   toVint 566 44
> >> 255 255 56 44
> >>
> >> It now takes 3 bytes to store 566, and one byte to store 44.
> >>
> >> the challenge is to write fromVint such that
> >>
> >>
> >> 566 44 -: fromVint 255 255 56 44
> >>
> >> as an extra challenge, is there a way to write toVint or fromVint such
> that it is faster than using intermediate boxing?
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm

> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to