Re: [Jprogramming] variable integer coding (challenge)

Raul Miller Tue, 06 Jan 2015 13:54:45 -0800

Yes, I agree, you need to deal with the "sign bit" somehow.

There's other ways of doing that, though, which might be faster.


For example:
   b32int=: [: ((<.2^32)&|)&.(+&(<.2^31))2#.]
   b32int (32#2)#: _17998321
_17998321

I think that'll use a floating point representation on a 32 bit
machine, but that's easily fixed with another <.

Thanks,

-- 
Raul

On Tue, Jan 6, 2015 at 2:48 PM, 'Pascal Jasmin' via Programming
<[email protected]> wrote:
> thank you Raul,  I think this "decryption" function is still needed though?
>
>   -&4294967296^:(>&2147483648) #. (32#2)#: _17998321
> _17998321
>   -&4294967296^:(>&2147483648) #. (32#2)#: 17998321
> 17998321
>
>
>
>
>
> ----- Original Message -----
> From: Raul Miller <[email protected]>
> To: Programming forum <[email protected]>
> Cc:
> Sent: Tuesday, January 6, 2015 1:43 PM
> Subject: Re: [Jprogramming] variable integer coding (challenge)
>
> The &.|. in ($!.1)&.|. is so the 1s you are padding with appear on the
> left, rather than on the right.
>
> Another way of accomplishing that would be like this:
>    _32 {.!.1 #:-17998321
> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>
> But note that there's a problem here (with both of these approaches):
>    32 ($!.1)&.|. 1
> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ...padding on the left with 1s is only valid for negative numbers.
>
> Meanwhile,
>    #: 256 #. |. a.i. 2 ic _17998321
> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>
> ... so the |. which appears in that expression works the same way on
> probably any intel machine and would probably be relevant in most
> contexts (though I've not tested on ARM processor, and I'm not even
> sure if current versions of J run on ARM... and since ARM is
> "bi-endian" I'm not sure what "native" would mean in that context -
> but I guess it would make sense for ARM handling to mimic intel
> handling).
>
> But there's an even simpler approach:
>    (32#2)#:_17998321
> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>
> Simplicity is a virtue?
>
> Thanks,
>
> --
> Raul
>
> On Tue, Jan 6, 2015 at 12:44 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>> I'm not sure this is the same problem, unless you are trying to encode 
>> within a range of lattitudes that include dense civilization, and longitudes 
>> that exclude oceans.
>>
>> but the 32bit binary rep of -17998321
>>
>> 32 ($!.1)&.|. #: -17998321
>> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>>
>> though the "memcopy" version is simpler.
>>
>>   #: 256 #. |. a.i. 2 ic _17998321
>> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>>
>>   -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321
>> _17998321
>>   -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321
>> 17998321
>>
>>
>>  I'm not sure if the |. step is only for non macs.
>>
>>
>>
>> ----- Original Message -----
>> From: Vijay Lulla <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Tuesday, January 6, 2015 11:38 AM
>> Subject: Re: [Jprogramming] variable integer coding (challenge)
>>
>> I have followed this thread with interest.  Much of this discussion is
>> relevant to
>> https://developers.google.com/maps/documentation/utilities/polylinealgorithm
>> . Like Joe, I also enjoyed playing at bits level and was impressed by how
>> easy J makes such explorations.  Unfortunately, I couldn't generate all the
>> encoded values in the example table listed on the page.  Maybe others in
>> this group might enjoy exploring this algorithm.
>>
>> Thanks for the great discussion.
>>
>> On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote:
>>
>>> Thanks. I couldn't follow the examples so I worked through it on my
>>> own. Sharing it here for anyone else who's following. I switched to 9
>>> bits instead of 19 to simplify the example
>>>
>>>    NB. 300 needs 9 bits to be expressed
>>>    $ #: 300
>>> 9
>>>
>>>    NB. Show 4 9 bit numbers
>>>    $ #: 300 301 302 303
>>> 4 9
>>>
>>> Writing those 4 numbers would take 16 bytes[1] since J ints are 4
>>> bytes as well as C
>>>
>>> We can compress it down to 5 bytes if we pack all the bits together:
>>>
>>>    ] packed=. #. _8[\ , #: 300 301 302 303
>>> 150 75 101 210 240
>>>
>>>    $ packed
>>> 5
>>>
>>> Those 5 bytes could be written to a file and re-read
>>>
>>> If I knew in advance that those numbers were 9 bits, it can be reversed
>>>
>>>    #.  _9[\, #: packed
>>> 300 301 302 303 0
>>>
>>> We could also pack it into 8 bytes (two 32 bit integers), but I don't
>>> see an advantage of that over packing into a single byte (8 bit)
>>>
>>>    ] packed=. #. _32[\ , #: 300 301 302 303
>>> 2.52152e9 4.02653e9
>>>
>>>    #.  _9[\, #: packed
>>> 300 301 302 303 0 0 0 0
>>>
>>>
>>> This may be elementary for some but I haven't worked at this level in
>>> years so I enjoyed playing with it
>>>
>>>
>>>
>>>
>>> [1] - http://www.jsoftware.com/help/learning/27.htm
>>>
>>> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming
>>> <[email protected]> wrote:
>>> > Also, if you have only 4 19 bit values, you can pack them up tighter
>>> >
>>> > _2 (16&#.)\ #. |: ?. 4 19 $ 2
>>> > 109 59 161 30 136 80 64 129 56 1
>>> >
>>> > Non multiples of 8 would just generally leave some trailing 0s on
>>> decode, which could simply be understood as invalid.
>>> >
>>> >
>>> > ----- Original Message -----
>>> > From: 'Pascal Jasmin' via Programming <[email protected]>
>>> > To: "[email protected]" <[email protected]>
>>> > Cc:
>>> > Sent: Wednesday, December 31, 2014 11:40 AM
>>> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>>> >
>>> > base64 is essentially taking 3 bytes and storing them as 4.  The inverse
>>> can take 4 6bit values and pack them into 3 bytes.  The J code for base64
>>> is pretty generic to see how to do it for any base.  7 bit values could be
>>> packed 8 at a time in 7 bytes.
>>> >
>>> > There is a different way to pack 8 x bit values into x bytes.  Each byte
>>> would store the i'th bit of the 8 values.  This may be even faster than
>>> base64 type coding.
>>> >
>>> >   #. |: ?. 8 19 $ 2
>>> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28
>>> >
>>> >
>>> > ----- Original Message -----
>>> > From: Joe Bogner <[email protected]>
>>> > To: [email protected]
>>> > Cc:
>>> > Sent: Wednesday, December 31, 2014 10:02 AM
>>> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>>> >
>>> > Can 19 bits be stored in memory or a file on modern computers without
>>> > taking up 3 bytes (24 bits)? The only idea that came to mind would some
>>> bit
>>> > packing scheme with multiple items. That seems complicated unless there
>>> was
>>> > a need to store billions of these numbers
>>> >
>>> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote:
>>> >
>>> >> Thanks. I missed the point of the variable breakup point but exploring
>>> >> different options below illustrated it for me.
>>> >>
>>> >> Changing the breakup point can encode the numbers into different bit
>>> >> lengths (other than 8) which can then be written to the file/memory.
>>> >>
>>> >> For example, 1021 encoded in 511 breakups would take 18 bits
>>> >>
>>> >>    $ #: 511 toVint 1021
>>> >> 2 9
>>> >>
>>> >> Whereas encoding it as 255 would require 40 bits, which is more than a
>>> >> 32-bit int. This is where I was failing to see the point of encoding
>>> >> larger numbers
>>> >>
>>> >> NB. 5 bytes, 40 bits
>>> >>     $ #: 255 toVint 1021
>>> >> 5 8
>>> >>
>>> >> Without trying the binary representation, I was expecting the breakup
>>> >> point to take the same space. For example, below I would have thought
>>> >> that I'd need to write a
>>> >>
>>> >> NB. 6 bytes
>>> >>    255 toVint 511 toVint 1021
>>> >> 255 255 1 255 255 0
>>> >>
>>> >>
>>> >> If I understand this correctly, I can see how variable ints could
>>> >> shave off a few bits if the range of numbers is known in advance,
>>> >> which is often the case.
>>> >>
>>> >>
>>> >> Regarding packed strings I've been thinking about it recently as well.
>>> >> I would like to see how it could be used to pack a table of strings
>>> >> (rows and columns).
>>> >>
>>> >> The following string could be a packed representation of a 2x2 table
>>> >>
>>> >> FrankUSATomGermany
>>> >>
>>> >> The binary format could be:
>>> >>
>>> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length]....
>>> >>
>>> >> The header for the string could be
>>> >> 2 2 5 3 3 7
>>> >>
>>> >> Or maybe if you wanted to support a variable number of columns (unsure
>>> >> of the usecase of this)
>>> >>
>>> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 length]
>>> >> 2 2 5 3 2 3 7
>>> >>
>>> >> I don't know well it'd perform relative to boxed tables or inverted
>>> >> tables. I would imagine the memory would be reduced for storage but
>>> >> I'd have to test to see how much memory would be used on different
>>> >> operations. I'd be interested in the performance of operations like
>>> >> searching (e.g. find all the rows with a country of Germany)
>>> >>
>>> >> It might also be a safer way of storing user-input text without
>>> >> worrying about escaping delimiters.
>>> >>
>>> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming
>>> >> <[email protected]> wrote:
>>> >> > I also provided some adverbs
>>> >> >
>>> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]'
>>> >> > fromVint=: 1 : 'm&~: +/;.2 ]'
>>> >> >
>>> >> > these let you set the breakup point to what you wish
>>> >> >
>>> >> >   15 toVint 33 12 17
>>> >> > 15 15 3 12 15 2
>>> >> >   511 toVint 1233
>>> >> > 511 511 211
>>> >> >
>>> >> >   511 fromVint 511 toVint 1233 444
>>> >> > 1233 444
>>> >> > It can provide simple compression.  If you have a list of many strings
>>> >> where almost all of them will fit into a 6bit alphabet, you can still
>>> >> encode a full 7 bit or 8bit alphabet.  On average, you'd save space
>>> using
>>> >> 6bit alphabet, depending on your context.
>>> >> >
>>> >> > The variable string functions I showed, still let you do a substring
>>> >> search on the packed data.  Part of it is disk and memory compression
>>> >> (don't expand until you need to, and a string takes much less memory
>>> than a
>>> >> list of numbers < 256), but part of it is not being forced in design to
>>> >> determine what the impossible cutoff number is.
>>> >> >
>>> >> > A big advantage related to J is having binary data records.  If you
>>> >> encode the length of the binary data, you no longer have to box them if
>>> a
>>> >> trailing 0 or space would be difficult to know whether it is a fill or
>>> >> intentional.
>>> >> >
>>> >> > Beyond binary data, there is dumb stuff such as storing addresses.
>>> Most
>>> >> people have just 1,  Most fit within 255 bytes.  Usually no one cares
>>> about
>>> >> address info except when they do or already have the client record.  Do
>>> you
>>> >> really want a separate table with 1 to many join with a possible 10000
>>> >> character fixed size per address just because you're unsure if it will
>>> ever
>>> >> be possible for you to store an address to an African village as a set
>>> of
>>> >> driving instructions and then a description of the house relative to 3
>>> >> other houses and landmarks just in case one of the other houses gets
>>> >> painted.
>>> >> >
>>> >> > Another neat feature about the coding method is that if you have a lot
>>> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at least
>>> >> compresses well.
>>> >> >
>>> >> > A minor thing is that the encoding still lets you do search by < or >.
>>> >> If you are storing the number of orders per customer.  The first byte
>>> tells
>>> >> you whether it is > 254 or less than any other number below 255
>>> (probably
>>> >> enough info to select customers of interest).  Another minor thing is
>>> that
>>> >> you don't have to use the encoding codes until there is an overflow.  It
>>> >> may be easier to just switch functions to decode variable data, then to
>>> >> reshape terrabytes of data.
>>> >> > ________________________________
>>> >> > From: Joe Bogner <[email protected]>
>>> >> > To: [email protected]
>>> >> > Sent: Tuesday, December 30, 2014 10:45 AM
>>> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>>> >> >
>>> >> >
>>> >> > Pascal,
>>> >> >
>>> >> > Thanks for sharing. Can you also share a potential use case?
>>> >> >
>>> >> > I had to remind myself about the size of integers (4 bytes) on 32-bit.
>>> >> >
>>> >> > Will this only save space on numbers less than 1019?
>>> >> >
>>> >> > NB. 4 bytes
>>> >> > toVint2 1019
>>> >> > 255 255 255 254
>>> >> >
>>> >> > NB. 5 bytes
>>> >> > toVint2 1020
>>> >> > 255 255 255 255 0
>>> >> >
>>> >> > Are you looking for the savings in terms of disk space (saving it to a
>>> >> > file) or working memory?
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming
>>> >> > <[email protected]> wrote:
>>> >> >> A common design decision is how large to make a field.  For many
>>> >> numbers, we think that a byte will be large enough to hold practical
>>> >> values, but one day we will find out that it is not.  This is
>>> essentially
>>> >> the Y2k problem, or year 2032 issue.
>>> >> >>
>>> >> >> A simple variable integer coding (tied to byte ranges)
>>> >> >>
>>> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each
>>> >> >>
>>> >> >>   toVint 566 44
>>> >> >> 255 255 56 44
>>> >> >>
>>> >> >> It now takes 3 bytes to store 566, and one byte to store 44.
>>> >> >>
>>> >> >> the challenge is to write fromVint such that
>>> >> >>
>>> >> >>
>>> >> >> 566 44 -: fromVint 255 255 56 44
>>> >> >>
>>> >> >> as an extra challenge, is there a way to write toVint or fromVint
>>> such
>>> >> that it is faster than using intermediate boxing?
>>> >> >>
>>> ----------------------------------------------------------------------
>>> >> >> For information about J forums see
>>> http://www.jsoftware.com/forums.htm
>
>>
>>> >
>>> >> > ----------------------------------------------------------------------
>>> >> > For information about J forums see
>>> http://www.jsoftware.com/forums.htm
>>> >> > ----------------------------------------------------------------------
>>> >> > For information about J forums see
>>> http://www.jsoftware.com/forums.htm
>>> >>
>>> > ----------------------------------------------------------------------
>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>> > ----------------------------------------------------------------------
>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>> > ----------------------------------------------------------------------
>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] variable integer coding (challenge)

Reply via email to