I'm not sure this is the same problem, unless you are trying to encode within a 
range of lattitudes that include dense civilization, and longitudes that 
exclude oceans.

but the 32bit binary rep of -17998321

32 ($!.1)&.|. #: -17998321 
1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1

though the "memcopy" version is simpler.

  #: 256 #. |. a.i. 2 ic _17998321 
1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1

  -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321 
_17998321 
  -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321 
17998321 


 I'm not sure if the |. step is only for non macs.



----- Original Message -----
From: Vijay Lulla <[email protected]>
To: [email protected]
Cc: 
Sent: Tuesday, January 6, 2015 11:38 AM
Subject: Re: [Jprogramming] variable integer coding (challenge)

I have followed this thread with interest.  Much of this discussion is
relevant to
https://developers.google.com/maps/documentation/utilities/polylinealgorithm
. Like Joe, I also enjoyed playing at bits level and was impressed by how
easy J makes such explorations.  Unfortunately, I couldn't generate all the
encoded values in the example table listed on the page.  Maybe others in
this group might enjoy exploring this algorithm.

Thanks for the great discussion.

On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote:

> Thanks. I couldn't follow the examples so I worked through it on my
> own. Sharing it here for anyone else who's following. I switched to 9
> bits instead of 19 to simplify the example
>
>    NB. 300 needs 9 bits to be expressed
>    $ #: 300
> 9
>
>    NB. Show 4 9 bit numbers
>    $ #: 300 301 302 303
> 4 9
>
> Writing those 4 numbers would take 16 bytes[1] since J ints are 4
> bytes as well as C
>
> We can compress it down to 5 bytes if we pack all the bits together:
>
>    ] packed=. #. _8[\ , #: 300 301 302 303
> 150 75 101 210 240
>
>    $ packed
> 5
>
> Those 5 bytes could be written to a file and re-read
>
> If I knew in advance that those numbers were 9 bits, it can be reversed
>
>    #.  _9[\, #: packed
> 300 301 302 303 0
>
> We could also pack it into 8 bytes (two 32 bit integers), but I don't
> see an advantage of that over packing into a single byte (8 bit)
>
>    ] packed=. #. _32[\ , #: 300 301 302 303
> 2.52152e9 4.02653e9
>
>    #.  _9[\, #: packed
> 300 301 302 303 0 0 0 0
>
>
> This may be elementary for some but I haven't worked at this level in
> years so I enjoyed playing with it
>
>
>
>
> [1] - http://www.jsoftware.com/help/learning/27.htm
>
> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
> > Also, if you have only 4 19 bit values, you can pack them up tighter
> >
> > _2 (16&#.)\ #. |: ?. 4 19 $ 2
> > 109 59 161 30 136 80 64 129 56 1
> >
> > Non multiples of 8 would just generally leave some trailing 0s on
> decode, which could simply be understood as invalid.
> >
> >
> > ----- Original Message -----
> > From: 'Pascal Jasmin' via Programming <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Cc:
> > Sent: Wednesday, December 31, 2014 11:40 AM
> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >
> > base64 is essentially taking 3 bytes and storing them as 4.  The inverse
> can take 4 6bit values and pack them into 3 bytes.  The J code for base64
> is pretty generic to see how to do it for any base.  7 bit values could be
> packed 8 at a time in 7 bytes.
> >
> > There is a different way to pack 8 x bit values into x bytes.  Each byte
> would store the i'th bit of the 8 values.  This may be even faster than
> base64 type coding.
> >
> >   #. |: ?. 8 19 $ 2
> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28
> >
> >
> > ----- Original Message -----
> > From: Joe Bogner <[email protected]>
> > To: [email protected]
> > Cc:
> > Sent: Wednesday, December 31, 2014 10:02 AM
> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >
> > Can 19 bits be stored in memory or a file on modern computers without
> > taking up 3 bytes (24 bits)? The only idea that came to mind would some
> bit
> > packing scheme with multiple items. That seems complicated unless there
> was
> > a need to store billions of these numbers
> >
> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote:
> >
> >> Thanks. I missed the point of the variable breakup point but exploring
> >> different options below illustrated it for me.
> >>
> >> Changing the breakup point can encode the numbers into different bit
> >> lengths (other than 8) which can then be written to the file/memory.
> >>
> >> For example, 1021 encoded in 511 breakups would take 18 bits
> >>
> >>    $ #: 511 toVint 1021
> >> 2 9
> >>
> >> Whereas encoding it as 255 would require 40 bits, which is more than a
> >> 32-bit int. This is where I was failing to see the point of encoding
> >> larger numbers
> >>
> >> NB. 5 bytes, 40 bits
> >>     $ #: 255 toVint 1021
> >> 5 8
> >>
> >> Without trying the binary representation, I was expecting the breakup
> >> point to take the same space. For example, below I would have thought
> >> that I'd need to write a
> >>
> >> NB. 6 bytes
> >>    255 toVint 511 toVint 1021
> >> 255 255 1 255 255 0
> >>
> >>
> >> If I understand this correctly, I can see how variable ints could
> >> shave off a few bits if the range of numbers is known in advance,
> >> which is often the case.
> >>
> >>
> >> Regarding packed strings I've been thinking about it recently as well.
> >> I would like to see how it could be used to pack a table of strings
> >> (rows and columns).
> >>
> >> The following string could be a packed representation of a 2x2 table
> >>
> >> FrankUSATomGermany
> >>
> >> The binary format could be:
> >>
> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length]....
> >>
> >> The header for the string could be
> >> 2 2 5 3 3 7
> >>
> >> Or maybe if you wanted to support a variable number of columns (unsure
> >> of the usecase of this)
> >>
> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 length]
> >> 2 2 5 3 2 3 7
> >>
> >> I don't know well it'd perform relative to boxed tables or inverted
> >> tables. I would imagine the memory would be reduced for storage but
> >> I'd have to test to see how much memory would be used on different
> >> operations. I'd be interested in the performance of operations like
> >> searching (e.g. find all the rows with a country of Germany)
> >>
> >> It might also be a safer way of storing user-input text without
> >> worrying about escaping delimiters.
> >>
> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming
> >> <[email protected]> wrote:
> >> > I also provided some adverbs
> >> >
> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]'
> >> > fromVint=: 1 : 'm&~: +/;.2 ]'
> >> >
> >> > these let you set the breakup point to what you wish
> >> >
> >> >   15 toVint 33 12 17
> >> > 15 15 3 12 15 2
> >> >   511 toVint 1233
> >> > 511 511 211
> >> >
> >> >   511 fromVint 511 toVint 1233 444
> >> > 1233 444
> >> > It can provide simple compression.  If you have a list of many strings
> >> where almost all of them will fit into a 6bit alphabet, you can still
> >> encode a full 7 bit or 8bit alphabet.  On average, you'd save space
> using
> >> 6bit alphabet, depending on your context.
> >> >
> >> > The variable string functions I showed, still let you do a substring
> >> search on the packed data.  Part of it is disk and memory compression
> >> (don't expand until you need to, and a string takes much less memory
> than a
> >> list of numbers < 256), but part of it is not being forced in design to
> >> determine what the impossible cutoff number is.
> >> >
> >> > A big advantage related to J is having binary data records.  If you
> >> encode the length of the binary data, you no longer have to box them if
> a
> >> trailing 0 or space would be difficult to know whether it is a fill or
> >> intentional.
> >> >
> >> > Beyond binary data, there is dumb stuff such as storing addresses.
> Most
> >> people have just 1,  Most fit within 255 bytes.  Usually no one cares
> about
> >> address info except when they do or already have the client record.  Do
> you
> >> really want a separate table with 1 to many join with a possible 10000
> >> character fixed size per address just because you're unsure if it will
> ever
> >> be possible for you to store an address to an African village as a set
> of
> >> driving instructions and then a description of the house relative to 3
> >> other houses and landmarks just in case one of the other houses gets
> >> painted.
> >> >
> >> > Another neat feature about the coding method is that if you have a lot
> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at least
> >> compresses well.
> >> >
> >> > A minor thing is that the encoding still lets you do search by < or >.
> >> If you are storing the number of orders per customer.  The first byte
> tells
> >> you whether it is > 254 or less than any other number below 255
> (probably
> >> enough info to select customers of interest).  Another minor thing is
> that
> >> you don't have to use the encoding codes until there is an overflow.  It
> >> may be easier to just switch functions to decode variable data, then to
> >> reshape terrabytes of data.
> >> > ________________________________
> >> > From: Joe Bogner <[email protected]>
> >> > To: [email protected]
> >> > Sent: Tuesday, December 30, 2014 10:45 AM
> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >> >
> >> >
> >> > Pascal,
> >> >
> >> > Thanks for sharing. Can you also share a potential use case?
> >> >
> >> > I had to remind myself about the size of integers (4 bytes) on 32-bit.
> >> >
> >> > Will this only save space on numbers less than 1019?
> >> >
> >> > NB. 4 bytes
> >> > toVint2 1019
> >> > 255 255 255 254
> >> >
> >> > NB. 5 bytes
> >> > toVint2 1020
> >> > 255 255 255 255 0
> >> >
> >> > Are you looking for the savings in terms of disk space (saving it to a
> >> > file) or working memory?
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming
> >> > <[email protected]> wrote:
> >> >> A common design decision is how large to make a field.  For many
> >> numbers, we think that a byte will be large enough to hold practical
> >> values, but one day we will find out that it is not.  This is
> essentially
> >> the Y2k problem, or year 2032 issue.
> >> >>
> >> >> A simple variable integer coding (tied to byte ranges)
> >> >>
> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each
> >> >>
> >> >>   toVint 566 44
> >> >> 255 255 56 44
> >> >>
> >> >> It now takes 3 bytes to store 566, and one byte to store 44.
> >> >>
> >> >> the challenge is to write fromVint such that
> >> >>
> >> >>
> >> >> 566 44 -: fromVint 255 255 56 44
> >> >>
> >> >> as an extra challenge, is there a way to write toVint or fromVint
> such
> >> that it is faster than using intermediate boxing?
> >> >>
> ----------------------------------------------------------------------
> >> >> For information about J forums see
> http://www.jsoftware.com/forums.htm

> >
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> >>
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to