It's been a while since I've played with great circle distances, but
you are right that you want that to be simple to understand.

Thanks,

-- 
Raul

On Tue, Jan 6, 2015 at 2:45 PM, Vijay Lulla <[email protected]> wrote:
> Actually I used (32$2)#:_17998321 . I agree about the point of simplicity
> but there are other reasons for using a simpler approach too.  I didn't
> consider using padding/fills because calculations involving
> latitudes/longitudes have peculiarities just like date/time
> representations.  For e.g., calculating distance between points in
> east/west and north/south hemispheres (
> http://workshops.boundlessgeo.com/postgis-intro/geography.html).  This
> comes up often in GIS related calculations, especially when every mapping
> webservice is using lat/lon to store geographic data. And hence, I try to
> find the simplest approach (human centric) even if it might not be the most
> efficient (machine/task centric).
>
> On Tue, Jan 6, 2015 at 1:43 PM, Raul Miller <[email protected]> wrote:
>
>> The &.|. in ($!.1)&.|. is so the 1s you are padding with appear on the
>> left, rather than on the right.
>>
>> Another way of accomplishing that would be like this:
>>    _32 {.!.1 #:-17998321
>> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>>
>> But note that there's a problem here (with both of these approaches):
>>    32 ($!.1)&.|. 1
>> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>
>> ...padding on the left with 1s is only valid for negative numbers.
>>
>> Meanwhile,
>>    #: 256 #. |. a.i. 2 ic _17998321
>> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>>
>> ... so the |. which appears in that expression works the same way on
>> probably any intel machine and would probably be relevant in most
>> contexts (though I've not tested on ARM processor, and I'm not even
>> sure if current versions of J run on ARM... and since ARM is
>> "bi-endian" I'm not sure what "native" would mean in that context -
>> but I guess it would make sense for ARM handling to mimic intel
>> handling).
>>
>> But there's an even simpler approach:
>>    (32#2)#:_17998321
>> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>>
>> Simplicity is a virtue?
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Tue, Jan 6, 2015 at 12:44 PM, 'Pascal Jasmin' via Programming
>> <[email protected]> wrote:
>> > I'm not sure this is the same problem, unless you are trying to encode
>> within a range of lattitudes that include dense civilization, and
>> longitudes that exclude oceans.
>> >
>> > but the 32bit binary rep of -17998321
>> >
>> > 32 ($!.1)&.|. #: -17998321
>> > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>> >
>> > though the "memcopy" version is simpler.
>> >
>> >   #: 256 #. |. a.i. 2 ic _17998321
>> > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>> >
>> >   -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321
>> > _17998321
>> >   -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321
>> > 17998321
>> >
>> >
>> >  I'm not sure if the |. step is only for non macs.
>> >
>> >
>> >
>> > ----- Original Message -----
>> > From: Vijay Lulla <[email protected]>
>> > To: [email protected]
>> > Cc:
>> > Sent: Tuesday, January 6, 2015 11:38 AM
>> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>> >
>> > I have followed this thread with interest.  Much of this discussion is
>> > relevant to
>> >
>> https://developers.google.com/maps/documentation/utilities/polylinealgorithm
>> > . Like Joe, I also enjoyed playing at bits level and was impressed by how
>> > easy J makes such explorations.  Unfortunately, I couldn't generate all
>> the
>> > encoded values in the example table listed on the page.  Maybe others in
>> > this group might enjoy exploring this algorithm.
>> >
>> > Thanks for the great discussion.
>> >
>> > On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote:
>> >
>> >> Thanks. I couldn't follow the examples so I worked through it on my
>> >> own. Sharing it here for anyone else who's following. I switched to 9
>> >> bits instead of 19 to simplify the example
>> >>
>> >>    NB. 300 needs 9 bits to be expressed
>> >>    $ #: 300
>> >> 9
>> >>
>> >>    NB. Show 4 9 bit numbers
>> >>    $ #: 300 301 302 303
>> >> 4 9
>> >>
>> >> Writing those 4 numbers would take 16 bytes[1] since J ints are 4
>> >> bytes as well as C
>> >>
>> >> We can compress it down to 5 bytes if we pack all the bits together:
>> >>
>> >>    ] packed=. #. _8[\ , #: 300 301 302 303
>> >> 150 75 101 210 240
>> >>
>> >>    $ packed
>> >> 5
>> >>
>> >> Those 5 bytes could be written to a file and re-read
>> >>
>> >> If I knew in advance that those numbers were 9 bits, it can be reversed
>> >>
>> >>    #.  _9[\, #: packed
>> >> 300 301 302 303 0
>> >>
>> >> We could also pack it into 8 bytes (two 32 bit integers), but I don't
>> >> see an advantage of that over packing into a single byte (8 bit)
>> >>
>> >>    ] packed=. #. _32[\ , #: 300 301 302 303
>> >> 2.52152e9 4.02653e9
>> >>
>> >>    #.  _9[\, #: packed
>> >> 300 301 302 303 0 0 0 0
>> >>
>> >>
>> >> This may be elementary for some but I haven't worked at this level in
>> >> years so I enjoyed playing with it
>> >>
>> >>
>> >>
>> >>
>> >> [1] - http://www.jsoftware.com/help/learning/27.htm
>> >>
>> >> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming
>> >> <[email protected]> wrote:
>> >> > Also, if you have only 4 19 bit values, you can pack them up tighter
>> >> >
>> >> > _2 (16&#.)\ #. |: ?. 4 19 $ 2
>> >> > 109 59 161 30 136 80 64 129 56 1
>> >> >
>> >> > Non multiples of 8 would just generally leave some trailing 0s on
>> >> decode, which could simply be understood as invalid.
>> >> >
>> >> >
>> >> > ----- Original Message -----
>> >> > From: 'Pascal Jasmin' via Programming <[email protected]>
>> >> > To: "[email protected]" <[email protected]>
>> >> > Cc:
>> >> > Sent: Wednesday, December 31, 2014 11:40 AM
>> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>> >> >
>> >> > base64 is essentially taking 3 bytes and storing them as 4.  The
>> inverse
>> >> can take 4 6bit values and pack them into 3 bytes.  The J code for
>> base64
>> >> is pretty generic to see how to do it for any base.  7 bit values could
>> be
>> >> packed 8 at a time in 7 bytes.
>> >> >
>> >> > There is a different way to pack 8 x bit values into x bytes.  Each
>> byte
>> >> would store the i'th bit of the 8 values.  This may be even faster than
>> >> base64 type coding.
>> >> >
>> >> >   #. |: ?. 8 19 $ 2
>> >> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28
>> >> >
>> >> >
>> >> > ----- Original Message -----
>> >> > From: Joe Bogner <[email protected]>
>> >> > To: [email protected]
>> >> > Cc:
>> >> > Sent: Wednesday, December 31, 2014 10:02 AM
>> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>> >> >
>> >> > Can 19 bits be stored in memory or a file on modern computers without
>> >> > taking up 3 bytes (24 bits)? The only idea that came to mind would
>> some
>> >> bit
>> >> > packing scheme with multiple items. That seems complicated unless
>> there
>> >> was
>> >> > a need to store billions of these numbers
>> >> >
>> >> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote:
>> >> >
>> >> >> Thanks. I missed the point of the variable breakup point but
>> exploring
>> >> >> different options below illustrated it for me.
>> >> >>
>> >> >> Changing the breakup point can encode the numbers into different bit
>> >> >> lengths (other than 8) which can then be written to the file/memory.
>> >> >>
>> >> >> For example, 1021 encoded in 511 breakups would take 18 bits
>> >> >>
>> >> >>    $ #: 511 toVint 1021
>> >> >> 2 9
>> >> >>
>> >> >> Whereas encoding it as 255 would require 40 bits, which is more than
>> a
>> >> >> 32-bit int. This is where I was failing to see the point of encoding
>> >> >> larger numbers
>> >> >>
>> >> >> NB. 5 bytes, 40 bits
>> >> >>     $ #: 255 toVint 1021
>> >> >> 5 8
>> >> >>
>> >> >> Without trying the binary representation, I was expecting the breakup
>> >> >> point to take the same space. For example, below I would have thought
>> >> >> that I'd need to write a
>> >> >>
>> >> >> NB. 6 bytes
>> >> >>    255 toVint 511 toVint 1021
>> >> >> 255 255 1 255 255 0
>> >> >>
>> >> >>
>> >> >> If I understand this correctly, I can see how variable ints could
>> >> >> shave off a few bits if the range of numbers is known in advance,
>> >> >> which is often the case.
>> >> >>
>> >> >>
>> >> >> Regarding packed strings I've been thinking about it recently as
>> well.
>> >> >> I would like to see how it could be used to pack a table of strings
>> >> >> (rows and columns).
>> >> >>
>> >> >> The following string could be a packed representation of a 2x2 table
>> >> >>
>> >> >> FrankUSATomGermany
>> >> >>
>> >> >> The binary format could be:
>> >> >>
>> >> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length]....
>> >> >>
>> >> >> The header for the string could be
>> >> >> 2 2 5 3 3 7
>> >> >>
>> >> >> Or maybe if you wanted to support a variable number of columns
>> (unsure
>> >> >> of the usecase of this)
>> >> >>
>> >> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2
>> length]
>> >> >> 2 2 5 3 2 3 7
>> >> >>
>> >> >> I don't know well it'd perform relative to boxed tables or inverted
>> >> >> tables. I would imagine the memory would be reduced for storage but
>> >> >> I'd have to test to see how much memory would be used on different
>> >> >> operations. I'd be interested in the performance of operations like
>> >> >> searching (e.g. find all the rows with a country of Germany)
>> >> >>
>> >> >> It might also be a safer way of storing user-input text without
>> >> >> worrying about escaping delimiters.
>> >> >>
>> >> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming
>> >> >> <[email protected]> wrote:
>> >> >> > I also provided some adverbs
>> >> >> >
>> >> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]'
>> >> >> > fromVint=: 1 : 'm&~: +/;.2 ]'
>> >> >> >
>> >> >> > these let you set the breakup point to what you wish
>> >> >> >
>> >> >> >   15 toVint 33 12 17
>> >> >> > 15 15 3 12 15 2
>> >> >> >   511 toVint 1233
>> >> >> > 511 511 211
>> >> >> >
>> >> >> >   511 fromVint 511 toVint 1233 444
>> >> >> > 1233 444
>> >> >> > It can provide simple compression.  If you have a list of many
>> strings
>> >> >> where almost all of them will fit into a 6bit alphabet, you can still
>> >> >> encode a full 7 bit or 8bit alphabet.  On average, you'd save space
>> >> using
>> >> >> 6bit alphabet, depending on your context.
>> >> >> >
>> >> >> > The variable string functions I showed, still let you do a
>> substring
>> >> >> search on the packed data.  Part of it is disk and memory compression
>> >> >> (don't expand until you need to, and a string takes much less memory
>> >> than a
>> >> >> list of numbers < 256), but part of it is not being forced in design
>> to
>> >> >> determine what the impossible cutoff number is.
>> >> >> >
>> >> >> > A big advantage related to J is having binary data records.  If you
>> >> >> encode the length of the binary data, you no longer have to box them
>> if
>> >> a
>> >> >> trailing 0 or space would be difficult to know whether it is a fill
>> or
>> >> >> intentional.
>> >> >> >
>> >> >> > Beyond binary data, there is dumb stuff such as storing addresses.
>> >> Most
>> >> >> people have just 1,  Most fit within 255 bytes.  Usually no one cares
>> >> about
>> >> >> address info except when they do or already have the client record.
>> Do
>> >> you
>> >> >> really want a separate table with 1 to many join with a possible
>> 10000
>> >> >> character fixed size per address just because you're unsure if it
>> will
>> >> ever
>> >> >> be possible for you to store an address to an African village as a
>> set
>> >> of
>> >> >> driving instructions and then a description of the house relative to
>> 3
>> >> >> other houses and landmarks just in case one of the other houses gets
>> >> >> painted.
>> >> >> >
>> >> >> > Another neat feature about the coding method is that if you have a
>> lot
>> >> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at
>> least
>> >> >> compresses well.
>> >> >> >
>> >> >> > A minor thing is that the encoding still lets you do search by <
>> or >.
>> >> >> If you are storing the number of orders per customer.  The first byte
>> >> tells
>> >> >> you whether it is > 254 or less than any other number below 255
>> >> (probably
>> >> >> enough info to select customers of interest).  Another minor thing is
>> >> that
>> >> >> you don't have to use the encoding codes until there is an
>> overflow.  It
>> >> >> may be easier to just switch functions to decode variable data, then
>> to
>> >> >> reshape terrabytes of data.
>> >> >> > ________________________________
>> >> >> > From: Joe Bogner <[email protected]>
>> >> >> > To: [email protected]
>> >> >> > Sent: Tuesday, December 30, 2014 10:45 AM
>> >> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
>> >> >> >
>> >> >> >
>> >> >> > Pascal,
>> >> >> >
>> >> >> > Thanks for sharing. Can you also share a potential use case?
>> >> >> >
>> >> >> > I had to remind myself about the size of integers (4 bytes) on
>> 32-bit.
>> >> >> >
>> >> >> > Will this only save space on numbers less than 1019?
>> >> >> >
>> >> >> > NB. 4 bytes
>> >> >> > toVint2 1019
>> >> >> > 255 255 255 254
>> >> >> >
>> >> >> > NB. 5 bytes
>> >> >> > toVint2 1020
>> >> >> > 255 255 255 255 0
>> >> >> >
>> >> >> > Are you looking for the savings in terms of disk space (saving it
>> to a
>> >> >> > file) or working memory?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming
>> >> >> > <[email protected]> wrote:
>> >> >> >> A common design decision is how large to make a field.  For many
>> >> >> numbers, we think that a byte will be large enough to hold practical
>> >> >> values, but one day we will find out that it is not.  This is
>> >> essentially
>> >> >> the Y2k problem, or year 2032 issue.
>> >> >> >>
>> >> >> >> A simple variable integer coding (tied to byte ranges)
>> >> >> >>
>> >> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each
>> >> >> >>
>> >> >> >>   toVint 566 44
>> >> >> >> 255 255 56 44
>> >> >> >>
>> >> >> >> It now takes 3 bytes to store 566, and one byte to store 44.
>> >> >> >>
>> >> >> >> the challenge is to write fromVint such that
>> >> >> >>
>> >> >> >>
>> >> >> >> 566 44 -: fromVint 255 255 56 44
>> >> >> >>
>> >> >> >> as an extra challenge, is there a way to write toVint or fromVint
>> >> such
>> >> >> that it is faster than using intermediate boxing?
>> >> >> >>
>> >> ----------------------------------------------------------------------
>> >> >> >> For information about J forums see
>> >> http://www.jsoftware.com/forums.htm
>> >
>> >> >
>> >> >> >
>> ----------------------------------------------------------------------
>> >> >> > For information about J forums see
>> >> http://www.jsoftware.com/forums.htm
>> >> >> >
>> ----------------------------------------------------------------------
>> >> >> > For information about J forums see
>> >> http://www.jsoftware.com/forums.htm
>> >> >>
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see
>> http://www.jsoftware.com/forums.htm
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see
>> http://www.jsoftware.com/forums.htm
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see
>> http://www.jsoftware.com/forums.htm
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> >>
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to