Yes, I agree, you need to deal with the "sign bit" somehow. There's other ways of doing that, though, which might be faster.
For example: b32int=: [: ((<.2^32)&|)&.(+&(<.2^31))2#.] b32int (32#2)#: _17998321 _17998321 I think that'll use a floating point representation on a 32 bit machine, but that's easily fixed with another <. Thanks, -- Raul On Tue, Jan 6, 2015 at 2:48 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote: > thank you Raul, I think this "decryption" function is still needed though? > > -&4294967296^:(>&2147483648) #. (32#2)#: _17998321 > _17998321 > -&4294967296^:(>&2147483648) #. (32#2)#: 17998321 > 17998321 > > > > > > ----- Original Message ----- > From: Raul Miller <[email protected]> > To: Programming forum <[email protected]> > Cc: > Sent: Tuesday, January 6, 2015 1:43 PM > Subject: Re: [Jprogramming] variable integer coding (challenge) > > The &.|. in ($!.1)&.|. is so the 1s you are padding with appear on the > left, rather than on the right. > > Another way of accomplishing that would be like this: > _32 {.!.1 #:-17998321 > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > But note that there's a problem here (with both of these approaches): > 32 ($!.1)&.|. 1 > 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > > ...padding on the left with 1s is only valid for negative numbers. > > Meanwhile, > #: 256 #. |. a.i. 2 ic _17998321 > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > ... so the |. which appears in that expression works the same way on > probably any intel machine and would probably be relevant in most > contexts (though I've not tested on ARM processor, and I'm not even > sure if current versions of J run on ARM... and since ARM is > "bi-endian" I'm not sure what "native" would mean in that context - > but I guess it would make sense for ARM handling to mimic intel > handling). > > But there's an even simpler approach: > (32#2)#:_17998321 > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > Simplicity is a virtue? > > Thanks, > > -- > Raul > > On Tue, Jan 6, 2015 at 12:44 PM, 'Pascal Jasmin' via Programming > <[email protected]> wrote: >> I'm not sure this is the same problem, unless you are trying to encode >> within a range of lattitudes that include dense civilization, and longitudes >> that exclude oceans. >> >> but the 32bit binary rep of -17998321 >> >> 32 ($!.1)&.|. #: -17998321 >> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> >> though the "memcopy" version is simpler. >> >> #: 256 #. |. a.i. 2 ic _17998321 >> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> >> -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321 >> _17998321 >> -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321 >> 17998321 >> >> >> I'm not sure if the |. step is only for non macs. >> >> >> >> ----- Original Message ----- >> From: Vijay Lulla <[email protected]> >> To: [email protected] >> Cc: >> Sent: Tuesday, January 6, 2015 11:38 AM >> Subject: Re: [Jprogramming] variable integer coding (challenge) >> >> I have followed this thread with interest. Much of this discussion is >> relevant to >> https://developers.google.com/maps/documentation/utilities/polylinealgorithm >> . Like Joe, I also enjoyed playing at bits level and was impressed by how >> easy J makes such explorations. Unfortunately, I couldn't generate all the >> encoded values in the example table listed on the page. Maybe others in >> this group might enjoy exploring this algorithm. >> >> Thanks for the great discussion. >> >> On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote: >> >>> Thanks. I couldn't follow the examples so I worked through it on my >>> own. Sharing it here for anyone else who's following. I switched to 9 >>> bits instead of 19 to simplify the example >>> >>> NB. 300 needs 9 bits to be expressed >>> $ #: 300 >>> 9 >>> >>> NB. Show 4 9 bit numbers >>> $ #: 300 301 302 303 >>> 4 9 >>> >>> Writing those 4 numbers would take 16 bytes[1] since J ints are 4 >>> bytes as well as C >>> >>> We can compress it down to 5 bytes if we pack all the bits together: >>> >>> ] packed=. #. _8[\ , #: 300 301 302 303 >>> 150 75 101 210 240 >>> >>> $ packed >>> 5 >>> >>> Those 5 bytes could be written to a file and re-read >>> >>> If I knew in advance that those numbers were 9 bits, it can be reversed >>> >>> #. _9[\, #: packed >>> 300 301 302 303 0 >>> >>> We could also pack it into 8 bytes (two 32 bit integers), but I don't >>> see an advantage of that over packing into a single byte (8 bit) >>> >>> ] packed=. #. _32[\ , #: 300 301 302 303 >>> 2.52152e9 4.02653e9 >>> >>> #. _9[\, #: packed >>> 300 301 302 303 0 0 0 0 >>> >>> >>> This may be elementary for some but I haven't worked at this level in >>> years so I enjoyed playing with it >>> >>> >>> >>> >>> [1] - http://www.jsoftware.com/help/learning/27.htm >>> >>> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming >>> <[email protected]> wrote: >>> > Also, if you have only 4 19 bit values, you can pack them up tighter >>> > >>> > _2 (16&#.)\ #. |: ?. 4 19 $ 2 >>> > 109 59 161 30 136 80 64 129 56 1 >>> > >>> > Non multiples of 8 would just generally leave some trailing 0s on >>> decode, which could simply be understood as invalid. >>> > >>> > >>> > ----- Original Message ----- >>> > From: 'Pascal Jasmin' via Programming <[email protected]> >>> > To: "[email protected]" <[email protected]> >>> > Cc: >>> > Sent: Wednesday, December 31, 2014 11:40 AM >>> > Subject: Re: [Jprogramming] variable integer coding (challenge) >>> > >>> > base64 is essentially taking 3 bytes and storing them as 4. The inverse >>> can take 4 6bit values and pack them into 3 bytes. The J code for base64 >>> is pretty generic to see how to do it for any base. 7 bit values could be >>> packed 8 at a time in 7 bytes. >>> > >>> > There is a different way to pack 8 x bit values into x bytes. Each byte >>> would store the i'th bit of the 8 values. This may be even faster than >>> base64 type coding. >>> > >>> > #. |: ?. 8 19 $ 2 >>> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28 >>> > >>> > >>> > ----- Original Message ----- >>> > From: Joe Bogner <[email protected]> >>> > To: [email protected] >>> > Cc: >>> > Sent: Wednesday, December 31, 2014 10:02 AM >>> > Subject: Re: [Jprogramming] variable integer coding (challenge) >>> > >>> > Can 19 bits be stored in memory or a file on modern computers without >>> > taking up 3 bytes (24 bits)? The only idea that came to mind would some >>> bit >>> > packing scheme with multiple items. That seems complicated unless there >>> was >>> > a need to store billions of these numbers >>> > >>> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote: >>> > >>> >> Thanks. I missed the point of the variable breakup point but exploring >>> >> different options below illustrated it for me. >>> >> >>> >> Changing the breakup point can encode the numbers into different bit >>> >> lengths (other than 8) which can then be written to the file/memory. >>> >> >>> >> For example, 1021 encoded in 511 breakups would take 18 bits >>> >> >>> >> $ #: 511 toVint 1021 >>> >> 2 9 >>> >> >>> >> Whereas encoding it as 255 would require 40 bits, which is more than a >>> >> 32-bit int. This is where I was failing to see the point of encoding >>> >> larger numbers >>> >> >>> >> NB. 5 bytes, 40 bits >>> >> $ #: 255 toVint 1021 >>> >> 5 8 >>> >> >>> >> Without trying the binary representation, I was expecting the breakup >>> >> point to take the same space. For example, below I would have thought >>> >> that I'd need to write a >>> >> >>> >> NB. 6 bytes >>> >> 255 toVint 511 toVint 1021 >>> >> 255 255 1 255 255 0 >>> >> >>> >> >>> >> If I understand this correctly, I can see how variable ints could >>> >> shave off a few bits if the range of numbers is known in advance, >>> >> which is often the case. >>> >> >>> >> >>> >> Regarding packed strings I've been thinking about it recently as well. >>> >> I would like to see how it could be used to pack a table of strings >>> >> (rows and columns). >>> >> >>> >> The following string could be a packed representation of a 2x2 table >>> >> >>> >> FrankUSATomGermany >>> >> >>> >> The binary format could be: >>> >> >>> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length].... >>> >> >>> >> The header for the string could be >>> >> 2 2 5 3 3 7 >>> >> >>> >> Or maybe if you wanted to support a variable number of columns (unsure >>> >> of the usecase of this) >>> >> >>> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 length] >>> >> 2 2 5 3 2 3 7 >>> >> >>> >> I don't know well it'd perform relative to boxed tables or inverted >>> >> tables. I would imagine the memory would be reduced for storage but >>> >> I'd have to test to see how much memory would be used on different >>> >> operations. I'd be interested in the performance of operations like >>> >> searching (e.g. find all the rows with a country of Germany) >>> >> >>> >> It might also be a safer way of storing user-input text without >>> >> worrying about escaping delimiters. >>> >> >>> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming >>> >> <[email protected]> wrote: >>> >> > I also provided some adverbs >>> >> > >>> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]' >>> >> > fromVint=: 1 : 'm&~: +/;.2 ]' >>> >> > >>> >> > these let you set the breakup point to what you wish >>> >> > >>> >> > 15 toVint 33 12 17 >>> >> > 15 15 3 12 15 2 >>> >> > 511 toVint 1233 >>> >> > 511 511 211 >>> >> > >>> >> > 511 fromVint 511 toVint 1233 444 >>> >> > 1233 444 >>> >> > It can provide simple compression. If you have a list of many strings >>> >> where almost all of them will fit into a 6bit alphabet, you can still >>> >> encode a full 7 bit or 8bit alphabet. On average, you'd save space >>> using >>> >> 6bit alphabet, depending on your context. >>> >> > >>> >> > The variable string functions I showed, still let you do a substring >>> >> search on the packed data. Part of it is disk and memory compression >>> >> (don't expand until you need to, and a string takes much less memory >>> than a >>> >> list of numbers < 256), but part of it is not being forced in design to >>> >> determine what the impossible cutoff number is. >>> >> > >>> >> > A big advantage related to J is having binary data records. If you >>> >> encode the length of the binary data, you no longer have to box them if >>> a >>> >> trailing 0 or space would be difficult to know whether it is a fill or >>> >> intentional. >>> >> > >>> >> > Beyond binary data, there is dumb stuff such as storing addresses. >>> Most >>> >> people have just 1, Most fit within 255 bytes. Usually no one cares >>> about >>> >> address info except when they do or already have the client record. Do >>> you >>> >> really want a separate table with 1 to many join with a possible 10000 >>> >> character fixed size per address just because you're unsure if it will >>> ever >>> >> be possible for you to store an address to an African village as a set >>> of >>> >> driving instructions and then a description of the house relative to 3 >>> >> other houses and landmarks just in case one of the other houses gets >>> >> painted. >>> >> > >>> >> > Another neat feature about the coding method is that if you have a lot >>> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at least >>> >> compresses well. >>> >> > >>> >> > A minor thing is that the encoding still lets you do search by < or >. >>> >> If you are storing the number of orders per customer. The first byte >>> tells >>> >> you whether it is > 254 or less than any other number below 255 >>> (probably >>> >> enough info to select customers of interest). Another minor thing is >>> that >>> >> you don't have to use the encoding codes until there is an overflow. It >>> >> may be easier to just switch functions to decode variable data, then to >>> >> reshape terrabytes of data. >>> >> > ________________________________ >>> >> > From: Joe Bogner <[email protected]> >>> >> > To: [email protected] >>> >> > Sent: Tuesday, December 30, 2014 10:45 AM >>> >> > Subject: Re: [Jprogramming] variable integer coding (challenge) >>> >> > >>> >> > >>> >> > Pascal, >>> >> > >>> >> > Thanks for sharing. Can you also share a potential use case? >>> >> > >>> >> > I had to remind myself about the size of integers (4 bytes) on 32-bit. >>> >> > >>> >> > Will this only save space on numbers less than 1019? >>> >> > >>> >> > NB. 4 bytes >>> >> > toVint2 1019 >>> >> > 255 255 255 254 >>> >> > >>> >> > NB. 5 bytes >>> >> > toVint2 1020 >>> >> > 255 255 255 255 0 >>> >> > >>> >> > Are you looking for the savings in terms of disk space (saving it to a >>> >> > file) or working memory? >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming >>> >> > <[email protected]> wrote: >>> >> >> A common design decision is how large to make a field. For many >>> >> numbers, we think that a byte will be large enough to hold practical >>> >> values, but one day we will find out that it is not. This is >>> essentially >>> >> the Y2k problem, or year 2032 issue. >>> >> >> >>> >> >> A simple variable integer coding (tied to byte ranges) >>> >> >> >>> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each >>> >> >> >>> >> >> toVint 566 44 >>> >> >> 255 255 56 44 >>> >> >> >>> >> >> It now takes 3 bytes to store 566, and one byte to store 44. >>> >> >> >>> >> >> the challenge is to write fromVint such that >>> >> >> >>> >> >> >>> >> >> 566 44 -: fromVint 255 255 56 44 >>> >> >> >>> >> >> as an extra challenge, is there a way to write toVint or fromVint >>> such >>> >> that it is faster than using intermediate boxing? >>> >> >> >>> ---------------------------------------------------------------------- >>> >> >> For information about J forums see >>> http://www.jsoftware.com/forums.htm > >> >>> > >>> >> > ---------------------------------------------------------------------- >>> >> > For information about J forums see >>> http://www.jsoftware.com/forums.htm >>> >> > ---------------------------------------------------------------------- >>> >> > For information about J forums see >>> http://www.jsoftware.com/forums.htm >>> >> >>> > ---------------------------------------------------------------------- >>> > For information about J forums see http://www.jsoftware.com/forums.htm >>> > ---------------------------------------------------------------------- >>> > For information about J forums see http://www.jsoftware.com/forums.htm >>> > ---------------------------------------------------------------------- >>> > For information about J forums see http://www.jsoftware.com/forums.htm >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >>> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
