Jon Hough wrote:
>  I understand this function somewhat, but cannot figure out why _6[\ is 
> necessary.
>
>     1.  read y and convert to ASCII int value
>     2.  write in binary
>     3.  , (ravel) concatenate into one array
>     4.  Do the mysterious _6[\
>     5.  {~ #. encodes the result to the BASE64 alphabet.
>
>  The biggest mystery to me is step 4.

In general, the pattern  (-n) ]\ vector  breaks  vector  into
non-overlapping groups of width  n  (in  n ]\ vector  where  n  is
positive, the groups overlap).  For example:

           _3 <\ 1 2 3 4 5 6 7 8 9 10 11 12
        +-----+-----+-----+--------+
        |1 2 3|4 5 6|7 8 9|10 11 12|
        +-----+-----+-----+--------+
           _4 <\ 1 2 3 4 5 6 7 8 9 10 11 12
        +-------+-------+----------+
        |1 2 3 4|5 6 7 8|9 10 11 12|
        +-------+-------+----------+
           _5 <\ 1 2 3 4 5 6 7 8 9 10 11 12
        +---------+----------+-----+
        |1 2 3 4 5|6 7 8 9 10|11 12|
        +---------+----------+-----+

           _3 ]\ 1 2 3 4 5 6 7 8 9 10 11 12
         1  2  3
         4  5  6
         7  8  9
        10 11 12
           _4 ]\ 1 2 3 4 5 6 7 8 9 10 11 12
        1  2  3  4
        5  6  7  8
        9 10 11 12
           _5 ]\ 1 2 3 4 5 6 7 8 9 10 11 12
         1  2 3 4  5
         6  7 8 9 10
        11 12 0 0  0
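
For anyone more at home outside J, the two behaviours can be mimicked in
Python roughly as below.  This is only a sketch: `infix` is a hypothetical
helper, not a J or library function, and it models only the list case
shown above.  The zero fill matches what  ]\  does with numbers, while
<\  keeps the short last box instead.

```python
def infix(n, xs, fill=0):
    # Hypothetical helper, a rough Python stand-in for J's  n ]\ xs
    #   n > 0: every overlapping window of length n
    #   n < 0: non-overlapping groups of length -n; a short last group
    #          is padded with `fill`, the way ]\ pads numbers with 0
    if n == 0:
        raise ValueError("n must be nonzero")
    if n > 0:
        return [list(xs[i:i + n]) for i in range(len(xs) - n + 1)]
    w = -n
    rows = [list(xs[i:i + w]) for i in range(0, len(xs), w)]
    if rows and len(rows[-1]) < w:
        rows[-1] += [fill] * (w - len(rows[-1]))
    return rows

v = list(range(1, 13))
print(infix(-5, v))     # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 0, 0, 0]]
print(infix(3, v)[:3])  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```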

So, based on your (correct) assessment, after step 3 we have a single
concatenated list of the binary representation of y.  That is, we have the
bit vector representing y.
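
Here is a rough Python equivalent of steps 1 to 3, as a sketch.  It
assumes each byte is written as exactly 8 bits, most significant bit first
(in J, something along the lines of  (8#2) #: ), and it assumes y is
already a sequence of bytes.  The name `bit_vector` is made up for
illustration.

```python
def bit_vector(data):
    # Hypothetical helper: each byte value (step 1) -> 8 bits, MSB first
    # (step 2), all run together into one flat list (step 3, like ravel).
    return [(b >> k) & 1 for b in data for k in range(7, -1, -1)]

print(bit_vector(b"Ma"))  # 'M' = 77 -> 0 1 0 0 1 1 0 1, 'a' = 97 -> 0 1 1 0 0 0 0 1
```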

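So step 4,  _6 ]\ bit_vector  breaks the binary representation of y into
6-bit chunks.  If the number of bits isn't a multiple of 6, the last chunk
is padded with zeros, just as in the  _5 ]\  example above.  Why 6 bits?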

           2^6
        64
           'Base',":2^6
        Base64
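
To see all five steps together, here is a Python sketch of the whole
pipeline for comparison.  It assumes the standard alphabet A-Z, a-z, 0-9,
+, / (as in RFC 4648).  The five steps as listed don't mention '='
padding, so the sketch omits it.  Its output should therefore match
Python's base64.b64encode once the trailing '=' characters are removed.
Each 6-bit chunk selects one of exactly 2^6 = 64 characters, so every
3 input bytes (24 bits) become 4 output characters.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_no_pad(data):
    # steps 1-3: bytes -> flat list of bits, 8 per byte, MSB first
    bits = [(b >> k) & 1 for b in data for k in range(7, -1, -1)]
    # step 4: zero-fill to a multiple of 6, then cut into 6-bit chunks
    bits += [0] * (-len(bits) % 6)
    chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # step 5: read each chunk as a base-2 number (J's #.) and index the alphabet
    return "".join(ALPHABET[int("".join(map(str, c)), 2)] for c in chunks)

for s in [b"M", b"Ma", b"Man"]:
    print(s, b64_no_pad(s), base64.b64encode(s).decode())
```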

BTW, did you notice that a. {~ ... is the inverse of  a. i. ... ?  How
about that #. is the inverse of  #: ?  Hint, hint.
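
Following that hint, decoding is the same pipeline run backwards, with
each step replaced by its inverse.  Here is a Python sketch under the same
assumptions as above: standard alphabet, input without '=' padding.  The
function name is hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_decode_no_pad(text):
    # undo the alphabet lookup ({~ is undone by i. in J): char -> index 0..63
    idx = [ALPHABET.index(c) for c in text]
    # undo #. : each index -> 6 bits, MSB first, all run together
    bits = [(i >> k) & 1 for i in idx for k in range(5, -1, -1)]
    # drop the zero fill added by the encoder (always fewer than 8 bits)
    bits = bits[:len(bits) - len(bits) % 8]
    # regroup into 8-bit chunks and read each one as a byte value
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))

print(b64_decode_no_pad("TWFu"))  # b'Man'
```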

-Dan

PS: Jon also wrote:
>  aside - how does that handle chars larger than one byte?, i.e. UTF

J has two types of "strings": 8-bit and 16-bit.  The 16-bit version is
little used (relatively speaking), and Pascal's code doesn't handle it, so
I'll ignore it here.  The 8-bit version has two interpretations: when all
byte values are below 128, it's pure, correct ASCII.  When it contains
byte values of 128 or above, it's best to think of the string as an
"opaque byte stream" that could represent anything.  It might even be
valid UTF-8.

But that's irrelevant for the purposes of encoding: base64 doesn't know or
care.  The encoder will accept any stream of bytes (vector of 8-bit
values) you pass to it, regardless of how you view those bytes from an
application perspective.
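
Here is a small Python illustration of that point, using Python's standard
base64 module rather than J.  The same two bytes, 0xC3 0xA9, are "é" when
read as UTF-8 and "Ã©" when read as Latin-1.  The encoder sees only the
bytes, so both readings give the same base64 text.

```python
import base64

# The same two bytes, 0xC3 0xA9, obtained from two different texts:
as_utf8 = "\u00e9".encode("utf-8")            # "é" in UTF-8
as_latin1 = "\u00c3\u00a9".encode("latin-1")  # "Ã©" in Latin-1
assert as_utf8 == as_latin1 == bytes([0xC3, 0xA9])

# base64 only sees bytes, so both encode identically.
print(base64.b64encode(as_utf8))    # b'w6k='
print(base64.b64encode(as_latin1))  # b'w6k='
```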


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
