Alex Rufon wrote:
>  A few years back, I've asked this same forum on how to
>  convert strings to numbers and back again. 

I remember these discussions.  Oleg suggested  s2i =:  6 s: s:  [1], and in a 
follow-up a few years later, you responded that you'd tried the suggestion, but 
it didn't meet your needs because of the memory constraints imposed by your 
architecture [2].

>  tricky part was going back again from number to the original string.

Yes.  If you only needed (mostly) unique numbers from unique strings, and never 
had to get the strings back, the solution would be simpler (  128!:3 'string'  , 
a CRC checksum, is a fun one).

The problem is that if your domain is unbounded (i.e. the input strings are 
arbitrary and there are no constraints on their content or length), and you 
need a 1:1 mapping, then your range is unbounded too.  That is, the strings are 
as efficient a representation as you're going to get.  
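To make the counting behind this concrete (a back-of-the-envelope pigeonhole argument, assuming strings are drawn from the 256 characters of  a.  ), the number of distinct strings of length at most n is

$$\sum_{k=0}^{n} 256^{k} \;=\; \frac{256^{n+1}-1}{255}$$

so any 1:1 mapping to non-negative integers needs at least that many distinct outputs, and the largest of them takes roughly 8n bits to write down, which is about what the strings themselves cost.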

Now, if you're not using numbers to increase efficiency, and you require the 
ability to (for example) stitch string-identifiers onto a homogeneous array of 
numbers, then you could do something like:

           s2i  =:  (x:#a.) #. a.&i.

           s2i  'Alex Rufon'
        308953381376021135519598

           s2i^:_1 s2i  'Alex Rufon'
        Alex Rufon

Essentially, this interprets each string as a base-256 numeral, one digit per 
character.  But as I said, this only works if you don't care how big the output 
numbers are (i.e. if you don't need to limit your range).  You can't get a more 
efficient representation this way.  In fact, the representation is worse (and 
the mapping will be slow for large inputs):

           a  =:  'Alex Rufon'
           b  =:  s2i a
           
           7!:5 ;:'a b'
        64 128
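A minimal sketch of the same round trip with the inverse spelled out, assuming a J session with the standard library loaded (for  assert  in the checks).  The name  i2s  is hypothetical, not from the thread: it uses  #.^:_1  to get as many base-256 digits as the number needs, then indexes into  a.  .  It also shows one caveat of the base-256 reading: a leading NUL (  0{a.  ) is a zero digit, so it vanishes, and strings that differ only in leading NULs map to the same number.

```j
NB. s2i as defined above: read a string as a base-256 numeral
s2i =: (x:#a.) #. a.&i.
NB. i2s: hypothetical explicit inverse; #.^:_1 gives as many base-256
NB. digits as the number needs, and { turns them back into characters
i2s =: a. {~ (x:#a.) #.^:_1 ]

i2s s2i 'Alex Rufon'           NB. Alex Rufon

NB. Caveat: a leading NUL is a zero digit, so it is lost
(s2i 'AB') = s2i (0{a.),'AB'   NB. 1
```

So, strictly, the mapping is only 1:1 on strings that don't start with NUL.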

So the short answer is, if you require an unbounded domain but a restricted 
range, someone's going to have to store your input (to make the mapping 
invertible).  AFAIK, there are only two ways to do that in J.  There's the  s:  
method Oleg suggested (where J stores the input strings behind the scenes), and 
there's doing it yourself:

>  LOOKUP_z_=: '';'0';'1';'2';'3';'4';'5';'6';'7';'8';'9';' '
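To spell out the do-it-yourself route, here is a minimal sketch using the  LOOKUP_z_  list quoted above:  i.  maps boxed strings to integers and  {  maps integers back.  The verb  intern  is a hypothetical helper (not from the thread) that appends strings it hasn't seen yet, so the table grows as new strings arrive.

```j
NB. Alex's lookup table: a boxed list of known strings
LOOKUP_z_ =: '';'0';'1';'2';'3';'4';'5';'6';'7';'8';'9';' '

idx =: LOOKUP_z_ i. '3';' ';'9'   NB. strings -> integers: 4 11 10
idx { LOOKUP_z_                   NB. integers -> boxed strings

NB. Hypothetical helper: add unseen strings to the table, then look up
intern =: 3 : 0
  new =. ~. y #~ -. y e. LOOKUP_z_   NB. distinct strings not yet in the table
  LOOKUP_z_ =: LOOKUP_z_ , new       NB. append them (global assignment)
  LOOKUP_z_ i. y                     NB. every string now has an index
)
intern 'abc';'3';'abc'            NB. 12 4 12
```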

So if you don't like this, we need to know why, and what it is you do want.  
You'll need to describe the:

(A)  Bounds on your domain.  Describe the strings:  are they limited in any 
way?  Are they a fixed length?  Is there a maximum length?  Is there a limited 
universe of characters from which they can be composed?  How many do you 
process in a batch?   Is there a reason you can't use your database to map them 
to integers outside of J (e.g. using the [presumably autogenerated integral] 
primary key of the table they came from)?

(B)  Constraints on your range.  Why do you need numbers?  Are you just looking 
for a more efficient representation of your strings?  Do you need to attach the 
string (identifiers) to other numbers?  Are you trying to avoid boxing (if so, 
why)?  Must the numbers be positive integers?  Can they be negative, float, 
rational, complex?  Is there a limit to how large they can be?  How do you use 
them?  

(C)  Constraints on the mapping (aside from those on the range).  Are you still 
using the architecture described in [2]?  How much memory can a single J 
instance use before you start running into performance problems?  How long 
should a mapping take?  Is there a time or memory limit it must not exceed?

(D)  Reasons for not liking a lookup array.  I presume your application is not 
totally functional (in the sense that Jose's applications are totally 
functional), and that the extra global noun isn't necessarily a wart.  You 
brought this up in the context of optimizing your application, so do you find 
that  LOOKUP_z_ i. y  is not fast or lean enough?  (If so, since  i. is highly 
optimized, I don't know if you're going to find a more efficient solution.)

I hope this helps,

-Dan
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
