Alex Rufon wrote:
> A few years back, I've asked this same forum on how to
> convert strings to numbers and back again.
I remember these discussions. Oleg suggested s2i =: 6 s: s: [1], and in a
follow-up a few years later, you responded that you'd tried the suggestion, but
it didn't meet your needs because of the memory constraints imposed by your
architecture [2].
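For reference, here is a minimal sketch of how that s: round trip might look. The name i2s is mine, not Oleg's, and I'm assuming the strings arrive boxed (monadic s: on an unboxed string treats its first character as a separator). The indices depend on what is already in the session's symbol table, so no output is shown:

```j
s2i   =: 6 s: s:          NB. boxed strings -> symbols -> integer indices
i2s   =: 5 s: _6 s: ]     NB. indices -> symbols -> boxed strings (hypothetical name)
names =: 'Alex';'Rufon'
names -: i2s s2i names    NB. 1 if the round trip reproduces the input
```

The strings themselves live in J's global symbol table, which is where the memory cost comes from.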
> tricky part was going back again from number to the original string.
Yes. If you only needed distinct numbers for distinct strings, with no way
back, the solution would be simpler (the CRC foreign, 128!:3 'string', is a
fun one, though being a checksum it can collide).
The problem is that if your domain is unbounded (i.e. the input strings are
arbitrary and there are no constraints on their content or length), and you
need a 1:1 mapping, then your range is unbounded too. That is, the strings are
as efficient a representation as you're going to get.
Now, if you're not using numbers to increase efficiency, and you require the
ability to (for example) stitch string-identifiers onto a homogeneous array of
numbers, then you could do something like:
   s2i =: (x:#a.) #. a.&i.
   s2i 'Alex Rufon'
308953381376021135519598
   s2i^:_1 s2i 'Alex Rufon'
Alex Rufon
Essentially, this interprets your strings as numbers in base 256. But as I
said, this only works if you don't care how big the output numbers are (i.e.
if you don't need to limit your range). You can't get a more efficient
representation this way. In fact, the representation is worse (and the mapping
will be slow for large inputs):
   a =: 'Alex Rufon'
   b =: s2i a
   7!:5 ;:'a b'
64 128
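To make the base-256 interpretation concrete, here is the arithmetic spelled out on a two-character string. This is only an illustrative sketch: it uses plain 256 rather than the extended x:#a. because the numbers are tiny, and anything longer than a few characters would need the extended version.

```j
a. i. 'Al'                    NB. 65 108, the byte values of the characters
256 #. 65 108                 NB. 16748, i.e. (65 * 256) + 108
256 #.^:_1 ] 16748            NB. 65 108, back to base-256 digits
(256 #.^:_1 ] 16748) { a.     NB. Al
```

The number grows by a factor of 256 with every extra character, which is why the range can't be bounded if the domain isn't.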
So the short answer is, if you require an unbounded domain but a restricted
range, someone's going to have to store your input (to make the mapping
invertible). AFAIK, there are only two ways to do that in J. There's the s:
method Oleg suggested (where J stores the input strings behind the scenes), and
there's doing it yourself:
> LOOKUP_z_=: '';'0';'1';'2';'3';'4';'5';'6';'7';'8';'9';' '
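A minimal sketch of the do-it-yourself approach, for comparison. The table contents and the names str2idx and idx2str are made up for illustration; your LOOKUP_z_ would take the place of LOOKUP:

```j
LOOKUP  =: 'Alex';'Dan';'Oleg'   NB. hypothetical table of known strings
str2idx =: LOOKUP&i.             NB. boxed strings -> positions in the table
idx2str =: {&LOOKUP              NB. positions -> boxed strings
str2idx 'Dan';'Alex'             NB. 1 0
idx2str 1 0                      NB. the boxes 'Dan';'Alex'
str2idx <'Jose'                  NB. 3, i.e. #LOOKUP: not in the table
```

A string that isn't in the table maps to #LOOKUP, so it has to be appended to the table before it can be mapped back.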
So if you don't like this, we need to know why, and what it is you do want.
You'll need to describe the:
(A) Bounds on your domain. Describe the strings: are they limited in any
way? Are they a fixed length? Is there a maximum length? Is there a limited
universe of characters from which they can be composed? How many do you
process in a batch? Is there a reason you can't use your database to map them
to integers outside of J (e.g. using the [presumably autogenerated integral]
primary key of the table they came from)?
(B) Constraints on your range. Why do you need numbers? Are you just looking
for a more efficient representation of your strings? Do you need to attach the
string (identifiers) to other numbers? Are you trying to avoid boxing (if so,
why)? Must the numbers be positive integers? Can they be negative, float,
rational, complex? Is there a limit to how large they can be? How do you use
them?
(C) Constraints on the mapping (aside from those on the range). Are you still
using the architecture described in [2]? How much memory can a single J
instance use before you start running into performance problems? How long
should a mapping take? Is there a time or memory limit it must not exceed?
(D) Reasons for not liking a lookup array. I presume your application is not
totally functional (in the sense that Jose's applications are totally
functional), and that the extra global noun isn't necessarily a wart. You
brought this up in the context of optimizing your application, so do you find
that LOOKUP_z_ i. y is not fast or lean enough? (If so, since i. is highly
optimized, I don't know if you're going to find a more efficient solution.)
I hope this helps,
-Dan