You don't have to use these numeric IDs elsewhere in your system. For example, if you have an additional column with a unique numeric ID, this ought to work fine: have the recommender reference that column while you keep using your real key everywhere else.
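As a rough illustration of the extra-column approach, here is a minimal sketch of an export step. It assumes your rows look like "userID,itemKey,itemNumericId,preference". That layout, and the class and method names, are hypothetical. The step writes only the numeric columns, in the comma-separated userID,itemID,preference layout that FileDataModel reads. The real String key stays in the database, so nothing there has to be re-keyed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical export step. Each input row is assumed to be
 * "userID,itemKey,itemNumericId,preference": itemKey is the real String key,
 * itemNumericId is an existing unique numeric column in the same table.
 * Output lines are "userID,itemID,preference" and contain numeric IDs only.
 */
public class SurrogateIdExport {

    /** Converts one input row into a numeric-only data-model line. */
    static String toDataModelLine(String row) {
        String[] fields = row.split(",", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + row);
        }
        long userId = Long.parseLong(fields[0].trim());
        // fields[1] is the real String key; it is deliberately not exported.
        long itemId = Long.parseLong(fields[2].trim());
        float preference = Float.parseFloat(fields[3].trim());
        return userId + "," + itemId + "," + preference;
    }

    public static void main(String[] args) throws IOException {
        // Reads rows from stdin and writes converted lines to stdout, skipping blank lines.
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                System.out.println(toDataModelLine(line));
            }
        }
    }
}
```

Recommendations then come back keyed by the numeric column. You join them back to the real key with the same table, so no in-memory mapping is needed.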
That is, you can map to/from numeric IDs just for this purpose; that's all IDMigrator does anyway, in memory. You could do the mapping elsewhere if that's easier.

On Thu, Aug 11, 2011 at 11:27 PM, Charles McBrearty wrote:
> If using Strings internally as IDs costs too much from a performance
> perspective, that's totally fine, and I wasn't trying to pick that fight. It
> sounds like there isn't much appetite for String wrappers, however.
>
> In any event, your suggestion to switch to numeric IDs is a non-starter.
> This is because re-keying the tables in the database system I'm using
> would break all the other jobs running against those tables.
>
> -chuck
>
> On Aug 10, 2011, at 11:34 PM, Sean Owen wrote:
>
> > Yes, it's just that it's much slower and takes up much more memory. You are
> > strongly encouraged to use numeric IDs and not bother with this adapter at
> > all. It's not a question of interning strings, and the IDs need not be
> > consecutive; the point is to avoid String IDs entirely.
> >
> > On Thu, Aug 11, 2011 at 1:02 AM, Charles McBrearty wrote:
> >
> >> Hi,
> >>
> >> I am taking a look at running some of the recommender examples from Mahout
> >> in Action on a data set of mine that uses strings as the item IDs, and it
> >> looks to me like the suggested way to do this is to subclass FileDataModel
> >> and then use FileIdMigrator to manage the String <-> Long mapping.
> >>
> >> This seems like a lot of complication to deal with what I would imagine is
> >> a pretty common use case. Is there something that I'm missing here?
> >>
> >> Thanks for any info that anyone can provide.
> >>
> >> -chuck
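If you'd rather do the String <-> long mapping yourself, outside Mahout, here is a minimal sketch of the in-memory idea. It is not Mahout's IDMigrator code. The class and method names are hypothetical, and deriving the long from the first 8 bytes of an MD5 digest is this sketch's own choice. (I believe Mahout's migrators hash in a similar way, but treat that as unverified.) The long IDs don't need to be consecutive, so hashing is enough. The reverse map is what lets you turn recommended item IDs back into your real keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory String <-> long ID mapper (not Mahout's class).
 * Not thread-safe: MessageDigest and HashMap are used without synchronization.
 */
public class InMemoryIdMapper {

    private final Map<Long, String> longToString = new HashMap<>();
    private final MessageDigest md5;

    public InMemoryIdMapper() {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Derives a long from the first 8 bytes of the MD5 digest of the key. */
    public long toLongID(String stringID) {
        byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
        long id = 0L;
        for (int i = 0; i < 8; i++) {
            id = (id << 8) | (digest[i] & 0xFF);
        }
        return id;
    }

    /** Computes the long ID and remembers the reverse mapping. */
    public long storeMapping(String stringID) {
        long id = toLongID(stringID);
        String previous = longToString.putIfAbsent(id, stringID);
        if (previous != null && !previous.equals(stringID)) {
            // Two distinct keys hashed to the same long: very unlikely, but detected here.
            throw new IllegalStateException("Hash collision: " + previous + " vs " + stringID);
        }
        return id;
    }

    /** Returns the original String key, or null if it was never stored. */
    public String toStringID(long longID) {
        return longToString.get(longID);
    }

    public static void main(String[] args) {
        InMemoryIdMapper mapper = new InMemoryIdMapper();
        long id = mapper.storeMapping("item-ABC-123");
        System.out.println(id + " -> " + mapper.toStringID(id));
    }
}
```

You would call storeMapping on each key while writing the numeric data file and toStringID on each recommended item ID afterwards. Keeping the reverse map for every key is exactly the extra memory cost mentioned earlier in the thread.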
