Using the 0.6 snapshot plus patch 705 (MongoDataModel) from Jira
(https://issues.apache.org/jira/browse/MAHOUT-705) and a test data set with
~300k rows like:

"4cec0a2934ac9fbd2b040000","4d065d5434ac9f5227a12f00",118

It's slowly doing the translations:
INFO: [+++][MONGO-MAP] Adding Translation    Item ID: 4d57d54434ac9fd3570005a2 long_value: 145367

It's doing about 30,000 per hour (and getting slower), which is about 8.3/sec,
on a machine with 8 GB RAM and 4 virtual cores.

With a test data set of 3M preferences, that would take >5 days just for
the translation (100 hours at the current rate, and the rate keeps dropping).
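
For comparison, here is a minimal in-memory sketch of the String -> long
dictionary idea from the quoted reply. It assumes all IDs fit in one JVM's
heap and uses a plain HashMap plus an ArrayList; IdDictionary, toLongId and
toStringId are hypothetical names, not Mahout's API and not the MAHOUT-705
patch. Each translation is one hash lookup (plus an append for a new ID),
with no database round trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory String <-> long ID dictionary (not Mahout's API, not the MAHOUT-705 patch).
public class IdDictionary {
    private final Map<String, Long> idsByKey = new HashMap<>();
    private final List<String> keysById = new ArrayList<>();

    // Returns the existing long ID for key, or assigns the next sequential one (0, 1, 2, ...).
    public long toLongId(String key) {
        Long id = idsByKey.get(key);
        if (id == null) {
            id = (long) keysById.size();
            idsByKey.put(key, id);
            keysById.add(key);
        }
        return id;
    }

    // Reverse lookup: sequential IDs double as list indices.
    public String toStringId(long id) {
        return keysById.get((int) id);
    }

    public static void main(String[] args) {
        IdDictionary users = new IdDictionary();
        IdDictionary items = new IdDictionary();
        // One row from the test data set; column order assumed to be user, item, preference.
        String line = "\"4cec0a2934ac9fbd2b040000\",\"4d065d5434ac9f5227a12f00\",118";
        String[] row = line.replace("\"", "").split(",");
        long userId = users.toLongId(row[0]);
        long itemId = items.toLongId(row[1]);
        System.out.println(userId + "," + itemId + "," + row[2]);
    }
}
```

If something like this is fast on the same ~300k rows, that would suggest the
cost lies in how the mappings are persisted rather than in the conversion
itself.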

Open to ideas/suggestions/"a-ha"-moments. Thanks!

On Tue, May 31, 2011 at 9:15 PM, Ted Dunning wrote:

> It makes the internals much cleaner to not repeat this conversion.
>
> But how is it that this is taking a long time?  A String -> long lookup
> should not take much longer than an array access, especially if you use the
> Mahout collections or one of the dictionary types.
>
> On Tue, May 31, 2011 at 7:50 PM, Mike Khristo wrote:
>
> > Rather, how can I use string-based user IDs/item IDs without having to
> > deal with the slowness associated with mapping them to a long?
> >
> > In the MongoDataModel, for example, significant time/overhead goes into
> > converting the unique IDs to long...  I'm still getting my head wrapped
> > around Mahout, but this seems like a significant limitation. I have to
> > assume there's some logic behind the decision to restrict them to long,
> > but I didn't find anything about it in Mahout in Action or on the list.
> >
> > Thanks.
> >
>
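
An alternative for the string-to-long direction, swapped in here as an
illustration rather than taken from this thread, is to derive the long from a
hash of the string ID, so translating needs no lookup and no stored counter.
This sketch assumes MD5 truncated to 64 bits; HashedIds and its methods are
hypothetical. The reverse direction (long back to string, needed when
presenting recommendations) still requires a stored mapping, and the collision
check only catches collisions among IDs seen by this instance.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hash-based String -> long ID mapping (illustrative, not Mahout's API).
public class HashedIds {
    private final Map<Long, String> reverse = new HashMap<>();

    // Derives a long from the first 8 bytes of the MD5 digest of the string ID.
    public static long hashToLong(String id) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            long result = 0L;
            for (int k = 0; k < 8; k++) {
                result = (result << 8) | (digest[k] & 0xFF);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Maps the ID and records the reverse entry; fails loudly on a (very unlikely) collision.
    public long toLongId(String id) {
        long h = hashToLong(id);
        String previous = reverse.putIfAbsent(h, id);
        if (previous != null && !previous.equals(id)) {
            throw new IllegalStateException("hash collision: " + previous + " vs " + id);
        }
        return h;
    }

    // Reverse lookup; returns null for a long that was never produced by toLongId.
    public String toStringId(long h) {
        return reverse.get(h);
    }

    public static void main(String[] args) {
        HashedIds ids = new HashedIds();
        long h = ids.toLongId("4d57d54434ac9fd3570005a2");
        System.out.println(h + " -> " + ids.toStringId(h));
    }
}
```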
