I don't think this implementation is going to be practical at any
significant scale; it's more of a toy implementation that reads everything
into memory. You're welcome to propose a speedup patch if it doesn't break
the semantics. I would not use Mongo this way, nor would I probably use
the Netflix data set as-is in a non-distributed setup.
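If anyone does want to attempt that patch, the bottleneck described below
is a per-document lookup against the translation collection. A minimal
sketch of the obvious alternative, holding the whole string-to-long
mapping in memory and bulk-writing it back afterwards, might look like
the following; the class and method names here are hypothetical, not
Mahout's actual API:

import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Mahout's actual API): resolve string IDs to
 * long IDs entirely in memory, instead of round-tripping to a Mongo
 * translation collection once per document in the ratings collection.
 */
public final class InMemoryIdMapper {

  private final Map<String, Long> stringToLong = new HashMap<String, Long>();
  private long nextId = 0L;

  /** O(1) in-memory lookup; assigns a fresh long ID on first sight. */
  public long toLongId(String stringId) {
    Long id = stringToLong.get(stringId);
    if (id == null) {
      id = nextId++;
      stringToLong.put(stringId, id);
    }
    return id;
  }

  /**
   * The accumulated mapping could be written back to the translation
   * collection in one bulk insert after the model is built, keeping
   * that collection's semantics intact.
   */
  public Map<String, Long> mapping() {
    return stringToLong;
  }
}

For the Netflix data that map holds roughly 480,000 user IDs and 17,770
item IDs, which fits comfortably in memory and turns the per-rating
Mongo round trip into a hash lookup.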


On Thu, Nov 15, 2012 at 3:23 PM, Onur Kuru <[email protected]> wrote:

> Hello!
>
> I have exported the Netflix data to a MongoDB database and then tried
> to build a MongoDBDataModel, but it is taking too long. Inspecting the
> MongoDBDataModel class, I found that it converts from string to long
> because Mongo uses strings for user_id and item_id, while Mahout uses
> long IDs.
>
> MongoDBDataModel stores these conversions in another collection, and as
> it iterates over all the documents in the ratings collection, it checks
> that collection to see whether a long ID has already been assigned to
> each string ID (user & item). I think checking this collection, and
> creating a new entry when necessary, becomes a significant overhead when
> the data is large.
>
> Is there a solution to this included in Mahout, or do I have to write
> my own optimized code?
>
> Regards,
> Onur
