Say I have a collection A containing the data to be trained. Each doc looks like:

{ _id: ObjectId(...), userid: ObjectId(...), itemid: ObjectId(...),
  value: 1, create_at: 14123456 }

Should I create indexes on fields other than the default _id?
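A hedged sketch of what that index setup could look like in Python (the collection handle is passed in, so nothing here connects to a server; the field names come from the document above, and the exact choice of indexes is only a suggestion):

```python
# Index specs for the fields shown above (1 = ascending, as in the Mongo shell).
INDEXES = [
    [("userid", 1), ("itemid", 1)],  # (userid, itemid) lookups; the prefix also covers userid-only queries
    [("itemid", 1)],                 # itemid-only lookups
    [("create_at", 1)],              # lets a refresh scan find new docs without a full collection scan
]

def ensure_indexes(coll):
    """Create each index; create_index is idempotent, so re-running is safe."""
    for spec in INDEXES:
        coll.create_index(spec)
```

Against a live collection this would be called as, e.g., `ensure_indexes(MongoClient().mydb.mycoll)` with pymongo; the database and collection names there are placeholders.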
On Feb 12, 2015, at 3:52 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Do you understand the requirement for Mahout IDs?
>
> Still don’t understand your index question. Mongo *can* store the Mahout
> IDs and index them. In this case you would have a Mongo ObjectId,  your own
> application specific ID (catalog number, username, etc), and the Mahout ID
> (0..n). You could lookup by any of these.
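The three-ID layout described above can be sketched like this (a plain-Python illustration with made-up values; in Mongo the two extra lookups would be indexed fields, not dicts):

```python
# One document carrying all three IDs (values are illustrative).
doc = {
    "_id": "54dbf1a2e4b0c1d2e3f4a5b6",  # Mongo ObjectId, shown here as a hex string
    "app_id": "user-42",                # application-specific ID (catalog number, username, ...)
    "mahout_id": 0,                     # contiguous non-negative int used inside Mahout
}

# With an index on each field, any of the three is a fast lookup;
# in memory the same idea is one map per key:
by_app_id = {doc["app_id"]: doc}
by_mahout_id = {doc["mahout_id"]: doc}

assert by_app_id["user-42"]["mahout_id"] == 0
assert by_mahout_id[0]["_id"] == doc["_id"]
```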
>
> On Feb 11, 2015, at 11:30 AM, 黄雅冠 <huangy...@gmail.com> wrote:
>
> Thanks for the reply.
>
> Yes, I am using the in-memory version because of the learning curve for a
> beginner.
>
> The index I mentioned is a Mongo index on the collection. Will it be
> quicker if I ensure some indexes before training?
>
> I use Maven to manage the project. Is 1.0 accessible via Maven? Is it a
> beta version or a stable one?
> On Feb 12, 2015, at 3:05 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > You are using the in-memory recommender (not Hadoop version)? Note that
> > this may not scale well.
> >
> > The in-memory and Hadoop versions of the recommender *require* user and
> > item IDs to be non-negative contiguous integers. You must map your IDs to
> > Mahout-IDs and back again. Inside Mahout *only* Mahout-IDs are used.
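The mapping requirement above can be sketched as follows (a generic illustration of mapping arbitrary IDs to 0..n and back, not Mahout's own ID-migration classes):

```python
class IdMap:
    """Assigns contiguous non-negative ints to arbitrary IDs, reversibly."""

    def __init__(self):
        self._to_mahout = {}  # application ID -> contiguous int
        self._to_app = []     # position i -> application ID

    def to_mahout(self, app_id):
        # Assign the next free int the first time an ID is seen.
        if app_id not in self._to_mahout:
            self._to_mahout[app_id] = len(self._to_app)
            self._to_app.append(app_id)
        return self._to_mahout[app_id]

    def to_app(self, mahout_id):
        return self._to_app[mahout_id]

ids = IdMap()
assert ids.to_mahout("user-9f3") == 0
assert ids.to_mahout("user-a11") == 1
assert ids.to_mahout("user-9f3") == 0  # stable on repeated lookups
assert ids.to_app(1) == "user-a11"
```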
> >
> > Not sure what you are asking about “indexes”
> >
> > BTW the new Spark-Mahout v1.0 snapshot version of the recommender has no
> > such restriction on user and item IDs. See a description here:
> > http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
> > It is much easier to use with MongoDB especially if you index certain
> > document fields with Solr, which it requires to deliver recommendations.
> >
> > On Feb 11, 2015, at 6:15 AM, 黄雅冠 <huangy...@gmail.com> wrote:
> >
> > Hi !
> >
> > I am using Mahout item-based recommendation with MongoDB. I have played
> > around with it and have several questions.
> >
> >  - How to persist the recommendation model from memory to disk? I know
> >    it is an old question and there already exist several discussions,
> >    such as this one:
> >    <http://mail-archives.apache.org/mod_mbox/mahout-user/201112.mbox/%3ccanq80da42nfr8p5mt-qnbo-ycaxyfrbskyoefairdzyrdy-...@mail.gmail.com%3E>
> >    The conclusion is that I have to do it myself. I am just wondering,
> >    is there any implementation after two years?
> >  - Is it better to set indexes on the collection (the one that provides
> >    the preference data)? I read the source and found some queries on the
> >    collection, such as (user_id, item_id), (user_id), and (item_id).
> >    Also, when refresh is called it scans the whole collection to find
> >    the new data, so (create_at) as well. Would I benefit from ensuring
> >    indexes on these fields? If yes, which indexes should I create?
> >  - From what I understand, I can use refreshData to achieve event-driven
> >    refresh. That is, when an event occurs (a user scores an item), I can
> >    call refresh to update the model. That gives better performance and
> >    keeps the model up to date. Am I right?
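The event-driven idea in the last point can be sketched generically (this is not Mahout's API; the batching threshold is an assumption, since rebuilding the whole model on every single event can be expensive):

```python
class EventDrivenRefresher:
    """Calls refresh_fn after every `every` events instead of on each one."""

    def __init__(self, refresh_fn, every=100):
        self.refresh_fn = refresh_fn
        self.every = every
        self.pending = 0

    def on_event(self):
        self.pending += 1
        if self.pending >= self.every:
            self.refresh_fn()
            self.pending = 0

calls = []
r = EventDrivenRefresher(lambda: calls.append(1), every=3)
for _ in range(7):
    r.on_event()
assert len(calls) == 2  # refreshed after the 3rd and 6th events
```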
> >
> > Thanks!
> >
> > — hyg
> >
> >
>
>
