Re: "Binary" Data

Floris Devriendt Sat, 17 May 2014 12:08:23 -0700

Ted Dunning,

You may be right that recommendation isn't the best fit for this
situation. However, I have partly been asked to test recommender systems
in such scenarios. But I will take your comments into account and see what
I can do.


One last note on the dataset I'm using: at the moment it is small (30 000
user-item relations), but it could be extended. I'm aware that, given the
exercises are cloze questions, I could perhaps take a content-based approach
(instead of a CF approach) in which I analyse the context of the
sentence around the word to be filled in.

Either way, thanks for your input; I will ponder your comments.

Kind regards,
Floris Devriendt



2014-05-17 19:20 GMT+02:00 Ted Dunning <[email protected]>:

> Floris,
>
> Given the size of the data you have and the goals that you have, I am not
> convinced that recommendation is the right fit for your needs.
>
> I would recommend using multi-dimensional response analysis and then defining
> distance between users in terms of the latent variables you get from that.
> You should be able to cluster directly in terms of those latent variables.
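A minimal sketch of clustering users directly in latent-variable space. It assumes each user already has a vector of latent scores (for example, estimates from an item-response fit done in R); the user names, the scores, and the choice of plain k-means with Euclidean distance and farthest-first seeding are illustrative assumptions, not something prescribed above.

```python
import math


def euclidean(u, v):
    """Distance between two users' latent-variable vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(euclidean(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, k, iterations=20):
    """Plain k-means on latent-variable vectors; returns one label per point."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each user to the nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the users assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical two-dimensional latent scores per user.
latent = {
    "alice": (1.2, 0.9), "bob": (1.0, 1.1),
    "carol": (-0.8, -1.0), "dave": (-1.1, -0.7),
}
names = list(latent)
clusters = dict(zip(names, kmeans([latent[n] for n in names], k=2)))
```

Any distance that suits the latent space could replace `euclidean`; the point is that users are compared on the fitted latent variables rather than on raw correct/incorrect vectors.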
>
> Also, for cloze exercises, I think you may be missing some important
> information by only counting correct/incorrect. The word that is filled in
> (if any) could be a huge hint if you know it.
>
> My feeling is that since you need lots of algorithmic flexibility and since
> your dataset fits in R, you really will not be well served by Mahout.
> The virtues of Mahout really only come out at very large scale and only
> for particular problems.
>
> Also, you have at this point pretty much exhausted my knowledge of
> item-response theory.
>
>
>
>
> On Sat, May 17, 2014 at 6:04 AM, Floris Devriendt <[email protected]> wrote:
>
> > Hello Ted Dunning,
> >
> > First of all, thank you for the response; I appreciate it.
> >
> > Am I right in saying you are suggesting a combination of recommender
> > systems and an item-response analysis of the data?
> > You're right that my data isn't huge, so R could work as a tool. I'm
> > just still a little confused on the topic.
> >
> > How exactly can I combine recommender systems with the item-response
> > analysis?
> > I'm just thinking out loud here, but do you mean I could determine each
> > user's ability level (using R) and then search for similar users with the
> > user-user collaborative filtering technique?
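To make the "determine each user's ability level" idea concrete, here is a minimal sketch of maximum-likelihood ability estimation under the one-parameter (Rasch) item-response model, assuming the exercise difficulties are already known. In practice difficulties and abilities are estimated together, e.g. with the R packages referenced later in this thread; the function names, difficulties, and responses below are hypothetical.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability that a user of ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, iterations=50, bound=4.0):
    """Newton-Raphson maximum-likelihood estimate of one user's ability.

    responses: 1 = correct, 0 = incorrect, one entry per attempted exercise.
    difficulties: the (assumed known) difficulty of each of those exercises.
    All-correct or all-wrong patterns have no finite estimate, so theta is
    clipped to [-bound, bound].
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta = max(-bound, min(bound, theta + gradient / information))
    return theta


difficulties = [-1.0, 0.0, 1.0]  # hypothetical exercise difficulties
strong = estimate_ability([1, 1, 0], difficulties)
weak = estimate_ability([1, 0, 0], difficulties)
```

Users could then be treated as similar when their estimates are close, which is one way to feed an item-response result into a user-user neighbourhood.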
> >
> > The data I have is very limited. I have users, their given solutions to
> > exercises, and whether or not they were successful. The exercises
> > themselves are all language cloze exercises. The idea was to use a CF
> > technique to determine similar users (based on the similarity of the
> > users' scores: 1 = correct, 0 = incorrect) and then suggest exercises on
> > which we think the user will fail (because users similar to him have
> > also failed there).
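A minimal sketch of the user-user idea described above, written in plain Python rather than against Mahout's API. The similarity used here (the fraction of co-attempted exercises with the same outcome), the neighbourhood size, and the data are assumptions chosen for illustration.

```python
def similarity(a, b):
    """Fraction of co-attempted exercises on which two users got the same outcome.

    a, b: dicts mapping exercise id -> 1 (correct) or 0 (incorrect).
    Returns 0.0 when the users share no exercises.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[e] == b[e] for e in shared) / len(shared)


def likely_failures(user, scores, k=2):
    """Rank exercises the user has not attempted by the similarity-weighted
    failure rate among the k most similar other users."""
    target = scores[user]
    neighbours = sorted(
        (other for other in scores if other != user),
        key=lambda other: similarity(target, scores[other]),
        reverse=True,
    )[:k]
    failure, weight = {}, {}
    for other in neighbours:
        sim = similarity(target, scores[other])
        for exercise, outcome in scores[other].items():
            if exercise in target:
                continue  # only suggest exercises the user has not tried yet
            failure[exercise] = failure.get(exercise, 0.0) + sim * (1 - outcome)
            weight[exercise] = weight.get(exercise, 0.0) + sim
    ranked = {e: failure[e] / weight[e] for e in failure if weight[e] > 0}
    return sorted(ranked, key=ranked.get, reverse=True)


# Hypothetical scores: 1 = correct, 0 = incorrect; missing = not attempted.
scores = {
    "u1": {"e1": 1, "e2": 0, "e3": 1},
    "u2": {"e1": 1, "e2": 0, "e3": 1, "e4": 0, "e5": 1},
    "u3": {"e1": 1, "e2": 0, "e4": 0, "e5": 1},
    "u4": {"e1": 0, "e2": 1, "e4": 1, "e5": 0},
}
ranking = likely_failures("u1", scores)
```

Here `u2` and `u3` agree with `u1` on every shared exercise and both failed `e4`, so `e4` is ranked first for `u1`.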
> >
> > Your idea about splitting my matrix into two matrices is interesting,
> > but I'm still thinking about what I can do with it. Am I right in saying
> > you're suggesting a different approach altogether, or is the
> > item-response analysis something I can use within the recommender system?
> >
> > Kind regards,
> > Floris Devriendt
> >
> >
> > 2014-05-17 1:33 GMT+02:00 Ted Dunning <[email protected]>:
> >
> > > The easiest way to shoehorn this data into the binary framework for
> > > recommenders is to keep two matrices, one for success, one for failure.
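A minimal sketch of the two-matrix idea, assuming the raw data is a list of (user, exercise, outcome) triples with outcome 1 for correct and 0 for incorrect, and that unattempted exercises simply do not appear. Each resulting matrix is boolean, so it fits the has-a-relation / has-no-relation framework that the boolean similarity measures expect. The function and variable names are hypothetical.

```python
from collections import defaultdict


def split_outcomes(triples):
    """Split (user, exercise, outcome) triples into two boolean 'matrices',
    stored sparsely as user -> set of exercises.

    A user who attempted the same exercise more than once with different
    outcomes will appear in both matrices for that exercise.
    """
    successes, failures = defaultdict(set), defaultdict(set)
    for user, exercise, outcome in triples:
        (successes if outcome == 1 else failures)[user].add(exercise)
    return successes, failures


# Hypothetical observations.
triples = [("u1", "e1", 1), ("u1", "e2", 0), ("u2", "e1", 0), ("u2", "e3", 1)]
successes, failures = split_outcomes(triples)
```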
> > >
> > > There is lots to do from there.
> > >
> > > Most analyses of this kind of data (so-called item-response data [1]),
> > > however, require some kind of hidden variable analysis beyond what is
> > > available in Mahout.  The good news is that the data available in these
> > > kinds of problems is almost always relatively small (millions or tens of
> > > millions of observations is pretty rare).  This means that conventional
> > > tools like R are pretty easy to use [2,3,4].
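For reference, the three-parameter logistic (3PL) model linked in [1] gives the probability that a user with latent ability theta answers an item correctly, given the item's discrimination a, difficulty b, and guessing parameter c. A minimal sketch; the parameter values in the usage line are made up.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """3PL item response function:
    P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# At theta == b the logistic term is 1/2, so P = c + (1 - c) / 2.
print(p_correct_3pl(theta=0.5, a=1.2, b=0.5, c=0.25))  # 0.625
```

The guessing floor c matters for cloze items where a plausible word can be guessed: even very weak users keep a probability of about c of answering correctly.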
> > >
> > > You could try using some of the matrix decomposition algorithms in Mahout
> > > on these data, but I really think that a more nuanced analysis would be
> > > better.
> > >
> > > [1] https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model
> > >
> > > [2] http://cran.r-project.org/web/views/Psychometrics.html
> > >
> > > [3] http://cran.r-project.org/web/packages/mirt/mirt.pdf
> > >
> > > [4] http://cran.r-project.org/web/packages/ltm/ltm.pdf
> > >
> > >
> > > On Thu, May 15, 2014 at 8:53 AM, Floris Devriendt <[email protected]> wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > I'm a new Mahout user and I was hoping some people could point me in
> > > > the right direction.
> > > >
> > > > My data consists of exercise results from different users, and I want
> > > > to recommend different exercises to different users using the
> > > > collaborative filtering techniques available in Mahout. The idea is
> > > > that the 'items' in my data are the exercises, and the relation between
> > > > a user and an item can take three values:
> > > >
> > > >    - A user has correctly completed the exercise.
> > > >    - A user has incorrectly completed the exercise.
> > > >    - A user has not made an attempt at the exercise.
> > > >
> > > > In essence, this data is comparable to like/dislike/unknown data.
> > > >
> > > > Now I know more or less how to build a recommender in Mahout, but I'm
> > > > having some difficulties designing it. A lot depends on the similarity
> > > > measure used, but most similarity measures assume rating-style
> > > > preferences (e.g. when rating movies or music). The exceptions, if I
> > > > interpret it correctly, are the Tanimoto coefficient and the
> > > > log-likelihood similarity. But those similarities seem to focus on
> > > > boolean data, where a user either has a relation with an item or does
> > > > not.
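To see why these two measures fit boolean data, here is a minimal sketch of both, computed from sets of exercises. It assumes each user is represented only by the set of exercises they have a relation with (for instance, only their failures). The log-likelihood ratio follows Dunning's entropy-based formulation over a 2x2 contingency table; the final `1 - 1/(1 + LLR)` mapping into [0, 1) is an assumption about how Mahout's `LogLikelihoodSimilarity` squashes the ratio, not a verified description of its code.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two item sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalised Shannon entropy (N * H, in nats) of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for the 2x2 table [[k11, k12], [k21, k22]]."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(a, b, n_items):
    """Similarity of two users' item sets, out of n_items items in total (assumed mapping)."""
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n_items - len(a | b)
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    return 1.0 - 1.0 / (1.0 + llr)


alice = {"e1", "e2", "e3"}  # hypothetical item sets
bob = {"e2", "e3", "e4"}
print(tanimoto(alice, bob))  # 0.5
```

Neither measure looks at a correct/incorrect value; both only see whether a relation exists, which is exactly the limitation raised in the question.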
> > > >
> > > > What are the key aspects to take into account when working with this
> > > > kind of data (with three distinct values)? Does it all depend on the
> > > > similarity measure I use? Or are there other aspects I need to take
> > > > into account to make the recommendations worthwhile for this kind of
> > > > data?
> > > >
> > > > I also have some more questions on some of the similarity measures
> > > > implemented in Mahout, but I don't want to ask too much at once. If
> > > > somebody could point me in the right direction on the questions above,
> > > > I would appreciate it.
> > > >
> > > > Kind regards,
> > > > Floris Devriendt
> > > >
> > >
> >
>
