Ted Dunning,

You may be right that recommendation isn't the best fit for this situation. However, part of my assignment is to test recommender systems in such scenarios. I will take your comments into account and see what I can do.
A last note on the dataset I'm using: at the moment it is small (30 000 user-item relations), but it could be extended. I'm aware that, given that the exercises are cloze questions, I could perhaps take a content-based approach (instead of a CF approach) in which I analyse the context of the sentence around the word to be filled in.

Either way, thanks for your input; I will think over your comments.

Kind regards,
Floris Devriendt

2014-05-17 19:20 GMT+02:00 Ted Dunning <[email protected]>:

> Floris,
>
> Given the size of the data you have and the goals that you have, I am not
> convinced that recommendation is the right fit for your needs.
>
> I would recommend using multi-dimensional response analysis and then defining
> the distance between users in terms of the latent variables you get from that.
> You should be able to cluster directly in terms of those latent variables.
>
> Also, for cloze exercises, I think you may be missing some important
> information by only counting correct/incorrect. The word that is filled in
> (if any) could be a huge hint if you know it.
>
> My feeling is that since you need lots of algorithmic flexibility and since
> your dataset fits in R, you really will not be well served by Mahout.
> The virtues of Mahout really only come out at very large scale and only
> for particular problems.
>
> Also, you have at this point pretty much exhausted my knowledge of
> item-response theory.
>
> On Sat, May 17, 2014 at 6:04 AM, Floris Devriendt <[email protected]> wrote:
>
> > Hello Ted Dunning,
> >
> > First of all, thank you for the response; I appreciate it.
> >
> > Am I right in saying that you are suggesting a combination of recommendation
> > systems and an item-response analysis of the data?
> > You're right that my data isn't huge, so R could work as a tool. I'm
> > just still a little confused about the topic.
> >
> > How exactly can I combine recommender systems with the item-response
> > analysis?
> > I'm just thinking out loud here, but do you mean I could determine the
> > user's ability level (using R) and then search for similar users with the
> > user-user collaborative filtering technique?
> >
> > The data I have is very limited. I have users, their given solutions to
> > exercises, and whether or not they were successful. The exercises themselves
> > are all language cloze exercises. The idea was to use a CF technique to
> > determine similar users (based on the similarity of the users' scores (1 =
> > correct; 0 = incorrect)) and then suggest to each user the exercises that we
> > think the user will fail (because similar users have also failed them).
> >
> > Your idea about splitting my matrix into two matrices is interesting;
> > however, I'm still thinking about what I can do with that. Am I right in
> > saying you're suggesting a different approach, or is the item-response
> > analysis something I can use within the recommender system?
> >
> > Kind regards,
> > Floris Devriendt
> >
> > 2014-05-17 1:33 GMT+02:00 Ted Dunning <[email protected]>:
> >
> > > The easiest way to shoehorn this data into the binary framework for
> > > recommenders is to keep two matrices, one for success, one for failure.
> > >
> > > There is lots to do from there.
> > >
> > > Most analyses of this kind of data (so-called item-response data [1]),
> > > however, require some kind of hidden variable analysis beyond that
> > > available in Mahout. The good news is that the data available in these
> > > kinds of problems is almost always relatively small (millions or tens of
> > > millions of observations is pretty rare). This means that conventional
> > > tools like R are pretty easy to use [2,3,4].
> > >
> > > You could try using some of the matrix decomposition algorithms in Mahout
> > > on these data, but I really think that a more nuanced analysis would be
> > > better.
> > >
> > > [1] https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model
> > >
> > > [2] http://cran.r-project.org/web/views/Psychometrics.html
> > >
> > > [3] http://cran.r-project.org/web/packages/mirt/mirt.pdf
> > >
> > > [4] http://cran.r-project.org/web/packages/ltm/ltm.pdf
> > >
> > > On Thu, May 15, 2014 at 8:53 AM, Floris Devriendt <[email protected]> wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > I'm a new Mahout user and I was hoping some people could point me in the
> > > > right direction.
> > > >
> > > > My data consists of exercise results from different users, and I want to
> > > > recommend different exercises to different users using the collaborative
> > > > filtering techniques available in Mahout. The idea is that the 'items' in
> > > > my data are the exercises, and the relation between a user and an item
> > > > can take one of three values:
> > > >
> > > > - A user has correctly completed the exercise.
> > > > - A user has incorrectly completed the exercise.
> > > > - A user has not made an attempt at the exercise.
> > > >
> > > > In essence this data can be compared to like/dislike/unknown data.
> > > >
> > > > Now I know more or less how to build a recommender in Mahout, but I'm
> > > > having some difficulties in designing it. A lot depends on the similarity
> > > > measure used, but most similarity measures assume rating-style
> > > > preferences (e.g. when rating movies or music). The exceptions, if I
> > > > interpret it correctly, are the Tanimoto coefficient and the log-likelihood
> > > > similarity. But those similarities seem to focus on boolean data, where a
> > > > user either has a relation with an item or has none.
> > > >
> > > > What are the key aspects to take into account when working with this kind
> > > > of data (with three distinct values)?
> > > > Does it all depend on the similarity
> > > > measure used? Or are there other aspects I need to take into account to
> > > > make the recommendations worthwhile for this kind of data?
> > > >
> > > > I also have some more questions about some of the similarity measures
> > > > implemented in Mahout, but I don't want to ask too much at once. If
> > > > somebody can guide me in the right direction with the above questions,
> > > > this would be appreciated.
> > > >
> > > > Kind regards,
> > > > Floris Devriendt
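
For reference, the two-matrix idea from the thread (one boolean matrix for successes, one for failures, with unattempted exercises left out) can be sketched roughly as below. This is only an illustration in plain Python, not Mahout code: the toy `attempts` data and the helper names `split_by_outcome` and `tanimoto` are made up for the example, and the Tanimoto coefficient is computed by hand on sets instead of through any Mahout similarity class.

```python
from collections import defaultdict

# Hypothetical toy data: (user, exercise, outcome), outcome 1 = correct, 0 = incorrect.
# Exercises a user never attempted simply do not appear.
attempts = [
    ("u1", "e1", 1), ("u1", "e2", 0), ("u1", "e3", 0),
    ("u2", "e1", 1), ("u2", "e2", 0), ("u2", "e4", 1),
    ("u3", "e3", 1), ("u3", "e4", 0),
]


def split_by_outcome(rows):
    """Build two boolean user -> item-set maps: one for successes, one for failures."""
    success, failure = defaultdict(set), defaultdict(set)
    for user, item, outcome in rows:
        (success if outcome == 1 else failure)[user].add(item)
    return success, failure


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two item sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


success, failure = split_by_outcome(attempts)

# Similarity of u1 and u2 measured separately on what they failed and what they solved.
sim_fail_u1_u2 = tanimoto(failure["u1"], failure["u2"])
sim_succ_u1_u2 = tanimoto(success["u1"], success["u2"])
print(sim_fail_u1_u2, sim_succ_u1_u2)
```

Keeping the two maps separate means each one is ordinary boolean data, which is the shape the Tanimoto and log-likelihood similarities mentioned in the original question expect. How to combine the two similarity scores afterwards is left open here, as it is in the thread.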
