Hi Sean,

Thanks for the help. I am currently reading <http://wiki.apache.org/hadoop/SequenceFile> for more information (please let me know if I am not reading the right document). So in short, by using the API, I can produce a SequenceFile by feeding the SQL result containing the image and tag data into it?
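For what it's worth, a rough picture of that idea: open a SequenceFile.Writer and append one key/value pair per tag, with the tag name as a Text key and the item counts from the SQL result packed into a Mahout sparse vector as the VectorWritable value. This is only a sketch against the 2011-era Hadoop/Mahout APIs; the output path, cardinality, item indices, and the "sunset" tag name are all made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class TagVectorWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("tag-vectors/part-00000"); // hypothetical output path

        SequenceFile.Writer writer = new SequenceFile.Writer(
                fs, conf, path, Text.class, VectorWritable.class);
        try {
            // One sparse vector per tag; in practice the indices and weights
            // come from the item-tag rows in MySQL, one append per tag.
            Vector row = new RandomAccessSparseVector(10000); // cardinality = number of items
            row.set(42, 1.0);   // tag applied once to item 42
            row.set(137, 3.0);  // tag applied three times to item 137
            NamedVector named = new NamedVector(row, "sunset"); // hypothetical tag name
            writer.append(new Text(named.getName()), new VectorWritable(named));
        } finally {
            writer.close();
        }
    }
}
```

This would also sidestep the ARFF detour mentioned further down the thread, since the clustering jobs read SequenceFiles of VectorWritable directly.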
OME = OutOfMemoryError, lol (for more information on my attempt to cluster my test data, please refer to <http://mahout.markmail.org/search/?q=#query:+page:30+mid:nseo36uopmgat5iv+state:results>; let me know if the link is broken).

Yes, I am making a recommender, but I can't implement the whole thing at once, and I don't yet know how to implement the other parts (yea, I have a habit of breaking a project into small parts). My current task is to implement the tag clustering component, as mentioned in the previous mail.

@Jeffrey04

>________________________________
>From: Sean Owen <[email protected]>
>To: [email protected]; Jeffrey <[email protected]>
>Sent: Tuesday, August 9, 2011 2:54 PM
>Subject: Re: Needs clue to create a Proof of Concept recommender
>
>You don't need ARFF, no. You can write some Java code to write a SequenceFile
>directly, one entry at a time. It would take a little study of the code to
>understand how it works, but it's probably just 10 lines.
>
>What is the "OME" error?
>
>Results can live wherever you want; HDFS is the most natural choice for a
>SequenceFile.
>
>You say you're making a recommender, but it sounds like your task now is
>clustering?
>
>On Tue, Aug 9, 2011 at 7:27 AM, Jeffrey <[email protected]> wrote:
>
>>Hi,
>>
>>I am trying to implement a recommender system for my postgraduate project. I
>>currently have all my data (collected using the Flickr API) stored in a MySQL
>>database in RDF form using Redland <http://librdf.org> (lol, PHP is my main
>>language, hence Redland).
>>
>>The recommender system is basically designed similarly to the paper published
>>by Jonathan Gemmell et al. (reference listed below), where tag clusters are
>>also generated to find the similarity measure between clusters and
>>items/users (hence it was really frustrating when I failed to dump the points
>>for the fuzzy k-means clusters).
>>I am currently reading some articles on implementing Taste (the recommender
>>framework) with Mahout, but the use cases described in the articles are quite
>>different from what I am about to implement.
>>
>>I am still trying to build the tag clusters properly. Each tag is now
>>represented as a vector of resources (each equivalent to a row in the
>>item-tag matrix). I currently generate the vectors by converting a
>>pre-generated ARFF file, following this tutorial:
>><https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka%27s+ARFF+Format>.
>>Is there another way of doing this (is it possible to generate the vectors
>>without first generating an ARFF file)? I have also read
>><https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text>,
>>but can't seem to relate it to my use case right now.
>>
>>Since I can't dump the points for the clusters using the cluster dumper (I
>>keep getting the OME), I would probably calculate the degree of membership
>>manually. Where should I store the result (MySQL via JDBC? HBase? Cassandra?)
>>so that I can reuse it later for further calculations (e.g. the similarity of
>>an item with a cluster)?
>>
>>Reference:
>>Shepitsen, Andriy; Gemmell, Jonathan; Mobasher, Bamshad; Burke, Robin.
>>Personalized Recommendation in Folksonomies. Proceedings of the 2nd
>>International Conference on Recommender Systems. Lausanne, Switzerland.
>>October 23, 2008.
>>
>>p/s: I probably really should find a copy of "Mahout in Action" since I keep
>>seeing it being recommended.
>>
>>best wishes,
>>Jeffrey04
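On "calculate the degree of membership manually": fuzzy k-means assigns a point a membership in cluster i of u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), where d_j is the point's distance to center j and m > 1 is the fuzziness parameter. A self-contained sketch of that formula (this is not Mahout's own code; the distances are made-up numbers, and all distances are assumed strictly positive):

```java
public class FuzzyMembership {

    // Degree of membership of a point in cluster i, given the point's
    // distances to every cluster center and the fuzziness parameter m (> 1).
    // Assumes all distances are strictly positive.
    static double membership(double[] distances, int i, double m) {
        double exponent = 2.0 / (m - 1.0);
        double sum = 0.0;
        for (double d : distances) {
            sum += Math.pow(distances[i] / d, exponent);
        }
        return 1.0 / sum;
    }

    public static void main(String[] args) {
        double[] d = {1.0, 2.0}; // made-up distances to two cluster centers
        // The closer center gets the larger share; memberships sum to 1.
        System.out.println(membership(d, 0, 2.0)); // ~0.8
        System.out.println(membership(d, 1, 2.0)); // ~0.2
    }
}
```

Once the memberships are computed, any of the stores mentioned above works; the choice mostly depends on whether later similarity calculations run as Hadoop jobs (favoring HDFS/HBase) or in a single JVM (where MySQL via JDBC is simplest).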
