You don't need ARFF, no. You can write a little Java to create the SequenceFile directly, one entry at a time. It would take a bit of study of the code to understand how it works, but it's probably just 10 lines.
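For instance, here is a minimal sketch of the idea, assuming Mahout's VectorWritable as the value type and the Hadoop 0.20-style SequenceFile API; the output path, the vector cardinality, and the tag/resource values are placeholders you would swap for a loop over rows pulled from your MySQL item-tag table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class TagVectorWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // placeholder path: wherever you want the clustering input to live
    Path path = new Path("tag-vectors/part-00000");

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, path, Text.class, VectorWritable.class);
    try {
      // One entry per tag: key = tag name, value = its row of the item-tag matrix.
      // In practice you would loop over your MySQL rows instead of this stub.
      Vector row = new RandomAccessSparseVector(10000); // cardinality = number of resources
      row.set(42, 1.0);    // resource 42 tagged once (example values)
      row.set(137, 3.0);   // resource 137 tagged three times
      writer.append(new Text("someTag"), new VectorWritable(row));
    } finally {
      writer.close();
    }
  }
}

Fuzzy k-means can then read that directory as its input vectors, the same way it reads the output of the ARFF or text conversion jobs.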
What is the "OME" error? Results can live wherever you want; HDFS is the most natural choice for a SequenceFile. You say you're making a recommender, but it sounds like your task right now is clustering?

On Tue, Aug 9, 2011 at 7:27 AM, Jeffrey <[email protected]> wrote:
> Hi,
>
> I am trying to implement a recommender system for my postgraduate project.
> I currently have all my data (collected using the flickr API) stored in a
> MySQL database in RDF form using Redland <http://librdf.org> (lol, PHP is
> my main language, hence Redland).
>
> The recommender system is designed along the lines of the paper published
> by Jonathan Gemmell et al. (reference listed below), where tag clusters
> are also generated to find the similarity measure between clusters and
> items/users (hence it was really frustrating when I failed to dump the
> points for the fuzzy k-means clusters). I am currently reading some
> articles on implementing Taste (the recommender framework) with Mahout,
> but the use cases described in those articles are quite different from
> what I am about to implement.
>
> I am still trying to build the tag clusters properly now. Each tag is
> represented as a vector of resources (each equivalent to a row in the
> item-tag matrix). I currently generate the vectors by converting a
> pre-generated ARFF file, following this tutorial <
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka%27s+ARFF+Format>.
> Is there another way of doing this (is it possible to generate the
> vectors without first generating ARFF)? I have also read this <
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text>
> but can't seem to relate it to my use case right now.
>
> Since I can't dump the points for the clusters using the cluster dumper
> (I keep getting an OME), I would probably calculate the degree of
> membership manually. Where should I store the result (MySQL via JDBC?
> Hadoop Bigtable? Cassandra?) so that I can reuse it later for further
> calculation (e.g. similarity of an item with a cluster)?
>
> Reference:
> Shepitsen, Andriy; Gemmell, Jonathan; Mobasher, Bamshad; Burke, Robin.
> Personalized Recommendation in Folksonomies. Proceedings of the 2nd
> International Conference on Recommender Systems. Lausanne, Switzerland.
> October 23, 2008.
>
> p/s: I probably really should find a copy of "Mahout in Action" since I
> keep seeing it recommended.
>
> best wishes,
> Jeffrey04
>
