Hi Sean,

Thanks for the help. I am currently reading 
<http://wiki.apache.org/hadoop/SequenceFile> for more information (please let 
me know if I am not reading the right document). So in short, by using the API, 
I can produce a SequenceFile by feeding the SQL result containing the image and 
tag data into it?
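
To check that I understand the step before the SequenceFile is written, here is a minimal sketch of how I imagine grouping the SQL rows into one sparse vector per tag. It uses plain Java only; `TagVectorBuilder`, `TagRow` and the integer resource index are names I made up for illustration, and counting repeated (tag, image) pairs is my own assumption. Each resulting entry would then still have to be converted and appended to the SequenceFile, which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Mahout or Hadoop): groups (tag, resource)
// rows into one sparse vector per tag.
class TagVectorBuilder {

    // One row of the assumed SQL result: a tag label and the index of an image.
    static final class TagRow {
        final String tag;
        final int resourceIndex;

        TagRow(String tag, int resourceIndex) {
            this.tag = tag;
            this.resourceIndex = resourceIndex;
        }
    }

    // Returns tag -> (resource index -> number of times the pair occurred).
    // Resources that never carry the tag are simply absent (implicit zero).
    static Map<String, Map<Integer, Double>> build(List<TagRow> rows) {
        Map<String, Map<Integer, Double>> vectors = new LinkedHashMap<>();
        for (TagRow row : rows) {
            vectors.computeIfAbsent(row.tag, t -> new TreeMap<>())
                   .merge(row.resourceIndex, 1.0, Double::sum);
        }
        return vectors;
    }
}
```

Keeping only the non-zero entries should matter here, since most tags are attached to only a small fraction of the images.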

OME = Out of Memory Error, lol (for more information on my attempt to cluster 
my test data, please refer to 
<http://mahout.markmail.org/search/?q=#query:+page:30+mid:nseo36uopmgat5iv+state:results>;
 let me know if the link is broken).

Yea, I am making a recommender, but I can't implement the whole thing at once 
and I have no idea how to implement the other parts right now (yea, I have a 
habit of breaking a project into small parts). My current task is to implement 
the tag clustering component mentioned in the previous mail.
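
For the manual degree-of-membership calculation mentioned in the previous mail, this is roughly what I have in mind. It is a sketch of the standard fuzzy k-means (fuzzy c-means) membership formula, u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), in plain Java; it is not Mahout's own code, the class and method names are made up, and it assumes the distances from a point to every cluster centre are already known.

```java
// Hypothetical helper (not Mahout code): fuzzy membership of one point in
// each cluster, given its distance to every cluster centre.
class FuzzyMembership {

    // distances[i] is the distance from the point to centre i; m > 1 is the
    // fuzziness parameter. The returned memberships sum to 1.
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzziness m must be greater than 1");
        }
        int n = distances.length;
        double[] u = new double[n];

        // A point sitting exactly on one or more centres belongs fully to
        // them; this also avoids dividing by zero below.
        int zeros = 0;
        for (double d : distances) {
            if (d == 0.0) {
                zeros++;
            }
        }
        if (zeros > 0) {
            for (int i = 0; i < n; i++) {
                u[i] = distances[i] == 0.0 ? 1.0 / zeros : 0.0;
            }
            return u;
        }

        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += Math.pow(distances[i] / distances[k], exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }
}
```

With m = 2, a point at distance 1 from one centre and 2 from another gets memberships of 0.8 and 0.2, so the closer cluster dominates without the other being dropped.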

@Jeffrey04



>________________________________
>From: Sean Owen <[email protected]>
>To: [email protected]; Jeffrey <[email protected]>
>Sent: Tuesday, August 9, 2011 2:54 PM
>Subject: Re: Needs clue to create a Proof of Concept recommender
>
>
>You don't need ARFF, no. You can write some Java code to write a SequenceFile 
>directly, one entry at a time. It would take a little study of the code to 
>understand how it works but it's probably just 10 lines.
>
>
>What is the "OME" error?
>
>
>Results can live wherever you want; HDFS is the most natural choice for a 
>SequenceFile.
>
>
>You say you're making a recommender but sounds like your task now is 
>clustering?
>
>
>On Tue, Aug 9, 2011 at 7:27 AM, Jeffrey <[email protected]> wrote:
>
>>Hi,
>>
>>I am trying to implement a recommender system for my postgraduate project. I 
>>currently have all my data (collected using the Flickr API) stored in a MySQL 
>>database in RDF form using Redland <http://librdf.org> (lol, PHP is my main 
>>language, hence Redland).
>>
>>The recommender system is designed along the lines of the paper by Jonathan 
>>Gemmell et al. (reference listed below), where tag clusters are also 
>>generated to compute the similarity between clusters and items/users (hence 
>>it was really frustrating when I failed to dump the points for the fuzzy 
>>k-means clusters). I am currently reading some articles on implementing 
>>Taste (the recommender framework) with Mahout, but the use cases described in 
>>those articles are quite different from what I am about to implement.
>>
>>I am still trying to build the tag clusters properly. Each tag is now 
>>represented as a vector of resources (each equivalent to a row in the 
>>item-tag matrix). I currently generate the vectors by converting a 
>>pre-generated ARFF file, following this tutorial: 
>><https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka%27s+ARFF+Format>.
>> Is there another way of doing this (is it possible to generate the vectors 
>>without first generating ARFF)? I have also read this 
>><https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text>
>> but can't seem to relate it to my use case right now.
>>
>>Since I can't dump the points for the clusters using the cluster dumper (I 
>>keep getting OME), I will probably calculate the degree of membership 
>>manually. Where should I store the result (MySQL via JDBC? Hadoop Bigtable? 
>>Cassandra?) so that I can reuse it later for further calculation (e.g. the 
>>similarity of an item with a cluster)?
>>
>>Reference:
>>Shepitsen, Andriy; Gemmell, Jonathan; Mobasher, Bamshad; Burke, 
>>Robin. Personalized Recommendation in Folksonomies. Proceedings of the 2nd 
>>International Conference on Recommender Systems. Lausanne, Switzerland. 
>>October 23, 2008. 
>>
>>p/s: I probably really should find a copy of "Mahout in Action" since I keep 
>>seeing it being recommended.
>>
>>best wishes,
>>Jeffrey04
>>
>
>
>
