Re: Email and Collab. Filtering

Lance Norskog Wed, 24 Aug 2011 03:13:19 -0700

With collaborative filtering, the input data is so choppy and
heterogeneous that it will be hard to get useful recs using naive
techniques. It would be a good exercise in 'know your data': instead
of finding preferences, you hunt for features. Does this user send lots
of mail in the evening? Do these three people go round and round a lot?
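
As a rough illustration of that kind of feature hunting, here is a
minimal Python sketch that computes, per sender, the fraction of
messages sent in the evening. The (sender, timestamp) input format,
the function name, and the 18:00-23:59 definition of "evening" are all
assumptions made for the example, not anything taken from the ASF
data set.

```python
from collections import Counter
from datetime import datetime


def evening_fraction(messages, start_hour=18, end_hour=23):
    """Return {sender: fraction of that sender's mail sent in the evening}.

    `messages` is an iterable of (sender, datetime) pairs (hypothetical
    format). Hours are inclusive, so the default window is 18:00-23:59.
    """
    total = Counter()
    evening = Counter()
    for sender, sent_at in messages:
        total[sender] += 1
        if start_hour <= sent_at.hour <= end_hour:
            evening[sender] += 1
    return {sender: evening[sender] / total[sender] for sender in total}


if __name__ == "__main__":
    sample = [
        ("alice", datetime(2011, 8, 22, 19, 0)),
        ("alice", datetime(2011, 8, 22, 21, 30)),
        ("alice", datetime(2011, 8, 23, 9, 0)),
        ("bob", datetime(2011, 8, 22, 10, 0)),
    ]
    print(evening_fraction(sample))
```

Timestamps would need to be normalized to each sender's local time
before a number like this means much.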


You would also need good analytics software to display the results.
That's another class!

Perhaps you could do something text-oriented with the message bodies?
Creating abstracts is something that turns up a lot. You make a
bag-of-words vector for every sentence in a document, then do SVD on
the matrix of those vectors. The resulting feature vectors are also
bags of words, but they do not correspond to any particular sentence.
You pick the first feature vector and find the closest sentence. That
sentence is the primary statement of the topic. Next, find a second
sentence that matches the second feature vector: this will be roughly
orthogonal to the main sentence, and will modify it in some way. I
would expect some misfire rate with this, and would try sentence parts
instead. Perhaps the vector could be an n-gram of successive sentence
clauses.
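
The abstract-building steps above could be sketched roughly like this
in Python with NumPy. This is only an illustration under several
assumptions: the naive regex sentence splitter, raw word counts as the
sentence vectors, and cosine similarity as the meaning of "closest"
are my choices for the example, and the names split_sentences and
summarize are made up. The sign of an SVD feature vector is arbitrary,
so the sketch compares absolute similarities.

```python
import re

import numpy as np

WORD = re.compile(r"[a-z']+")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, k=2):
    """Pick up to k sentences, one per leading SVD feature vector."""
    sentences = split_sentences(text)
    vocab = sorted({w for s in sentences for w in WORD.findall(s.lower())})
    if not sentences or not vocab:
        return sentences[:k]

    # Rows are sentences, columns are words, cells are raw counts.
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(sentences), len(vocab)))
    for row, sentence in enumerate(sentences):
        for word in WORD.findall(sentence.lower()):
            counts[row, index[word]] += 1

    # Rows of vt are the word-space feature vectors, strongest first.
    _, _, vt = np.linalg.svd(counts, full_matrices=False)

    norms = np.linalg.norm(counts, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for wordless sentences

    picked = []
    for feature in vt[:k]:
        # Feature vectors have unit length, so this is |cosine similarity|.
        similarity = np.abs(counts @ feature) / norms
        for candidate in np.argsort(-similarity):
            if int(candidate) not in picked:
                picked.append(int(candidate))
                break
    return [sentences[i] for i in picked]


if __name__ == "__main__":
    mail = (
        "Mahout does clustering. Mahout does classification too. "
        "The weather was nice today. Clustering groups similar threads."
    )
    for line in summarize(mail, k=2):
        print(line)
```

Swapping the sentence splitter for a clause splitter would be the
place to try the "sentence parts" variation.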

This is real machine learning that works on a very small dataset. It
could be a fun class example. In this context it could be used to
identify mail with coherent thought vs. back-and-forth conversations.

On Mon, Aug 22, 2011 at 9:14 AM, Sean Owen <[email protected]> wrote:
> Here are three ideas:
>
> Recommend users to users.
> Your users and items are both e-mail senders. The strength of the
> association could be the number of e-mails from A to B (or perhaps the
> logarithm). This would find people that people like you e-mail a lot.
> Sounds interesting, if not immediately useful, because people e-mail
> others for very different reasons.
>
> Recommend threads to users.
> Users are people, items are threads. This might suggest discussions
> you should be a party to, or that may be of interest since they concern
> people you often share a thread with. I think it has slightly more
> potential to be useful, but it's probably a non-starter in practice, as
> it's not generally true that you're welcome to see a thread you
> weren't copied on.
>
> Recommend users to threads.
> Kind of the "have you forgotten to include X" function from Gmail.
> Users are threads and items are people.
>
>
> All of these are sort of novelties -- I don't think CF applies so well
> -- but surely worth trying to see what you get out.
>
> In the book I tried recommending Wikipedia articles to Wikipedia
> articles -- discovering missing hyperlinks so to speak -- and while it
> was a bit of a novelty, the results were intriguing and entertaining.
>
>
> On Mon, Aug 22, 2011 at 3:48 PM, Grant Ingersoll <[email protected]> wrote:
>> I'm working on an example (well, examples) of using Mahout with the ASF 
>> Public Data Set up on Amazon 
>> (http://aws.amazon.com/datasets/7791434387204566) and I wanted to show how 
>> to use the 3 "C's" (collab filtering, clustering, classification) with the 
>> data set.  Clustering and classification are pretty straightforward, but 
>> I'm wondering about the setup around collaborative filtering.
>>
>> The motivation for recommendations is pretty straightforward:  provide 
>> people recs on emails that they might find useful based on what other people 
>> have interacted with.  The tricky part is I am not totally sure on a valid 
>> setup of the problem.  My current thinking is that I build up the rec. 
>> matrix based on whether someone has interacted with (initiated/replied) a 
>> thread or not.  Thus, the columns are the thread ids and the rows are the 
>> users.  Each cell contains the count of the number of times user X has 
>> interacted with thread Y.  This feels to me like it is a stand-in for that 
>> user's preference in that if they are replying multiple times, they have an 
>> interest in that topic.  I have no idea if this will be effective or not, 
>> but it seems like it could be interesting.  Does it sound reasonable?  I 
>> worry that even in a really large data set as above it will simply be too 
>> sparse.
>>
>> Is there a better way to think about this from a strict collaborative 
>> filtering context?  In other words, I know I could do content-based 
>> recommendations but that is not what I am after here.
>>
>> -Grant
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>>
>



-- 
Lance Norskog
[email protected]
