AiIIIII. That's why Zimbra keeps killing me. When you download from
IMAP you get IDs, but I now see that the mail behind them may not
carry correct ones.

Subject lines plus text/text block comparisons should work pretty
well. There will be misfires from copy-and-pastes of parts of messages,
and from different languages in the 'el cabron escribe' date string.
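
A minimal sketch of the subject-plus-text-block idea, assuming each
message is a plain dict with "subject" and "body" keys. The prefix list,
the function names, and the 0.6 threshold are all illustrative guesses,
not tuned values. The attribution line is dropped by position (a line
ending in ":" directly above quoted text), so its language does not matter.

```python
import re
from difflib import SequenceMatcher

# Reply/forward prefixes to strip; this list is an assumption, not exhaustive.
_PREFIX = re.compile(r"^(?:\s*(?:re|fwd?|aw|sv)\s*:)+\s*", re.IGNORECASE)
# Trailing "[was Old subject]" marker, as used on this list.
_WAS = re.compile(r"\s*\[was[^\]]*\]\s*$", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply prefixes and '[was ...]', lowercase, collapse whitespace."""
    subject = _PREFIX.sub("", subject)
    subject = _WAS.sub("", subject)
    return " ".join(subject.lower().split())


def own_text(body):
    """Text the sender wrote: no quoted lines, no attribution line."""
    lines = body.splitlines()
    kept = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(">"):
            continue
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if stripped.endswith(":") and nxt.startswith(">"):
            continue  # "On ..., X wrote:" in whatever language
        if stripped:
            kept.append(stripped)
    return " ".join(kept)


def quoted_text(body):
    """Text the sender quoted, with the '>' markers removed."""
    out = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            out.append(stripped.lstrip("> ").strip())
    return " ".join(t for t in out if t)


def same_thread(parent, reply, threshold=0.6):
    """Guess whether reply answers parent: subjects must match, and any
    quoted text in the reply must resemble what the parent's sender wrote."""
    if normalize_subject(parent["subject"]) != normalize_subject(reply["subject"]):
        return False
    quoted = quoted_text(reply["body"])
    if not quoted:
        return True  # nothing quoted: fall back to the subject match alone
    ratio = SequenceMatcher(None, own_text(parent["body"]), quoted).ratio()
    return ratio >= threshold
```

Copy-and-paste of partial text lowers the ratio, which is where the
misfires would show up; the threshold trades those against false matches.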

On Wed, Aug 24, 2011 at 2:34 PM, Jeff Eastman <[email protected]> wrote:
> IIRC (circa 2008) Outlook had its own proprietary notion of message ids and 
> did not follow the SMTP standard. That made threading with ids pretty much a 
> non-starter in many enterprises. We had some success using Subject 
> comparisons, as most email clients add Re: prefixes to the subject line in a 
> reply message. Threading using a combination of these two approaches proved 
> to be reasonably good, though full of exceptions. Good luck.
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Wednesday, August 24, 2011 2:15 PM
> To: [email protected]
> Subject: Re: Mail thread detection [was Email and Collab. Filtering]
>
> In the olden days, it was possible to thread together message id's in email
> threads.
>
> In the modern world of many mailing list portals that don't really do email
> in the official ways, this is more difficult than it should be.
>
> Have you tried and failed with message id's?
>
> On Wed, Aug 24, 2011 at 1:06 PM, Lukáš Vlček <[email protected]> wrote:
>
>> Hi,
>>
>> I would love to hear more about how exactly you detect (or define) threads
>> for emails (for example for Lucene or Solr public mail lists).
>>
>> As far as I can tell this is quite a complex problem, and based on my
>> experience with many web search tools for mailing lists it is still not
>> solved. With thread-based recommendations, important information can be
>> missed if the thread is not detected correctly.
>> If this has already been solved, please do not hesitate to point me to
>> any references.
>>
>> Regards,
>> Lukas
>>
>> On Mon, Aug 22, 2011 at 4:48 PM, Grant Ingersoll <[email protected]> wrote:
>>
>> > I'm working on an example (well, examples) of using Mahout with the ASF
>> > Public Data Set up on Amazon
>> > (http://aws.amazon.com/datasets/7791434387204566) and I wanted to show how
>> > to use the 3 "C's" (collab filtering, clustering, classification) with the
>> > data set.  Clustering and classification are pretty straightforward, but
>> > I'm wondering about the setup around collaborative filtering.
>> >
>> > The motivation for recommendations is pretty straightforward:  provide
>> > people recs on emails that they might find useful based on what other people
>> > have interacted with.  The tricky part is I am not totally sure on a valid
>> > setup of the problem.  My current thinking is that I build up the rec.
>> > matrix based on whether someone has interacted with (initiated/replied) a
>> > thread or not.  Thus, the columns are the thread ids and the rows are the
>> > users.  Each cell contains the count of the number of times user X has
>> > interacted with thread Y.  This feels to me like it is a stand in for that
>> > user's preference in that if they are replying multiple times, they have an
>> > interest in that topic.  I have no idea if this will be effective or not,
>> > but it seems like it could be interesting.  Does it sound reasonable?  I
>> > worry that even in a really large data set as above it will simply be too
>> > sparse.
>> >
>> > Is there a better way to think about this from a strict collaborative
>> > filtering context?  In other words, I know I could do content-based
>> > recommendations but that is not what I am after here.
>> >
>> > -Grant
>> >
>> > --------------------------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com
>> >
>> >
>>
>
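
The user-by-thread count matrix Grant describes in the quoted message
could be sketched roughly as below, assuming thread ids have already been
assigned and each message is reduced to a (user, thread_id) pair. The
function names and the dense list-of-lists layout are illustrative only;
a real data set this size would want a sparse representation.

```python
from collections import Counter


def interaction_counts(messages):
    """messages: iterable of (user, thread_id) pairs, one per message sent.
    Returns a Counter mapping (user, thread_id) -> number of messages."""
    return Counter(messages)


def to_matrix(counts):
    """Rows are users, columns are thread ids, cells are interaction counts."""
    users = sorted({u for u, _ in counts})
    threads = sorted({t for _, t in counts})
    rows = [[counts.get((u, t), 0) for t in threads] for u in users]
    return users, threads, rows


def sparsity(rows):
    """Fraction of cells that are zero, to check the 'too sparse' worry."""
    cells = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / cells if cells else 0.0
```

For example, two messages by one user in the same thread give that cell a
count of 2, and `sparsity` reports how much of the matrix is empty.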



-- 
Lance Norskog
[email protected]
