Re: Mail thread detection [was Email and Collab. Filtering]

Ted Dunning Wed, 24 Aug 2011 14:58:00 -0700

The short conclusion is "people and language are involved, therefore it is a
bit of a mess".




On Wed, Aug 24, 2011 at 2:49 PM, Lukáš Vlček <[email protected]> wrote:

> Yes, it is not always reliable (especially if ppl reply to the email from
> desktop email clients and not from the web forum page). But there are more
> complex problems than this. The two most common problems are thread
> hijacking and something I call a non-linear mail thread, that is, a case
> when the email is also resent to a different mail list. For example the
> thread starts in Lucene but at some point in time someone adds Solr mail
> list to the To or Cc as well. From this point the thread has two parallel
> branches (and still this is the simple case).
>
> Experimenting with mail Subject text is another option, but again one would
> not believe what kinds of cases or exceptions can be found until one tries
> it.
> I have seen mails with the same subject, in the same mail list, in about
> the same time window, involving the same author and the same reply-from
> person, and they were not in the same thread.
>
> IMHO I do not think there is any perfect solution to this problem. Doing a
> lot of experiments is probably a good way to catch the most common
> exceptions, but in general it is very hard to avoid these problems. And once
> you (as a user of a search interface) experience these issues it can be
> quite challenging to build trust that things like thread grouping or
> recommendation work well enough.
>
> On Wed, Aug 24, 2011 at 11:15 PM, Ted Dunning <[email protected]>
> wrote:
>
> > In the olden days, it was possible to thread together message id's in
> > email threads.
> >
> > In the modern world of many mailing list portals that don't really do
> > email in the official ways, this is more difficult than it should be.
> >
> > Have you tried and failed with message id's?
> >
> > On Wed, Aug 24, 2011 at 1:06 PM, Lukáš Vlček <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I would love to hear more about how exactly you detect (or define)
> > > threads for emails (for example for Lucene or Solr public mail lists).
> > >
> > > As far as I can tell this is quite a complex problem, and based on my
> > > experience with many search web tools for mail lists it is still not
> > > solved. Speaking about thread-based recommendations, important
> > > information can be missed if the thread is not detected correctly.
> > > If this has already been solved then please do not hesitate to point me
> > > to any references.
> > >
> > > Regards,
> > > Lukas
> > >
> > > On Mon, Aug 22, 2011 at 4:48 PM, Grant Ingersoll <[email protected]>
> > > wrote:
> > >
> > > > I'm working on an example (well, examples) of using Mahout with the
> > > > ASF Public Data Set up on Amazon
> > > > (http://aws.amazon.com/datasets/7791434387204566) and I wanted to show
> > > > how to use the 3 "C's" (collab filtering, clustering, classification)
> > > > with the data set. Clustering and classification are pretty
> > > > straightforward, but I'm wondering about the setup around
> > > > collaborative filtering.
> > > >
> > > > The motivation for recommendations is pretty straightforward: provide
> > > > people recs on emails that they might find useful based on what other
> > > > people have interacted with. The tricky part is I am not totally sure
> > > > on a valid setup of the problem. My current thinking is that I build
> > > > up the rec. matrix based on whether someone has interacted with
> > > > (initiated/replied) a thread or not. Thus, the columns are the thread
> > > > ids and the rows are the users. Each cell contains the count of the
> > > > number of times user X has interacted with thread Y. This feels to me
> > > > like it is a stand-in for that user's preference in that if they are
> > > > replying multiple times, they have an interest in that topic. I have
> > > > no idea if this will be effective or not, but it seems like it could
> > > > be interesting. Does it sound reasonable? I worry that even in a
> > > > really large data set as above it will simply be too sparse.
> > > >
> > > > Is there a better way to think about this from a strict collaborative
> > > > filtering context?  In other words, I know I could do content-based
> > > > recommendations but that is not what I am after here.
> > > >
> > > > -Grant
> > > >
> > > > --------------------------------------------
> > > > Grant Ingersoll
> > > > http://www.lucidimagination.com
> > > >
> > > >
> > >
> >
>
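A minimal sketch of the message-id approach raised in the quoted discussion, assuming each message has already been parsed into a dict with the hypothetical keys 'id', 'in_reply_to' and 'references' (names chosen here for illustration, not taken from any tool mentioned above). Messages are linked with a union-find structure, so replies whose parent is missing from the archive still end up grouped together. It does nothing about thread hijacking or cross-posted threads, which is where the quoted replies say header-based threading breaks down.

```python
from collections import defaultdict


def group_threads(messages):
    """Group messages into threads by linking message ids.

    Each message is a dict with the hypothetical keys 'id' (str),
    'in_reply_to' (str or None) and 'references' (list of str).
    Returns a sorted list of sorted id lists, one per thread.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for msg in messages:
        find(msg["id"])
        linked = list(msg.get("references") or [])
        if msg.get("in_reply_to"):
            linked.append(msg["in_reply_to"])
        for other in linked:
            # Referenced ids may be absent from the archive; they still
            # act as connectors between the messages that mention them.
            union(msg["id"], other)

    threads = defaultdict(list)
    for msg in messages:
        threads[find(msg["id"])].append(msg["id"])
    return sorted(sorted(ids) for ids in threads.values())
```

Only ids of messages actually present in the input appear in the result; an id that is referenced but never seen is used for linking and then dropped.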

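The user-by-thread count matrix described in the original question can be sketched as follows. This is an illustration under stated assumptions, not Mahout code: the input is assumed to be one (user, thread_id) pair per message, and the function names are hypothetical. Only non-zero cells are stored, and the density helper gives a rough handle on the sparsity concern raised in the question.

```python
from collections import Counter


def interaction_counts(events):
    """Count how often each user initiated or replied in each thread.

    `events` is an iterable of (user, thread_id) pairs, one per message.
    The result maps (user, thread_id) to a count; absent cells read as 0.
    """
    return Counter(events)


def density(counts):
    """Fraction of user/thread cells that are non-zero."""
    users = {user for user, _ in counts}
    threads = {thread for _, thread in counts}
    if not users or not threads:
        return 0.0
    return len(counts) / (len(users) * len(threads))


# Made-up sample data: three users, two threads.
events = [
    ("u1", "t1"), ("u2", "t1"), ("u1", "t1"),
    ("u3", "t2"), ("u2", "t2"),
]
counts = interaction_counts(events)
print(counts[("u1", "t1")], density(counts))
```

Rows (users) and columns (threads) are implicit in the keys, so the structure stays small even when most users never touch most threads.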