Here is a crazy idea: how about taking the properties you care about and
dropping them into a combined Lucene index? Then all nodes that come back
for the same combined value would be 'ambiguous', i.e. candidates for being
the same entity. Moving this forward to degrees of ambiguity might be
possible by building the combined 'value' from a reduced resolution of the
properties (so that near-identical values collapse to the same key and the
index sees them as identical).
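
To make the reduced-resolution idea concrete, here is a rough sketch in Python. It is only an illustration: a plain dict stands in for the Lucene index, and the names `combined_key` and `resolution`, the 'city' and 'age' properties, and the decade bucketing are all assumptions of mine, not anything that exists in Neo4j or Lucene.

```python
from collections import defaultdict


def combined_key(props, resolution):
    """Build one combined index value from the chosen properties.

    `resolution` maps a property name to a function that coarsens its
    value. Both names are hypothetical, made up for this sketch.
    """
    parts = []
    for name in sorted(resolution):
        if name in props:
            parts.append(f"{name}={resolution[name](props[name])}")
        else:
            parts.append(f"{name}=<missing>")
    return "|".join(parts)


# Coarsening rules: ignore case and whitespace, bucket ages by decade.
resolution = {
    "name": lambda s: s.strip().lower(),
    "city": lambda s: s.strip().lower(),
    "age": lambda a: a // 10 * 10,
}

nodes = {
    1: {"type": "person", "name": "Neo", "city": "Zion", "age": 37},
    2: {"type": "person", "name": "neo ", "city": "Zion", "age": 34},
    3: {"type": "person", "name": "Neo", "city": "Zion", "age": 52},
}

# A dict stands in for the combined index: key -> node ids.
buckets = defaultdict(list)
for node_id, props in nodes.items():
    buckets[combined_key(props, resolution)].append(node_id)

# Any key with more than one node is an 'ambiguous' group.
ambiguous = [ids for ids in buckets.values() if len(ids) > 1]
print(ambiguous)
```

Nodes 1 and 2 share a key once the name is normalised and the age is bucketed, while node 3 falls into a different decade. Coarser or finer functions in `resolution` are what would give you the 'degrees of ambiguity'.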

Another option is the 'still very much in progress' composite index I
started in December. Since all properties are mapped into normal integer
space, the Euclidean distance between the first-level index nodes is a
discrete measure of similarity. A distance of zero means that the nodes
attach to the same index node, and are very similar or identical. Higher
values mean greater dissimilarity. In theory this index supports any
number of properties of any type (including strings) and allows you to plug
in your own value->index mappers, which means you can control what you mean
by 'similar'.
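
The distance idea can be sketched in a few lines of Python. This is not the composite index code, just an illustration under my own assumptions: `to_index_point`, `mappers` and `step` are hypothetical names, the first-letter mapper for strings is deliberately crude, and a tuple of integers stands in for a first-level index node.

```python
import math


def to_index_point(props, mappers, step):
    """Map each property to an integer, then to a coarse index cell.

    `mappers` are the pluggable value->index functions and `step` is the
    cell width per property. All names here are hypothetical.
    """
    return tuple(
        mappers[name](props[name]) // step[name] for name in sorted(mappers)
    )


def distance(a, b):
    """Euclidean distance between two index cells."""
    return math.dist(a, b)


mappers = {
    "age": int,
    # Crude string mapper: position of the first letter in the alphabet.
    "name": lambda s: ord(s.lower()[0]) - ord("a"),
}
step = {"age": 10, "name": 1}

neo_a = to_index_point({"name": "Neo", "age": 37}, mappers, step)
neo_b = to_index_point({"name": "neo", "age": 34}, mappers, step)
neo_c = to_index_point({"name": "Neo", "age": 62}, mappers, step)

print(distance(neo_a, neo_b))  # same cell, so the distance is zero
print(distance(neo_a, neo_c))  # cells three decades apart
```

Swapping in a different mapper for 'name' (for example one based on a phonetic code) changes what counts as 'similar' without touching the distance calculation.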

On Tue, Feb 1, 2011 at 9:52 PM, Ben Sand <b...@bensand.com> wrote:

> I was working on a project that used matching algorithms a while back.
>
> What you have is an n-dimensional matching problem. I can't remember
> specifically what the last project was using, but this and the linked
> algos may be what you're looking for:
> http://en.wikipedia.org/wiki/Mahalanobis_distance
>
> On 2 February 2011 07:34, Tim McNamara <paperl...@timmcnamara.co.nz> wrote:
>
> > Say I have two nodes,
> >
> >
> > { "type": "person", "name": "Neo" }
> > { "type": "person", "name": "Neo" }
> >
> >
> >
> > Over time, I learn their locations. They both live in the same city. This
> > increases the chances that they're the same person. However, over time it
> > turns out that their ages differ, therefore it's far less likely that
> > they are the same Neo.
> >
> >
> > Is there anything inside of Neo4j that attempts to determine how close
> > two nodes are? E.g. to what extent their subtrees and properties match?
> > Additionally, can anyone suggest literature for algorithms for
> > disambiguating the two entities?
> >
> >
> > If I wanted to implement something that searches for similarities, that
> > returns a probability of a match, can I do this within the database or
> > should I implement it within the application?
> >
> >
> > --
> > Tim McNamara
> > @timClicks
> > http://timmcnamara.co.nz
> >
> >
> >
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
