Here is a crazy idea: how about taking the properties you care about and dropping them into a single combined Lucene index entry per node? Any nodes that come back from the index with the same combined value would then be your 'ambiguous' candidates. Moving this forward to degrees of ambiguity might be possible by building the combined 'value' from a reduced resolution of each property, so that near-identical values collapse together and the index sees them as identical.
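To make the combined-value idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: a dict stands in for the Lucene index, and the names `combined_key`, `ambiguous_groups` and the `coarsen` parameter are made up for this example, not part of Neo4j or Lucene.

```python
from collections import defaultdict


def combined_key(props, fields, coarsen=None):
    """Build one combined value from the chosen properties.

    `coarsen` (hypothetical) maps a field name to a function that reduces
    the resolution of that field. Fields without one are trimmed and
    lower-cased so that trivial differences do not split a group.
    """
    coarsen = coarsen or {}
    parts = []
    for field in fields:
        value = props.get(field)
        if value is None:
            parts.append("?")  # assumption: a missing property gets a placeholder
        elif field in coarsen:
            parts.append(str(coarsen[field](value)))
        else:
            parts.append(str(value).strip().lower())
    return "|".join(parts)


def ambiguous_groups(nodes, fields, coarsen=None):
    """Return lists of node ids that share the same combined value."""
    index = defaultdict(list)  # stand-in for the combined index
    for node_id, props in nodes.items():
        index[combined_key(props, fields, coarsen)].append(node_id)
    return [sorted(ids) for ids in index.values() if len(ids) > 1]
```

For example, coarsening "age" to decades with `lambda a: (a // 10) * 10` would put a 31-year-old Neo and a 38-year-old Neo in the same group, while a 52-year-old Neo stays separate. Widening or narrowing the buckets is what gives you the 'degrees' of ambiguity.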
Another option is the 'still very much in progress' composite index I started in December. Since all properties are mapped into normal integer space, the euclidean distance of the first level index nodes from each other is a discrete measure of similarity. A distance of zero means that the nodes attach to the same index node, and are very similar or identical. Higher values mean greater dissimilarity. This index theoretically supports any number of properties of any type (including strings) and allows you to plug in your own value->index mappers, which means you can control what you mean by 'similar'.

On Tue, Feb 1, 2011 at 9:52 PM, Ben Sand <b...@bensand.com> wrote:

> I was working on a project that used matching algorithms a while back.
>
> What you have is an n-dimensional matching problem. I can't remember
> specifically what the last project were using, but this and the linked
> algos may be what you're looking for:
> http://en.wikipedia.org/wiki/Mahalanobis_distance
>
> On 2 February 2011 07:34, Tim McNamara <paperl...@timmcnamara.co.nz>
> wrote:
>
> > Say I have two nodes,
> >
> > { "type": "person", "name": "Neo" }
> > { "type": "person", "name": "Neo" }
> >
> > Over time, I learn their locations. They both live in the same city. This
> > increases the chances that they're the same person. However, over time it
> > turns out that their ages differ, therefore it's far less likely that they
> > are the same Neo.
> >
> > Is there anything inside of Neo4j that attempts to determine how close two
> > nodes are? E.g. to what extent their subtrees and properties match?
> > Additionally, can anyone suggest literature for algorithms for
> > disambiguating the two entities?
> >
> > If I wanted to implement something that searches for similarities, that
> > returns a probability of a match, can I do this within the database or
> > should I implement it within the application?
> >
> > --
> > Tim McNamara
> > @timClicks
> > http://timmcnamara.co.nz

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
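PS: a rough sketch of the distance measure from the composite index paragraph at the top of this mail, in case it helps. This is not the actual index code. It only illustrates the idea of mapping each property to an integer, dividing by a step size to get the coordinates of a first-level index node, and taking the Euclidean distance between those coordinates. The names `name_mapper`, `index_coords`, `index_distance` and the step sizes are assumptions made up for the example.

```python
import math


def name_mapper(value):
    """Hypothetical string mapper: order names by their first two letters."""
    s = (value.strip().lower() + "\0\0")[:2]  # pad so short strings still work
    return ord(s[0]) * 256 + ord(s[1])


def index_coords(props, fields, mappers, steps):
    """Map each property to an integer, then to its index-node coordinate."""
    return tuple(mappers[f](props[f]) // steps[f] for f in fields)


def index_distance(a, b):
    """Euclidean distance between two index-node coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


fields = ["name", "age"]
mappers = {"name": name_mapper, "age": int}  # plug in your own mappers here
steps = {"name": 1, "age": 10}               # assumed index resolution per property

neo_a = index_coords({"name": "Neo", "age": 31}, fields, mappers, steps)
neo_b = index_coords({"name": "neo", "age": 38}, fields, mappers, steps)
neo_c = index_coords({"name": "Neo", "age": 52}, fields, mappers, steps)
```

With these made-up settings, `neo_a` and `neo_b` land on the same index node (distance zero), while `neo_c` is two age steps away. Swapping in a different mapper or step size changes what counts as 'similar'.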