Since your values are discrete, David's solution would be a good one: connect each node to a node representing the value of the property. So if each node has 5 properties (as in your example below), there would be five relationships to value nodes, each of which would in turn be related to its property node.
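The value-node layout can be sketched in plain Python to show how the traversal counts shared properties. This is only an in-memory illustration, not Neo4j API code; the student names, property names and helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sample data: student -> {property: value}
students = {
    "A": {"major": "physics", "year": "2nd year", "music": "rock"},
    "B": {"major": "physics", "year": "2nd year", "music": "jazz"},
    "C": {"major": "biology", "year": "2nd year", "music": "jazz"},
    "D": {"major": "biology", "year": "1st year", "music": "pop"},
}


def build_value_nodes(students):
    """Map each (property, value) 'value node' to the students linked to it."""
    value_nodes = defaultdict(set)
    for name, props in students.items():
        for prop, value in props.items():
            value_nodes[(prop, value)].add(name)
    return value_nodes


def group_by_shared_count(students, value_nodes, start):
    """Walk start -> value node -> other students, counting shared value nodes."""
    shared = defaultdict(int)
    for prop, value in students[start].items():
        for other in value_nodes[(prop, value)]:
            if other != start:
                shared[other] += 1
    # Group students by how many values they share, highest count first.
    groups = defaultdict(list)
    for other, count in shared.items():
        groups[count].append(other)
    return {count: sorted(names)
            for count, names in sorted(groups.items(), reverse=True)}


value_nodes = build_value_nodes(students)
result = group_by_shared_count(students, value_nodes, "A")
print(result)
```

Each key in the result is a number of shared property values, so reading the keys in order gives the 'x', 'x-1', 'x-2' groups asked for in the original question. Students who share nothing with the start node do not appear at all.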
    student --IN_YEAR--> '2nd year' <--VALUE-- years

This is like having a bunch of category indexes that you can use to find all students with a particular property/value pair. But for your combined property/value search, I think you would still end up traversing quite a lot of the graph to get your answer.

So there is an elaboration on this which takes it one step further: create nodes that represent the combinations of values. If you have 10 properties, each with on average 5 possible values, you have a maximum of 5^10 (about 9.8 million) theoretically possible combinations. The real number of combinations will also never be more than the number of students. If it does reach the number of students, then no two students share a complete combination, which is unlikely. If each student is connected to one of these combination nodes instead, then finding students in common is a trivial depth 2 traversal.

However, the complexity has now moved to the step that creates the graph. As you load students into the graph, you need a fast way to see if that student's combination of properties and values already exists, and if so, link to it, and if not, create it and link to it. There are two solutions to this:

 - make a hash of all the values and index that in Lucene, so you can find it later
 - use the amanzi-index <https://github.com/craigtaverner/amanzi-index> (still in progress). This is a pure tree index that builds basically the same structure I described above.

The Lucene approach is better known, since Neo4j has had Lucene for years. My index is still a prototype.

On Thu, Feb 24, 2011 at 8:54 AM, <[email protected]> wrote:

> Hi David
>
> I was thinking along these lines myself, but was unable to formulate it.
> I think I'll elaborate on the actual problem as you've suggested.
>
> There are a number of college students who I have gathered various
> information about, for example:
>
> 1. What their major is (4 options)
> 2. What year they are in (4 options)
> 3. Favourite genre of music and movies (4 options each)
> 4. A few yes/no questions
> 5. I have a list of who's friends with who in this sample
>
> Now I want to see the people belonging to Person A's most populated
> common property set.
> Assuming that number is 5 properties out of 10, I next want to see for 4
> properties (which may be different, but obviously for the same 5-1 as
> well).
>
> I hope this makes it clearer.
>
> Thanks!
> Sent on my BlackBerry® from Vodafone
>
> -----Original Message-----
> From: David Montag <[email protected]>
> Sender: [email protected]
> Date: Wed, 23 Feb 2011 23:30:11
> To: Neo4j user discussions<[email protected]>
> Reply-To: Neo4j user discussions <[email protected]>
> Subject: Re: [Neo4j] How to query based on properties
>
> Agam,
>
> Depending on the set of possible values, you could represent the
> properties with relationships instead. A unique property value can then
> be represented by a node, which would be linked to all nodes that have
> that value. The relationship type could indicate the property. The
> "value" nodes would then be indexed so that you can find the right node
> when setting the "property" (i.e. creating a relationship to the value
> node).
>
> Also, it would be great if you could elaborate a bit more on the actual
> use case behind this algorithm. That way, a more suitable solution might
> emerge, solving your problem in a different way.
>
> Thanks,
> David
>
> On Wed, Feb 23, 2011 at 10:36 PM, Agam Dua <[email protected]> wrote:
>
> > Hey
> >
> > I'm a graph database and Neo4j newbie and I'm in a bit of a fix:
> >
> > *Problem Description*
> > Let's say I have 'n' nodes in the graph, representing the same type of
> > object. They have certain undirected links between them.
> > Now each of these 'n' nodes has the same 10 properties, the *values* of
> > which may differ.
> >
> > *Problem Statement*
> > Take starting node A. I need to find a way to traverse all the nodes of
> > the graph and print out which nodes have the most properties in common
> > with A.
> > For example, if A, C, D, E, F, G have 'x' properties in common I want
> > to print the nodes.
> > Then, I want to print the nodes which have 'x-1' properties with the
> > same value. Then 'x-2', and so on.
> >
> > *Question*
> > Now my question is, is this possible? If so, what would be the best
> > way to go about it?
> >
> > Thanks in advance!
> > Agam.
> > _______________________________________________
> > Neo4j mailing list
> > [email protected]
> > https://lists.neo4j.org/mailman/listinfo/user
> >
>
> --
> David Montag
> Neo Technology, www.neotechnology.com
> Cell: 650.556.4411
> [email protected]
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

