It's been 13 years since I left Chemistry, but I think I have some residual interest in the subject :-)
My two cents' worth on this problem is that it is possible to model everything in one single graph:

- Store both the chemical structures and the relationships between them as graphs, differentiated by relationship type.
- Atoms connected together would have BOND relationships, but the molecule itself can also be represented by a single node, with an ATOMS relationship into the sub-graph representing the molecular structure.
- Metadata structures related to the molecules would form a graph connecting them all together, attached to the root node.

So, for example, the case of a molecule made by John Doe and stored in Room 123 would be modeled by having a 'rooms' node connected to a node for each room, and the room node with 'name'='123' would be connected by a STORES relationship to the molecule node above. Similarly, we would have a 'chemists' node connected to each chemist, and the chemist with 'name'='John Doe' would be related to the molecule by CREATED. The CREATED relationship could have properties like created_on, and John Doe could have properties like email, phone number, etc.

In this approach we have no need for an external index, since all the queries suggested can be answered with a traversal.

If you want composite queries and know in advance the main queries you will make, you can also optimise the graph structure for those queries. For example, if a standard query asks which molecules were created by chemist X between March and August 2010, then create month nodes between the chemist node and the molecule nodes, so that all molecules made by X in a given month (say March) would be connected first to X's March node and from there to X himself. This is in effect building a custom index into the graph. It is a good solution if you know very well what kind of queries you will make.
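To make the single-graph layout above a little more concrete, here is a minimal sketch in plain Python. It does not use Neo4j: `Node`, `relate`, `follow` and the other helpers are hypothetical stand-ins for a property-graph API, and the relationship types CHEMISTS, CHEMIST, ROOMS, ROOM and MONTH, the BOND `order` property, and the (year, month) keys on month nodes are assumptions added for illustration. Molecules are simplified to chains of heavy atoms with hydrogens omitted. The sketch shows a plain traversal (chemist to CREATED to molecules) next to the per-month nodes acting as an index built into the graph.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(eq=False)  # eq=False: nodes compare by identity, as graph nodes do
class Node:
    """Hypothetical in-memory stand-in for a graph node (NOT the Neo4j API)."""
    props: dict
    rels: list = field(default_factory=list)  # outgoing (type, props, target) triples

    def relate(self, rel_type, target, **rel_props):
        """Add an outgoing relationship and return its target node."""
        self.rels.append((rel_type, rel_props, target))
        return target

    def follow(self, rel_type, **where):
        """Targets of outgoing rel_type relationships whose props match `where`."""
        return [t for (rt, _, t) in self.rels
                if rt == rel_type
                and all(t.props.get(k) == v for k, v in where.items())]


def make_molecule(name, *elements):
    """Molecule node with an ATOMS relationship into a chain of BOND-ed atoms."""
    mol = Node({"name": name})
    atom = mol.relate("ATOMS", Node({"element": elements[0]}))
    for element in elements[1:]:
        atom = atom.relate("BOND", Node({"element": element}), order=1)
    return mol


def month_node(chemist, year, month):
    """Find or create the chemist's per-month node (the custom in-graph index)."""
    found = chemist.follow("MONTH", year=year, month=month)
    if found:
        return found[0]
    return chemist.relate("MONTH", Node({"year": year, "month": month}))


def record(chemist, room, mol, created_on):
    """Link a molecule to its creator, its storage room and a month node."""
    chemist.relate("CREATED", mol, created_on=created_on)
    room.relate("STORES", mol)
    month_node(chemist, created_on.year, created_on.month).relate(
        "CREATED", mol, created_on=created_on)


def build_graph():
    root = Node({"name": "root"})
    chemists = root.relate("CHEMISTS", Node({"name": "chemists"}))
    rooms = root.relate("ROOMS", Node({"name": "rooms"}))
    john = chemists.relate("CHEMIST", Node({"name": "John Doe"}))
    room_123 = rooms.relate("ROOM", Node({"name": "123"}))
    # Heavy atoms only; hydrogens are omitted to keep the sketch short.
    record(john, room_123, make_molecule("ethanol", "C", "C", "O"), date(2010, 3, 15))
    record(john, room_123, make_molecule("methanol", "C", "O"), date(2010, 1, 20))
    return root


def created_by(root, chemist_name):
    """Plain traversal: root -> chemists -> chemist -> CREATED -> molecules."""
    return [m for chemists in root.follow("CHEMISTS")
            for chemist in chemists.follow("CHEMIST", name=chemist_name)
            for m in chemist.follow("CREATED")]


def created_between(root, chemist_name, first, last):
    """Index-style traversal: expand only month nodes within [first, last]."""
    return [m for chemists in root.follow("CHEMISTS")
            for chemist in chemists.follow("CHEMIST", name=chemist_name)
            for month in chemist.follow("MONTH")
            if first <= (month.props["year"], month.props["month"]) <= last
            for m in month.follow("CREATED")]


def stored_in(root, room_name):
    """Traversal: root -> rooms -> room -> STORES -> molecules."""
    return [m for rooms in root.follow("ROOMS")
            for room in rooms.follow("ROOM", name=room_name)
            for m in room.follow("STORES")]


def atoms_of(mol):
    """Enter the structure sub-graph via ATOMS, then walk BOND relationships."""
    seen, stack = [], mol.follow("ATOMS")
    while stack:
        atom = stack.pop()
        if atom not in seen:
            seen.append(atom)
            stack.extend(atom.follow("BOND"))
    return [a.props["element"] for a in seen]
```

With this layout, `created_between(build_graph(), "John Doe", (2010, 3), (2010, 8))` returns only the ethanol node, because only John Doe's March node falls inside the range and the January node is never expanded. The plain `created_by` traversal would instead visit every CREATED relationship and would have to filter on `created_on`, which is the trade-off between the two approaches described above.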
However, as Peter suggests, if you use the Lucene index, especially with the new composite query support, you do not need to think too hard about building your own index graph; you could simply add both the chemist's name and the date to a single index on the molecule node itself. The Lucene query for chemist and date should then return a set of molecule nodes, and you can do further pattern matching on those if needed.

One other idea I would consider for pattern matching is to generate a signature: a kind of hash of the molecule that is representative of its shape. You can then index that hash too, which effectively makes the molecular shape a Lucene-searchable field. This is only possible if you know your domain well enough to create a hash that makes sense for your situation. In the case of chemistry, it really depends on what you mean by 'shape' when doing the search. For example, perhaps the chemical formula is a good enough description of the shape, and in that case your 'hash' is simply the formula: ethanol would be C2H6O (often written C2H5OH). Searches on that hash should yield ethanol and any other compounds sharing that formula, such as its isomer dimethyl ether. If we spent a little more time thinking about this, we could probably come up with better hashes that are more likely to match 'shape', but I hope you at least got my idea :-)

(I suspect there are standard ways of writing down a chemical shape uniquely, and if you get the shape hash to be truly unique, you need not store the molecule as a sub-graph at all, saving space and complexity.)

Regards,
Craig

On Tue, Nov 9, 2010 at 2:32 PM, Peter Neubauer <peter.neuba...@neotechnology.com> wrote:
> Thomas,
> IMHO, the examination of the graphs should be much helped by the new
> Index API, where you can ask and store composite indexes.
> I would imagine that you could do a lot of the exclusion work by indexing the
> chemical structures not by only one node, but possibly by constructing a
> typical path of nodes and relationships and indexing that one with
> http://wiki.neo4j.org/content/Index_Framework#Compound_queries, so
> that you can ask complex queries involving the whole structure and
> get the "entry node" for the subgraph back. Also, that entry node
> could be used to connect to e.g. John Doe in order to represent the
> whole compound.
>
> Would that be feasible?
>
> Cheers,
>
> /peter neubauer
>
> On Tue, Nov 9, 2010 at 10:47 AM, Thomas Strunz <beginn...@hotmail.de> wrote:
> >
> > Hi all,
> >
> > I have the following questions:
> >
> > Is neo4j also suited for a database that contains several hundred thousand small
> > graphs (5-30 nodes, mostly around 1-4 relationships per node)? (As far as I
> > understand, that is not the main purpose of the product, but it doesn't hurt to ask.)
> >
> > If yes, how can you perform subgraph matching, and what is its performance?
> > (Especially considering that most nodes are of the same kind, as are the
> > relationship types between them.)
> > To be specific: graph = chemical structure (mainly C and H atoms (nodes)
> > connected by bonds (single, double, ...)).
> >
> > A query typically only contains nodes and relationships that appear in
> > 100% of the "small graphs" and multiple times per graph.
> > I read
> >
> > http://lists.neo4j.org/pipermail/user/2009-June/001331.html
> >
> > and this seems to hint it will be rather tricky to achieve this?
> > (defining the entry point, and entering each "small graph" only once)
> >
> > Note that filtering steps unrelated to the graphs must be done
> > beforehand anyway, so the number of "small graphs" to traverse is
> > usually much lower than the total number.
> >
> > And an additional question:
> >
> > Can a node be a traversable graph too?
> > Example: chemical structure XYZ (a graph) was made by John Doe and is
> > stored in Room 123.
> > (The chemical structure XYZ must be seen as a single object (= node) for
> > this additional context.)
> > The query would be: find all chemical structures made by John Doe that match
> > a given chemical structure.
> >
> > I hope it's understandable what I'm trying to get at.
> >
> > Best Regards,
> >
> > Thomas
> >
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user