What is the best way to create an external index, but only for certain nodes? What I really want is something like the in-graph data structures, except stored in one or more other databases. In essence I am indexing only a sub-graph or a straight list of nodes, and I then want to use these indexes as entry points in some cases rather than traversing.
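To make the "only for certain nodes" part concrete, here is a rough Python sketch of the kind of routing I have in mind. Every name in it (`SelectiveIndexRouter`, `register`, `on_committed`, the "post" and "user" types) is made up for illustration and is not Neo4j or Blueprints API. It only assumes that something, whether a service class or a transaction hook, calls `on_committed` once a graph write has succeeded.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical signature: an updater receives the node id and its properties
# and writes them to some external store (Redis, Mongo, ...).
IndexUpdater = Callable[[int, dict], None]


class SelectiveIndexRouter:
    """Forwards committed node changes to external indexes, by node type."""

    def __init__(self) -> None:
        self._updaters: Dict[str, List[IndexUpdater]] = defaultdict(list)

    def register(self, node_type: str, updater: IndexUpdater) -> None:
        """Attach an external index to one node type."""
        self._updaters[node_type].append(updater)

    def on_committed(self, node_id: int, node_type: str, props: dict) -> int:
        """Call after the graph write succeeds.

        Returns the number of external indexes that were updated, which is
        zero for node types nobody registered for.
        """
        updaters = self._updaters.get(node_type, [])
        for update in updaters:
            update(node_id, props)
        return len(updaters)


# Usage: only "post" nodes reach the external index.
indexed_ids: List[int] = []
router = SelectiveIndexRouter()
router.register("post", lambda node_id, props: indexed_ids.append(node_id))
router.on_committed(1, "post", {"tag": "neo4j"})  # forwarded
router.on_committed(2, "user", {"name": "bob"})   # skipped, nothing registered
```

Node types without a registered index cost only a dictionary lookup, which is the property I care about when most vertices are not indexed externally.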
I understand that there is already Lucene, but I have data that is better suited to other indexes. I still want to use Lucene for full-text search, just not for anything else. I am currently taking a stab at implementing the Blueprints index interfaces (manual, automatic), but for another purpose. If I am always updating these indexes, but only for certain vertex types, what is the best integration point: my data service classes and lower-level Neo4j code, or a server event handler that plugs into the transaction? What about for all vertices? I think I understand how to write the index classes, but not the best way of consuming them, or whether they apply well to lots of smaller, partial indexes.
For instance, I want to store data as temporal values, with the most recent data first, for a group of nodes. I'm not building Twitter or a blog, but either is a good enough analogy. If I post something with a given tag, I want to index all the nodes that have been tagged with that tag (via a tag edge) in temporal order, for example to create a "recently tagged" feed, or a "recently seen users" feed containing the users who have recently used that tag. I could store this data in Redis exactly how I want and keep a hot set in memory that can be used either directly in some pages of my app, or as an entry point into Neo4j for more complex queries. These indexes will probably take lots of writes, and I also want to avoid locking related nodes on any update.
Part of the reason I'm doing this is that I currently have lots of super nodes in my design. I've patched this somewhat by keeping counts in node properties and adding proxy nodes as mini-partitions to reduce the number of relationships. I've also looked at things like combining common nodes together as junctions, but there are probably too many permutations for that to scale. In any case, if I use in-graph indexes, I have to update them on every insertion or update.
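As an illustration of the "recently tagged" feed described above, here is a minimal in-memory Python sketch of a capped, newest-first list per key. The class and method names (`RecentFeed`, `add`, `page`) and the `tag:neo4j` key are hypothetical. In Redis I would expect this to map roughly onto a sorted set scored by timestamp (`ZADD` to insert, `ZREMRANGEBYRANK` to trim, `ZREVRANGE` to page), but that mapping is an assumption on my part and not something I have benchmarked.

```python
from typing import Dict, List


class RecentFeed:
    """In-memory stand-in for one capped, time-ordered feed per key."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        # key -> {node_id: timestamp}
        self._feeds: Dict[str, Dict[int, float]] = {}

    def add(self, key: str, node_id: int, timestamp: float) -> None:
        """Record node_id under key; re-adding a node replaces its timestamp."""
        feed = self._feeds.setdefault(key, {})
        feed[node_id] = timestamp
        # Fixed-size list: once over the cap, the oldest entry drops out.
        if len(feed) > self.max_size:
            oldest = min(feed, key=feed.get)
            del feed[oldest]

    def page(self, key: str, offset: int, limit: int,
             newest_first: bool = True) -> List[int]:
        """Return one page of node ids, sorted by time in either direction."""
        feed = self._feeds.get(key, {})
        ordered = sorted(feed, key=feed.get, reverse=newest_first)
        return ordered[offset:offset + limit]


# Usage: a feed capped at three entries for one tag.
feed = RecentFeed(max_size=3)
for node_id, ts in [(10, 1.0), (11, 2.0), (12, 3.0), (13, 4.0)]:
    feed.add("tag:neo4j", node_id, ts)

first_page = feed.page("tag:neo4j", offset=0, limit=2)  # newest two node ids
```

The node ids that come back would then serve as entry points into the graph for the more complex queries, so no graph node has to be locked just to keep the feed current.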
I'm going to try out indexed relationships, and I think they will help, but with respect, I don't think they will scale well or fit my use cases, especially for indexes where data drops out because the size is fixed (like a fixed-length list). Creating index structures in the graph is nice, but I feel it will severely balloon the graph. Moreover, I want to reserve resources on the servers running Neo4j for graph traversals and other graph activities, and I would rather use other clustered servers to hold huge amounts of index data in memory. One other idea is to use a second Neo4j instance as an index for the first, but I think the characteristics of what I am doing are in some cases better suited to Redis (temporal lists) or Mongo (hierarchical metrics), depending on the use case. Example: pulling down linear lists of time data by page, sorted front to back or back to front.
I know that's a lot, but I wanted to give at least some detail beyond what I've already read in all the old posts I've dug through this week. Any feedback? Thanks.