[Neo4j] Creating and managing external index

Avi Shai Sun, 20 Nov 2011 17:40:33 -0800

What is the best way to create an external index that covers only certain
nodes? Essentially I want something like the in-graph index structures, but
stored in one or more other databases. In essence I am indexing only a
sub-graph, or a plain list of nodes. I then want to use these indexes as
entry points in some cases, rather than traversing.
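To make the question concrete, here is a minimal sketch of what "an external index for only certain nodes" could look like. It assumes a hypothetical `ExternalIndex` interface and `NodeRef` type (neither is a Neo4j or Blueprints API), and an in-memory map stands in for the external store. A predicate decides which nodes get indexed, and a lookup returns node ids to use as traversal entry points.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical minimal node reference: an id plus a type label.
// (Not the Neo4j Node API; just enough to illustrate selective indexing.)
record NodeRef(long id, String type) {}

// Hypothetical contract for an index that lives outside the graph database.
interface ExternalIndex {
    void add(String key, NodeRef node);
    List<Long> lookup(String key); // node ids to use as traversal entry points
}

// Only nodes accepted by the filter reach the backing store, so the index
// covers a sub-graph rather than every node.
final class SelectiveIndex implements ExternalIndex {
    private final Predicate<NodeRef> filter;
    // In-memory stand-in for Redis, Mongo, etc.
    private final Map<String, List<Long>> store = new HashMap<>();

    SelectiveIndex(Predicate<NodeRef> filter) { this.filter = filter; }

    @Override
    public void add(String key, NodeRef node) {
        if (filter.test(node)) {
            store.computeIfAbsent(key, k -> new ArrayList<>()).add(node.id());
        }
    }

    @Override
    public List<Long> lookup(String key) {
        return store.getOrDefault(key, List.of());
    }
}
```

In a real setup, the ids returned by `lookup` would be turned back into start nodes (for example with `GraphDatabaseService.getNodeById`) before traversing.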

I understand that there is already Lucene, but I have data that is better
suited to other kinds of indexes. I still want to use Lucene for full-text
search, just not for anything else. I am currently taking a stab at
implementing the Blueprints index interfaces (manual and automatic), but for
another purpose. If I am always updating these indexes, but only for certain
vertex types, what is the best integration point: my data service classes
and lower-level Neo4j code, or a server-side event handler that plugs into
the transaction? What about indexing all vertices? I understand how to write
the index classes, but not the best way to consume them, or whether they
work well for lots of small, partial indexes.
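One way to picture the event-handler option is the sketch below. It is loosely modelled on the `beforeCommit`/`afterCommit`/`afterRollback` callbacks of Neo4j's `TransactionEventHandler`, but it is plain Java with hypothetical `NodeChange` and `ExternalIndexHook` types, not the Neo4j API. Filtering happens before commit, and the external write happens only after a successful commit, so a rolled-back transaction never reaches the external index.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical record of one change made by a graph transaction.
record NodeChange(long nodeId, String type, String key) {}

// Loosely mirrors the beforeCommit/afterCommit/afterRollback shape of
// Neo4j's TransactionEventHandler, without depending on Neo4j.
final class ExternalIndexHook {
    private final Predicate<NodeChange> wanted;
    // Stand-in for the external store: key -> node ids.
    final Map<String, List<Long>> external = new HashMap<>();

    ExternalIndexHook(Predicate<NodeChange> wanted) { this.wanted = wanted; }

    // Inside the transaction: only select what to index, no external I/O.
    List<NodeChange> beforeCommit(List<NodeChange> changes) {
        List<NodeChange> selected = new ArrayList<>();
        for (NodeChange c : changes) {
            if (wanted.test(c)) selected.add(c);
        }
        return selected;
    }

    // After the graph commit succeeded: push the selected changes out.
    void afterCommit(List<NodeChange> selected) {
        for (NodeChange c : selected) {
            external.computeIfAbsent(c.key(), k -> new ArrayList<>()).add(c.nodeId());
        }
    }

    // Nothing was written externally before commit, so nothing to undo.
    void afterRollback(List<NodeChange> selected) {}
}
```

The trade-off: if the external write fails after the graph has committed, the index is stale and needs some repair or replay mechanism. The service-layer option has the same problem, just handled in application code.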

For instance, I want to store temporal data for a group of nodes, with the
most recent data first. I'm not building Twitter or a blog, but either is a
good enough analogy. If I post something with a given tag, I want to index
all the nodes that have been tagged with that tag (via a tag edge) in
temporal order, for example to create a "recently tagged" feed, or a
"recently seen users" feed containing the users who have recently used that
tag. I could store this data in Redis exactly how I want and keep a hot set
in memory that can be used either directly on some pages of my app, or as an
entry point into Neo4j for more complex queries. These indexes will probably
take lots of writes, and I also want to avoid locking related nodes on every
update.
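The "recently tagged" and "recently seen users" feeds map naturally onto a capped, newest-first list per tag. In Redis that is commonly `LPUSH` followed by `LTRIM key 0 N-1` (a sorted set scored by timestamp is the closer fit when entries must be unique). The sketch below models the same behaviour in plain Java, with a hypothetical `RecentFeed` class and an in-memory map in place of Redis. The `distinct` flag is an assumption for the users feed, where a repeat visitor should move to the front rather than appear twice.

```java
import java.util.*;

// Per-tag list of ids, newest first, capped at a fixed size. Mimics the
// Redis idiom LPUSH + LTRIM key 0 (capacity-1), in plain Java.
final class RecentFeed {
    private final int capacity;
    private final boolean distinct; // true for a "recently seen users" feed
    private final Map<String, Deque<Long>> feeds = new HashMap<>();

    RecentFeed(int capacity, boolean distinct) {
        this.capacity = capacity;
        this.distinct = distinct;
    }

    void push(String tag, long id) {
        Deque<Long> feed = feeds.computeIfAbsent(tag, k -> new ArrayDeque<>());
        if (distinct) feed.remove(Long.valueOf(id)); // repeat moves to the front
        feed.addFirst(id);
        while (feed.size() > capacity) feed.removeLast(); // oldest drops out
    }

    // Newest first.
    List<Long> recent(String tag) {
        return new ArrayList<>(feeds.getOrDefault(tag, new ArrayDeque<>()));
    }
}
```

Because nothing in this structure touches graph nodes, updating it does not involve Neo4j locks at all.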

Part of the reason I'm doing this is that I have lots of super nodes in my
design. I've partly patched this by keeping counts in node properties and by
adding proxy nodes as mini-partitions to reduce the number of relationships.
I've also looked at things like combining common nodes together as
junctions, but there are probably too many permutations for that to scale.
Anyway, if I use in-graph indexes, I have to update them on every insertion
or update. I'm going to try out indexed relationships, and I think they will
help, but with respect, I don't think they will scale well or fit my use
cases, especially indexes where data drops out because the size is fixed
(like a fixed-size list).
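For the super-node workaround, one possible layout (an assumption, not necessarily how the existing patch works) is to hang members off proxy nodes that each hold at most a fixed number of relationships, open a new proxy when the current one is full, and keep a running count on the hub instead of counting edges. The hypothetical `PartitionedHub` below models this in memory; the inner lists stand in for proxy nodes and their relationships.

```java
import java.util.*;

// Hypothetical in-memory model of a super node split into proxy nodes.
// Each proxy holds at most `bucketSize` members; the hub keeps a count.
final class PartitionedHub {
    private final int bucketSize;
    private final List<List<Long>> proxies = new ArrayList<>();
    private long count; // stands in for a "count" property on the hub node

    PartitionedHub(int bucketSize) { this.bucketSize = bucketSize; }

    void connect(long memberId) {
        if (proxies.isEmpty() || proxies.get(proxies.size() - 1).size() == bucketSize) {
            proxies.add(new ArrayList<>()); // open a new mini-partition
        }
        proxies.get(proxies.size() - 1).add(memberId);
        count++;
    }

    long count() { return count; }

    int proxyCount() { return proxies.size(); }

    // Walk proxies from newest to oldest, stopping once n members are found.
    List<Long> newest(int n) {
        List<Long> out = new ArrayList<>();
        for (int p = proxies.size() - 1; p >= 0 && out.size() < n; p--) {
            List<Long> bucket = proxies.get(p);
            for (int i = bucket.size() - 1; i >= 0 && out.size() < n; i--) {
                out.add(bucket.get(i));
            }
        }
        return out;
    }
}
```

Filling proxies in order keeps the newest members in the last proxy, so a "most recent" read only touches the last one or two proxies instead of every relationship on the super node.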

I feel that creating index structures in the graph is nice, but it will
severely balloon the graph. Moreover, I want to save resources on the
servers running Neo4j for graph traversals and other graph work, and I would
rather use other clustered servers to hold huge amounts of index data in
memory. One other idea is to use a second Neo4j instance as an index for the
main one, but I think the characteristics of what I am doing are better
suited in some cases to Redis (temporal lists) or Mongo (hierarchical
metrics), depending on the use case. Example: pulling down linear lists of
time-series data page by page, sorted front to back or back to front.
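Paging time-ordered data in both directions is a range scan over an ordered structure. In Redis this would typically be a sorted set scored by timestamp, read with `ZRANGEBYSCORE` or `ZREVRANGEBYSCORE` plus `LIMIT`. The hypothetical `TimeSeriesPager` below shows the same cursor-based paging with a `TreeMap`, assuming at most one entry per timestamp.

```java
import java.util.*;

// In-memory stand-in for a time-ordered index (timestamp -> node id).
// Assumes at most one entry per timestamp.
final class TimeSeriesPager {
    private final NavigableMap<Long, Long> byTime = new TreeMap<>();

    void add(long timestamp, long nodeId) { byTime.put(timestamp, nodeId); }

    // Returns up to `size` (timestamp, nodeId) pairs strictly past `cursor`
    // in the chosen direction; pass null for the first page. The caller
    // keeps the last timestamp returned as the next cursor.
    List<Map.Entry<Long, Long>> page(Long cursor, int size, boolean newestFirst) {
        NavigableMap<Long, Long> view = newestFirst ? byTime.descendingMap() : byTime;
        if (cursor != null) view = view.tailMap(cursor, false);
        List<Map.Entry<Long, Long>> out = new ArrayList<>();
        for (Map.Entry<Long, Long> e : view.entrySet()) {
            if (out.size() == size) break;
            out.add(Map.entry(e.getKey(), e.getValue())); // immutable copy
        }
        return out;
    }
}
```

Starting each page strictly after the last timestamp seen avoids re-scanning skipped entries and stays stable when new items arrive at the head, which offset-based paging does not.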

I know that's a lot, but I wanted to at least give some detail beyond what
I've already read here in all the old posts I've dug through this week. Any
feedback? Thanks.

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-and-managing-external-index-tp3523613p3523613.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
