Hi Guys -

This is a question from the Gremlin Users group
(https://groups.google.com/d/topic/gremlin-users/cbqydbD3DgQ/discussion)...


James Thornton wrote:
> 
> 
> What are the recommended methods for restructuring graphs in production?
> 
> For example, say you store raw input from users as nodes, and you run
> algorithms on the raw input to slice and dice the data, make inferences
> and associations based on your initial algorithms. 
> 
> Over time you improve your algorithms and find better ways of structuring
> the relationships. What are the best ways to restructure the graph on a
> live system? 
> 
> One approach would be to keep an event log for each user action and replay
> it to rebuild the graph (http://martinfowler.com/bliki/MemoryImage.html).
> You could keep two DBs and switch over to the new version.
> 
> Or is it better to just prune the induced graph?
> 
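
To make the event-log option in the question concrete, here is a minimal
sketch in Python. It is not tied to Neo4j or Gremlin: the event format, the
"follow"/"unfollow" actions, and the rebuild_graph helper are all assumptions
made up for illustration. The idea is only that the log is the source of
truth, and a fresh graph can be built from it with whatever structuring
rules are current.

```python
from collections import defaultdict

# Hypothetical event log: one entry per raw user action, in arrival order.
EVENT_LOG = [
    {"user": "alice", "action": "follow", "target": "bob"},
    {"user": "bob", "action": "follow", "target": "carol"},
    {"user": "alice", "action": "unfollow", "target": "bob"},
]


def rebuild_graph(events):
    """Replay events in order and return an adjacency map (node -> set of targets)."""
    graph = defaultdict(set)
    for event in events:
        if event["action"] == "follow":
            graph[event["user"]].add(event["target"])
        elif event["action"] == "unfollow":
            # discard() is a no-op if the edge was never created
            graph[event["user"]].discard(event["target"])
    return graph


# Build the new version off to the side; switching readers over to it
# is a separate step once the rebuild has finished.
new_graph = rebuild_graph(EVENT_LOG)
```

Changing the structuring algorithm then means changing rebuild_graph and
replaying, rather than editing the live graph in place.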

We discussed several approaches for this. One that is specific to Neo4j-HA
would be to run multiple graph versions by configuring "read-only" mirrors
that each run different algorithms/stored procedures: a base graph that
users write to, which stores only raw/manual inputs, and one or more
algorithmically-enhanced graphs that users read from.

In the context of a Web application, this would allow you to test different
versions of your algorithms on a subset of users and switch over to new
versions as your algorithms improve.
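
One way the "subset of users" routing could look in the Web application is
sketched below in Python. The graph names, the 10% default, and the
graph_for helper are assumptions for illustration only, not anything Neo4j
provides. Hashing the user ID keeps each user pinned to the same graph
version between requests.

```python
import hashlib

# Hypothetical names for two derived graphs built by different algorithms.
GRAPH_VERSIONS = {"stable": "derived-graph-v1", "candidate": "derived-graph-v2"}
CANDIDATE_PERCENT = 10  # share of users who read from the candidate graph


def bucket_for(user_id):
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def graph_for(user_id, candidate_percent=CANDIDATE_PERCENT):
    """Pick which derived graph this user's reads should go to."""
    if bucket_for(user_id) < candidate_percent:
        return GRAPH_VERSIONS["candidate"]
    return GRAPH_VERSIONS["stable"]
```

Raising candidate_percent to 100 is then the "switch over" step once the new
algorithms have proven themselves.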

For this to work, you would at least need:

* The ability to execute different triggers/stored procedures on the
different "read-only" mirrors.

* A way to push data to and augment the different derived graphs with data
from external sources, such as Hadoop processes.

* Either two sequence sources -- one for the manual elements and one for the
algorithmically-enhanced elements -- or the ability for each graph to use
the same sequence generator.
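
For the first option in the last requirement (two sequence sources), a
minimal in-process sketch in Python is below. The SequenceSource class and
the even/odd split are assumptions chosen for illustration; they say nothing
about how Neo4j allocates IDs internally. The point is just that manual and
algorithmic elements can draw IDs independently without ever colliding.

```python
import itertools


class SequenceSource:
    """Hands out IDs from an arithmetic sequence (in-process sketch only)."""

    def __init__(self, start, step):
        self._counter = itertools.count(start, step)

    def next_id(self):
        return next(self._counter)


# Manual (user-entered) elements get even IDs, algorithmically-enhanced
# elements get odd IDs, so the two ranges are disjoint by construction.
manual_ids = SequenceSource(start=0, step=2)
derived_ids = SequenceSource(start=1, step=2)
```

The alternative, a single generator shared by every graph, avoids
partitioning the ID space but needs all the graphs to coordinate on it.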

NOTE: The "read-only" mirrors are not really read-only because they can
receive automated/algorithmic inputs, just not user/manual inputs.
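
The write rule in that note can be sketched in a few lines of Python. The
DerivedGraph class and the "algorithm"/"user" source labels are hypothetical
and only illustrate the policy; a real setup would enforce this inside the
database or its HA layer.

```python
class DerivedGraph:
    """Toy stand-in for a "read-only" mirror: readable by anyone,
    writable only by automated/algorithmic processes."""

    ALLOWED_SOURCES = {"algorithm"}  # hypothetical label for automated writers

    def __init__(self):
        self.edges = []

    def write(self, source, edge):
        # Reject user/manual writes; those belong in the base graph.
        if source not in self.ALLOWED_SOURCES:
            raise PermissionError("derived graph rejects %r writes" % source)
        self.edges.append(edge)

    def read(self):
        return list(self.edges)
```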

I added this to the GitHub issue tracker
(https://github.com/neo4j/community/issues/11), and Tobias recommended
posting it to the list for discussion.

See also...

"[Furnace] A Graph Algorithm-Based TinkerPop Project? -- or simply,
DerivedGraph?"
https://groups.google.com/d/topic/gremlin-users/G4NC0J-FAtQ/discussion

Here is a Google research video on infrastructure and testing new
algorithms...

"Large Scale Search System Infrastructure and Search Quality"
http://research.google.com/roundtable/LSS.html

- James

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Restructuring-a-Neo4j-Production-Graph-tp3316771p3316771.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
