Hi Guys - This is a question from the Gremlin Users group (https://groups.google.com/d/topic/gremlin-users/cbqydbD3DgQ/discussion)...
James Thornton wrote:

> What are the recommended methods for restructuring graphs in production?
>
> For example, say you store raw input from users as nodes, and you run
> algorithms on the raw input to slice and dice the data, make inferences
> and associations based on your initial algorithms.
>
> Over time you improve your algorithms and find better ways of structuring
> the relationships. What are the best ways to restructure the graph on a
> live system?
>
> One approach would be to keep an event log for each user action and replay
> it to rebuild the graph (http://martinfowler.com/bliki/MemoryImage.html).
> You could keep two DBs and switch over to the new version.
>
> Or is it better to just prune the induced graph?

We discussed several approaches for this, and one approach specific to Neo4j-HA would be to run multiple graph versions by configuring "read-only" mirrors that have different algorithms/stored procedures -- a base graph that users write to that only stores raw/manual inputs, and one or more algorithmically-enhanced graphs that users read from.

In the context of a Web application, this would allow you to test different versions of your algorithms on a subset of users and switch over to new versions as your algorithms improve.

For this to work, you would at least need:

* The ability to execute different triggers/stored procedures on the different "read-only" mirrors.
* A way to push data to and augment the different derived graphs with data from external sources, such as Hadoop processes.
* Either two sequence sources -- one for the manual elements and one for the algorithmically-enhanced elements -- or the ability for each graph to use the same sequence generator.

NOTE: The "read only" mirrors are not really read-only because they can receive automated/algorithmic inputs, just not user/manual inputs.
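
To make the idea concrete, here is a rough sketch in plain Python (not Neo4j or Neo4j-HA code) of replaying an event log through two algorithm versions and routing a subset of users to the newer derived graph. Everything in it is an assumption made for illustration: the event format, the `derive_v1`/`derive_v2` functions, the `SIMILAR_TO` inference rule, and the `graph_for` routing helper are all hypothetical, and in-memory sets stand in for the mirrors.

```python
# Hypothetical append-only log of raw user actions (the "base graph" inputs).
EVENT_LOG = [
    ("alice", "tagged", "neo4j"),
    ("bob", "tagged", "neo4j"),
    ("carol", "tagged", "gremlin"),
]


def derive_v1(events):
    """Algorithm version 1: keep only the raw user -> tag edges."""
    return {(user, "TAGGED", tag) for user, _, tag in events}


def derive_v2(events):
    """Algorithm version 2: also infer SIMILAR_TO edges between users
    who applied the same tag (an illustrative rule, not a recommendation)."""
    edges = derive_v1(events)
    users_by_tag = {}
    for user, _, tag in events:
        users_by_tag.setdefault(tag, []).append(user)
    for users in users_by_tag.values():
        for i, a in enumerate(users):
            for b in users[i + 1:]:
                edges.add((a, "SIMILAR_TO", b))
    return edges


def rebuild(events, derive):
    """Rebuild a derived graph from scratch by replaying the full log."""
    return derive(events)


# One derived graph per algorithm version, both built from the same raw log.
GRAPHS = {
    "v1": rebuild(EVENT_LOG, derive_v1),
    "v2": rebuild(EVENT_LOG, derive_v2),
}


def graph_for(user, test_users=frozenset({"alice"})):
    """Serve the new version to a test subset of users, the old one to the rest."""
    return GRAPHS["v2"] if user in test_users else GRAPHS["v1"]
```

Because the raw log is never modified, a later version can be built alongside the existing ones and the `test_users` set widened until everyone reads from it, at which point the old derived graph can be dropped.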

I added this to the Github issue tracker (https://github.com/neo4j/community/issues/11), and Tobias recommended posting it to the list for discussion.

See also...

"[Furnace] A Graph Algorithm-Based TinkerPop Project? -- or simply, DerivedGraph?"
https://groups.google.com/d/topic/gremlin-users/G4NC0J-FAtQ/discussion

Here is a Google research video on infrastructure and testing new algorithms...

"Large Scale Search System Infrastructure and Search Quality"
http://research.google.com/roundtable/LSS.html

- James