Peter,

The project requires quick retrieval of data based on certain parameters, which would not be feasible without indexing (the relationships that hold the data are all of the same kind, so simple traversals won't work). As I said earlier, I need to extract data based on combinations of parameters, which I could easily do by indexing the same data on different parameters and taking advantage of Lucene's QueryParser.
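To illustrate the "combination of parameters" idea, here is a minimal sketch in Python that only builds a query string in Lucene's classic query syntax (field:"value" clauses joined with AND). The field names (city, type) and the helper names are hypothetical, and nothing here calls Neo4j or Lucene; the resulting string would be handed to whatever index query call the application uses.

```python
from typing import Dict


def escape_phrase(value: str) -> str:
    # Inside a quoted phrase, backslash and double quote must be escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(params: Dict[str, str]) -> str:
    # One field:"value" clause per parameter, joined with AND so that
    # every parameter has to match. Sorting keeps the output stable.
    clauses = [
        f'{field}:"{escape_phrase(value)}"'
        for field, value in sorted(params.items())
    ]
    return " AND ".join(clauses)


# Hypothetical parameters, for illustration only.
query = build_query({"type": "post", "city": "Pune"})
print(query)  # city:"Pune" AND type:"post"
```

Each field used in such a query has to be indexed separately, which is the "same data indexed on different parameters" part of the plan.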
Regarding those 50k users: they will mostly use the C and R of CRUD, with reads being the most common operation. I estimate that at any time 30-50% of them will be using the project. Suppose each user generates 20 new nodes and 20 new relationships (not relationship types) per day.

I would not index the data they post, only the node number, so that I can get to a node with less memory usage. That seems efficient to me: I use more nodes, but I get a smaller index. To index a node X (with some data in it), I can index an empty node Y that has a direct relationship with X. This uses up nodes, but at least it gives me a smaller and faster index (and I have a virtually unlimited number of nodes with Neo4j). I know this scheme might seem a little dubious because I am wasting nodes, and someday, when scalability becomes a concern, I might have to rectify it. Any suggestions on this?

Also, my database will mostly need to store strings. What about putting another layer in front of my Neo4j db that maps those strings to ints, so that I could index those easily in Neo4j? For example, this additional layer could map an email address (which I need to index) to a unique user id, which I can then index using Neo4j.
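As a rough illustration of the string-to-int mapping layer, here is a minimal in-memory sketch in Python. The class name StringIdMap, its methods, and the lowercase/strip normalisation of email addresses are all assumptions made for the example; a real layer would also need to be persistent and safe under concurrent writes.

```python
from typing import Dict, Optional


class StringIdMap:
    """Hypothetical in-memory layer that assigns a stable int id to each string."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._next_id = 1

    @staticmethod
    def _normalise(key: str) -> str:
        # Assumption: emails are compared case-insensitively, ignoring padding.
        return key.strip().lower()

    def get_or_create(self, key: str) -> int:
        # Return the existing id, or hand out the next unused one.
        key = self._normalise(key)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

    def lookup(self, key: str) -> Optional[int]:
        # Read-only lookup; returns None for strings never seen before.
        return self._ids.get(self._normalise(key))


users = StringIdMap()
alice_id = users.get_or_create("alice@example.org")
bob_id = users.get_or_create("bob@example.org")
print(alice_id, bob_id, users.get_or_create("Alice@Example.org"))  # 1 2 1
```

One thing the sketch makes visible: the mapping layer is itself an index from strings to ids, so the string lookup cost moves into that layer rather than disappearing.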