I've got an incredibly large database of 100M+ records and need to query it as efficiently as possible, ideally making the most of OrientDB's strengths.
Any suggestions on how to add more vertices so I can filter faster and query more efficiently?

*The Data*

Think of a social network: names, email addresses, physical addresses (street, city, state and zip), and other things like that, including birthdays.

*The Queries*

I want to query as quickly as possible on as many fields as possible.

1. Email optimization: For email, I'm thinking I would create a vertex for each domain, so that when a query includes the domain I can go straight to that vertex and scan only its subset. Maybe each edge could also store a copy of the name, with a unique index on it, since it would be unique within that domain's edges (if that's possible).
2. Searching names: I don't have any good ideas here, especially if you want to search with wildcards. Maybe I could have a vertex for each first letter, or for each first three letters? Even then, the scan will still cover tens of millions of records.
3. Mapping addresses: How would you handle longitude and latitude if you wanted, let's say, to get a list of nodes close to where you are in order to put their addresses on a Google map? It seems like you could create vertices for every latitude and longitude to get this list quickly. One small town in California, less than 10 km wide, has 4,000 people, and a query for it takes 90 seconds on an 8-processor machine with an SSD! But I'm really only repeating the same technique.

*>> Are there tools I can use other than (pointer vertices) and indexes?*

What don't I know?

Thanks in advance.

You received this message because you are subscribed to the Google Groups "OrientDB" group.
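The domain-vertex idea in item 1 is a form of partitioning: records are grouped under a key (the domain), so a lookup or a domain-restricted scan only touches one group. Below is a minimal in-memory Python sketch of that logic, not OrientDB code. `DomainIndex` and its methods are hypothetical names, and it assumes the "name" copied onto each edge is the part of the address before the `@`.

```python
from __future__ import annotations

from collections import defaultdict


class DomainIndex:
    """Hypothetical in-memory stand-in for "one vertex per email domain".

    Each domain maps to its own dict of local part -> record id, so a lookup
    by full address, or a scan restricted to one domain, touches only that
    domain's bucket.
    """

    def __init__(self) -> None:
        self._by_domain: defaultdict[str, dict[str, int]] = defaultdict(dict)

    @staticmethod
    def _split(email: str) -> tuple[str, str]:
        # Split on the last '@'; domains are case-insensitive, so lowercase them.
        local, _, domain = email.rpartition("@")
        return local, domain.lower()

    def add(self, email: str, record_id: int) -> None:
        local, domain = self._split(email)
        bucket = self._by_domain[domain]
        # Local parts must be unique within a domain, like a unique index
        # scoped to one domain's edges.
        if local in bucket:
            raise ValueError(f"duplicate address: {email}")
        bucket[local] = record_id

    def find(self, email: str) -> int | None:
        local, domain = self._split(email)
        bucket = self._by_domain.get(domain)  # .get() does not create empty buckets
        return None if bucket is None else bucket.get(local)

    def domain_size(self, domain: str) -> int:
        # Number of records a domain-restricted scan would have to visit.
        return len(self._by_domain.get(domain.lower(), {}))
```

For an exact-match lookup on a full address, an ordinary index keyed on the whole address gives the same result without extra vertices; per-domain buckets mainly pay off when a query filters by domain alone.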
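For item 2, a first-letter or first-three-letters vertex is a coarse version of what a sorted index already provides. With keys kept in order, a trailing-wildcard search such as `smi*` can binary-search to the first candidate and read forward only while the prefix still matches, so it touches the matches rather than tens of millions of records. The sketch below is a hypothetical in-memory illustration (`PrefixIndex` is not a real API). It also keeps a second, reversed-name index, a standard trick for leading-wildcard searches such as `*son`; an infix pattern like `*mit*` would still need something else, such as an n-gram or full-text index.

```python
from __future__ import annotations

import bisect


class PrefixIndex:
    """Hypothetical sorted-key name index supporting 'abc*' and '*xyz' searches.

    Keys are kept sorted, so a prefix search binary-searches to the first
    candidate and reads forward only while the prefix still matches.
    """

    def __init__(self, names: list[tuple[str, int]]) -> None:
        # (lowercased name, record id) pairs, sorted by name then id.
        self._fwd = sorted((name.lower(), rid) for name, rid in names)
        # Reversed names turn a suffix search into a prefix search.
        self._rev = sorted((name.lower()[::-1], rid) for name, rid in names)

    @staticmethod
    def _scan(keys: list[tuple[str, int]], prefix: str) -> list[int]:
        # (prefix,) compares less than any (key, rid) with key >= prefix,
        # so bisect_left finds the first such key in O(log n).
        i = bisect.bisect_left(keys, (prefix,))
        out: list[int] = []
        while i < len(keys) and keys[i][0].startswith(prefix):
            out.append(keys[i][1])
            i += 1
        return out

    def starts_with(self, prefix: str) -> list[int]:
        return self._scan(self._fwd, prefix.lower())

    def ends_with(self, suffix: str) -> list[int]:
        return self._scan(self._rev, suffix.lower()[::-1])
```

For example, `PrefixIndex([("Smith", 1), ("Smyth", 2)]).starts_with("sm")` returns both ids, while `starts_with("smi")` returns only `[1]`.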
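The latitude/longitude vertices in item 3 amount to a grid: each point is assigned to a fixed-size cell, a proximity query visits only the cells overlapping a bounding box around the search circle, and the exact great-circle (haversine) distance filters the candidates. Here is a minimal Python sketch, not OrientDB's API, assuming a hypothetical `GridIndex` class, a 0.01° cell size (about 1.1 km north to south), and search circles that stay away from the poles and the ±180° meridian. Geohash-style spatial indexes rely on the same bucketing idea.

```python
from __future__ import annotations

import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


class GridIndex:
    """Hypothetical grid index: points are bucketed into cell_deg x cell_deg cells."""

    def __init__(self, cell_deg: float = 0.01) -> None:
        self.cell_deg = cell_deg
        self._cells: defaultdict[tuple[int, int], list[tuple[int, float, float]]] = defaultdict(list)

    def _cell(self, lat: float, lon: float) -> tuple[int, int]:
        # floor() keeps cell numbering consistent for negative coordinates too.
        return math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg)

    def add(self, record_id: int, lat: float, lon: float) -> None:
        self._cells[self._cell(lat, lon)].append((record_id, lat, lon))

    def near(self, lat: float, lon: float, radius_km: float) -> list[int]:
        """Ids of points within radius_km of (lat, lon), nearest first.

        Assumes the search circle neither covers a pole nor crosses the ±180° meridian.
        """
        ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
        dlat = math.degrees(ang)
        # Widest longitude span of the circle; it grows as cos(lat) shrinks.
        dlon = math.degrees(math.asin(min(1.0, math.sin(ang) / math.cos(math.radians(lat)))))
        lat_lo, lon_lo = self._cell(lat - dlat, lon - dlon)
        lat_hi, lon_hi = self._cell(lat + dlat, lon + dlon)
        hits: list[tuple[float, int]] = []
        # Visit only the cells overlapping the bounding box, then filter exactly.
        for i in range(lat_lo, lat_hi + 1):
            for j in range(lon_lo, lon_hi + 1):
                for rid, plat, plon in self._cells.get((i, j), ()):
                    d = haversine_km(lat, lon, plat, plon)
                    if d <= radius_km:
                        hits.append((d, rid))
        return [rid for _, rid in sorted(hits)]
```

For example, after adding points with `add`, `near(37.7749, -122.4194, 1.0)` returns the ids of indexed points within 1 km of that coordinate, nearest first, while never looking at cells outside the bounding box.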
