I have a very large database (100M+ records) and need to query it as 
efficiently as possible, ideally making the most of what OrientDB is good 
at.

Does anyone have suggestions for adding extra vertexes that narrow the 
search quickly, so that I can query more efficiently?

*The Data*

Think of a social network: names, email addresses, physical addresses 
(street, city, state and zip) and similar fields such as birthday.

*The Queries*

I want queries on as many of these fields as possible to run as quickly as 
possible.

1. Email optimization:

For email, I could create a vertex for each domain, so that when I know 
the domain I only have to scan that subset.  Maybe the edge could also 
hold a copy of the name, with a unique index on it, since the name would be 
unique among the edges of that domain (if that's possible).
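
To make the idea concrete, here is a minimal Python sketch of what I mean. 
It is not OrientDB code: a plain in-memory dict stands in for the domain 
vertexes, the names domain_index, add_email and find_email are made up for 
illustration, and I'm assuming "the name" means the part of the address 
before the @.

```python
from collections import defaultdict

# Hypothetical stand-in for "domain vertexes": each domain maps to its own
# small table keyed by the local part (the text before the @).
domain_index = defaultdict(dict)


def add_email(address, record_id):
    """File a record under its domain; reject duplicates within that domain."""
    local, _, domain = address.lower().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    bucket = domain_index[domain]
    if local in bucket:
        # Plays the role of the unique index on the name within one domain.
        raise KeyError(f"duplicate address: {address!r}")
    bucket[local] = record_id


def find_email(address):
    """Look up one address by touching only its domain's bucket."""
    local, _, domain = address.lower().rpartition("@")
    return domain_index.get(domain, {}).get(local)
```

The lookup never touches records from other domains, which is the whole 
point of the split.  It only helps when the domain is known, though.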

2.  Searching names:

I don't have any good ideas here, especially if you want to search with 
wildcards.  Maybe I could have a vertex for each first letter, or for each 
first 3 letters?  The scan would still cover tens of millions of records.
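
Roughly what I have in mind, as a Python sketch rather than OrientDB code. 
A dict stands in for the prefix vertexes; PREFIX_LEN, prefix_index, 
add_name and search_prefix are hypothetical names, and I'm assuming the 
wildcard only ever comes at the end of the pattern.

```python
from collections import defaultdict

PREFIX_LEN = 3  # assumption: bucket on the first 3 letters

# Hypothetical stand-in for "prefix vertexes": prefix -> list of (name, id).
prefix_index = defaultdict(list)


def add_name(name, record_id):
    """File a lower-cased name under its first PREFIX_LEN letters."""
    key = name.lower()
    prefix_index[key[:PREFIX_LEN]].append((key, record_id))


def search_prefix(pattern):
    """Match trailing-wildcard patterns such as 'joh*'."""
    literal = pattern.lower().rstrip("*")
    if len(literal) >= PREFIX_LEN:
        # Long enough to pick exactly one bucket.
        candidates = prefix_index.get(literal[:PREFIX_LEN], [])
    else:
        # Shorter than a bucket key: scan every bucket that starts with it.
        candidates = [
            item
            for key, items in prefix_index.items()
            if key.startswith(literal)
            for item in items
        ]
    return sorted(rid for name, rid in candidates if name.startswith(literal))
```

A pattern with a leading wildcard, like "*son", gets no help from this at 
all and still needs a full scan.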

3.  Mapping addresses:

How would you handle latitude and longitude if you wanted, let's say, to 
get a list of nodes close to where you are in order to plot addresses on a 
Google map?

It seems like you could create a vertex for every latitude and longitude to 
get this list quickly.  One small town in California, less than 10 km wide, 
has 4000 people, and a query for it takes 90 seconds on an 8-processor 
machine with an SSD!
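
As a Python sketch of that idea (again not OrientDB code), coordinates can 
be rounded down to grid cells so that a lookup only touches a few cells. 
The cell size CELL_DEG and the names grid, cell_of, add_point, haversine_km 
and nearby are all my own assumptions for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # assumption: cells 0.1 degree wide (about 11 km north-south)

# Hypothetical stand-in for "lat/long vertexes": (row, col) -> records in it.
grid = defaultdict(list)


def cell_of(lat, lon):
    """Map a coordinate to the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def add_point(record_id, lat, lon):
    grid[cell_of(lat, lon)].append((record_id, lat, lon))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearby(lat, lon, radius_km):
    """Check only the cell containing (lat, lon) and its 8 neighbours."""
    row, col = cell_of(lat, lon)
    hits = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            for rid, plat, plon in grid.get((r, c), []):
                if haversine_km(lat, lon, plat, plon) <= radius_km:
                    hits.append(rid)
    return sorted(hits)
```

This is only correct while the radius is smaller than one cell (a few km 
with the cell size above), and it ignores wrap-around at the date line and 
the poles.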

But I'm really only repeating the same technique.

*>> Are there more tools that I can use other than pointer vertexes and 
indexes?*

What don't I know?

Thanks in advance.

