I'm working on a POC with OrientDB, which I've set up across 3 servers.
I've read the OrientDB documentation and would like to know the best way
to load data that is in the form of CSV files. The schema has 3 vertex
classes and 3 edge classes, which should be interconnected with one
another.

Below are the questions I have:

1) Does it make sense, in terms of ETL performance, to create 3 clusters
for each of the classes and assign each cluster to one of the servers?
(Based on this link:
http://orientdb.com/docs/2.2.x/Distributed-Sharding.html . I'm not worried
about fault tolerance at this stage.)

2) Regarding the ETL storage process, I'm considering 3 options:

   - The ETL tool provided with OrientDB (with all possible optimizations)
   - Using OGraphBatchInsert
   - Storing the data as documents
   ( http://orientdb.com/docs/2.2.x/Graph-Batch-Insert.html )

For the 2nd and 3rd methods, I'm required to provide record IDs manually.
My question is: how do I make sure duplicate vertices are not created? Will
indexing help avoid this?
How do the above 3 methods compare in terms of performance?
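
To make the duplicate concern concrete, here is a minimal Python sketch of the
bookkeeping I would otherwise have to do by hand when assigning IDs myself. It
does not use any OrientDB API. The function name, the "id" key column, and the
sample rows are assumptions made up for illustration; the idea is only that
each distinct key maps to exactly one manually assigned vertex ID.

```python
import csv
import io


def assign_vertex_ids(rows, key_field):
    """Give each distinct business key exactly one sequential vertex ID."""
    ids = {}  # business key -> manually assigned vertex ID
    for row in rows:
        key = row[key_field]
        if key not in ids:
            # First time this key is seen: allocate the next ID.
            ids[key] = len(ids)
        # Repeated keys reuse the existing ID, so no duplicate vertex.
    return ids


# Hypothetical CSV content; the third row repeats the first key.
sample = io.StringIO("id,name\n1,alice\n2,bob\n1,alice\n")
vertex_ids = assign_vertex_ids(csv.DictReader(sample), "id")
print(vertex_ids)
```

What I'd like to know is whether an index on the key field could take over
this role inside OrientDB, so that I don't need to keep such a map myself.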

3) Is it possible to load the data directly into one server of the OrientDB
cluster, from within that machine, using the "plocal" option in the ETL tool?

4) Is it possible to use the plocal option for ETL even when OrientDB is
running in distributed mode?

