I'm working on a POC with OrientDB, which I've set up across 3 servers. I've read the OrientDB documentation and want to know the best method for loading data that comes as CSV files. The schema has 3 vertex classes and 3 edge classes, which are all interconnected.
Below are some of the questions I have:

1) In terms of ETL performance, does it make sense to create 3 clusters for each class and assign each cluster to one of the servers? (This is based on http://orientdb.com/docs/2.2.x/Distributed-Sharding.html; I'm not worried about fault tolerance at this stage.)

2) For the ETL storage process, I'm considering 3 options:
   - The ETL tool provided with OrientDB (with all possible optimizations)
   - Using OGraphBatchInsert
   - Storing the data as documents (http://orientdb.com/docs/2.2.x/Graph-Batch-Insert.html)

   For the 2nd and 3rd methods, I have to provide record IDs manually. My concern is how to make sure duplicate vertices are not created. Will indexing help avoid this? How do these 3 methods compare in terms of performance?

3) Using the "plocal" option in the ETL tool, is it possible to store the data on one server of the OrientDB cluster, locally on that machine?

4) Is it possible to use the plocal option for ETL even when OrientDB runs in distributed mode?
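Question 1 rests on the idea that when a class has several clusters, each owned by a different server, inserts are spread across the servers. The toy model below is plain Java, not OrientDB code: it cycles through a class's clusters in the spirit of OrientDB's round-robin cluster selection strategy and counts how many writes land on each server. The cluster names person_s1, person_s2 and person_s3, the server names, and the `RoundRobinClusters` class are hypothetical; actual placement depends on the cluster selection strategy and distributed configuration in use.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not OrientDB API): one class, one cluster per server. */
class RoundRobinClusters {
    private final List<String> clusters;      // clusters belonging to one class
    private final Map<String, String> owner;  // cluster name -> owning server
    private int next = 0;                     // index of the next cluster to use

    RoundRobinClusters(List<String> clusters, Map<String, String> owner) {
        this.clusters = clusters;
        this.owner = owner;
    }

    /** Returns the next cluster in turn, wrapping around after the last one. */
    String pick() {
        String cluster = clusters.get(next);
        next = (next + 1) % clusters.size();
        return cluster;
    }

    /** Simulates n inserts and counts how many land on each server. */
    Map<String, Integer> writesPerServer(int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            counts.merge(owner.get(pick()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> clusters = List.of("person_s1", "person_s2", "person_s3");
        Map<String, String> owner = Map.of(
                "person_s1", "server1",
                "person_s2", "server2",
                "person_s3", "server3");
        RoundRobinClusters rr = new RoundRobinClusters(clusters, owner);
        System.out.println(rr.writesPerServer(10)); // {server1=4, server2=3, server3=3}
    }
}
```

With 10 inserts and 3 clusters, the writes split 4/3/3, so no single server takes the whole load; whether that translates into faster ETL also depends on replication and network cost between the servers.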
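On the duplicate-vertex concern in question 2: when IDs are assigned by the loader rather than by the database, one common approach is to keep a map from each vertex's natural key (the CSV column that identifies it) to the ID already handed out, so a key that appears again, in another file or in an edge row, reuses the same ID. The sketch below is plain Java, not OrientDB API; `VertexIdRegistry`, `idFor` and `edge` are hypothetical names, and it assumes the full key set fits in memory on the loading machine.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: one stable long id per (vertex class, natural key). */
class VertexIdRegistry {
    // "Class:key" -> assigned id. Assumes class names contain no ':'.
    private final Map<String, Long> ids = new HashMap<>();
    // Single counter shared by all classes, so ids are unique across classes.
    private long nextId = 0;

    /** Returns the id already assigned to this key, or assigns the next free one. */
    long idFor(String vertexClass, String naturalKey) {
        return ids.computeIfAbsent(vertexClass + ":" + naturalKey, k -> nextId++);
    }

    /** Resolves an edge row (two natural keys) to [fromId, toId], reusing known ids. */
    long[] edge(String fromClass, String fromKey, String toClass, String toKey) {
        return new long[] { idFor(fromClass, fromKey), idFor(toClass, toKey) };
    }

    /** Number of distinct vertices seen so far. */
    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        VertexIdRegistry reg = new VertexIdRegistry();
        long alice = reg.idFor("Person", "alice");
        long bob = reg.idFor("Person", "bob");
        long aliceAgain = reg.idFor("Person", "alice"); // same key, same id
        long[] e = reg.edge("Person", "alice", "Company", "acme");
        System.out.println(alice + " " + bob + " " + aliceAgain + " "
                + e[0] + "->" + e[1] + " size=" + reg.size()); // 0 1 0 0->2 size=3
    }
}
```

Because every vertex and edge row goes through the same map, "alice" is created once even though she appears in both a vertex row and an edge row; the trade-off is memory proportional to the number of distinct keys.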
