I just want to share my results from indexing a large amount of data in OrientDB and discuss them with you. For most of you this might be obvious. But I have read about creating indexes at the end, after loading, so this might be a good occasion to talk about the results.

The data:
- 1.8 GB CSV file
- 30,000,000 lines
- 5 columns
- the first column is person_id, which is the only indexed column in this example

My tests:

1. First I defined the schema and the index on person_id, then loaded the 30 million rows. It took *~20 min*.
2. Then I tried creating the schema and index after loading, so the load was completely schema-less. It also took *~19.5 min*. When I then tried to create the index, I got an *error* because the database assumed I had inserted a String. I find this a bit strange, but the default type is String, so the error makes sense.
3. In the third test I set the schema on person_id before loading. Loading the data again took *~20 min*. Then I created an index on that property. It logged "2014-08-31 11:06:24:487 INFO --> OK, indexed 30,000,000 items in 666,502 ms", which is about 11 min, for a total of *~31 min*.

My conclusions:
- set up the schema and the index before loading any data
- be sure about the types of your properties

If I missed anything, just add it. And if you have an idea why 2. does not work and 3. is so slow, please leave a comment.

Thanks.
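
As a quick check of the arithmetic in test 3, here is a minimal Python sketch that parses the quoted log line and converts it to minutes and throughput. The function name `parse_index_log` and the regular expression are only an illustration and are not part of OrientDB; the 20 min load time is the approximate figure reported above, so the total is approximate as well.

```python
import re

# Log line quoted in test 3.
LOG_LINE = "2014-08-31 11:06:24:487 INFO --> OK, indexed 30,000,000 items in 666,502 ms"


def parse_index_log(line):
    """Return (items, milliseconds) from an 'indexed N items in M ms' line.

    Hypothetical helper for this post only; it assumes comma-grouped integers
    exactly as they appear in the quoted line.
    """
    match = re.search(r"indexed ([\d,]+) items in ([\d,]+) ms", line)
    if match is None:
        raise ValueError("unrecognised log line")
    items, millis = (int(group.replace(",", "")) for group in match.groups())
    return items, millis


items, millis = parse_index_log(LOG_LINE)

index_minutes = millis / 60_000          # milliseconds -> minutes
load_minutes = 20                        # approximate load time from test 3
total_minutes = load_minutes + index_minutes
items_per_second = items / (millis / 1000)

print(f"index build:      {index_minutes:.1f} min")
print(f"load + index:     {total_minutes:.1f} min")
print(f"index throughput: {items_per_second:,.0f} items/s")
```

Under these assumptions the separate index build comes to roughly 11.1 min, at about 45,000 items per second, which puts test 3 at about 31 min against about 20 min for test 1.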
