I just want to share my results from indexing huge amounts of data and 
discuss them with you. For most of you this might be obvious, but I only 
read about indexing at the end, so this might be a good point to talk 
about the results.

The data:

   - 1.8 GB CSV file
   - 30,000,000 lines
   - 5 columns
   - the first column is person_id, which is the only indexed column in 
   these tests
   
So now my tests:

   1. First I set up the schema and the index on person_id and then loaded 
   the 30 million rows. It took *~20 min*.
   2. Then I tried creating the schema and index after loading, so the load 
   was completely schema-less. It also took *~19.5 min*. When I then tried 
   to create the index, I got an *error* complaining that I had inserted a 
   String. That seemed strange at first, but the default type is String, 
   so the error makes sense.
   3. In the third test I set the schema on person_id before loading. 
   Loading the data again took *~20 min*. Then I created an index on that 
   property, which logged "2014-08-31 11:06:24:487 INFO --> OK, indexed 
   30,000,000 items in 666,502 ms", i.e. about 11 min, for a total of 
   *~31 min*.
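As a quick sanity check on test 3, the log line converts to minutes as follows. The constants come straight from the post; the ~20 min load time is only the approximate figure reported, so the total is no more precise than that.

```python
# Numbers reported for test 3: bulk load with schema, then a separate index build.
LOAD_MIN = 20            # approximate load time (schema set, no index), in minutes
INDEX_MS = 666_502       # "indexed 30,000,000 items in 666,502 ms"
ITEMS = 30_000_000       # rows in the CSV file

index_min = INDEX_MS / 1000 / 60          # ms -> s -> min
total_min = LOAD_MIN + index_min          # load first, build the index afterwards
items_per_s = ITEMS / (INDEX_MS / 1000)   # throughput of the index build

print(f"index build: {index_min:.1f} min")
print(f"load + index: {total_min:.1f} min")
print(f"index throughput: {items_per_s:,.0f} items/s")
```

This prints about 11.1 min for the index build, 31.1 min in total and roughly 45,011 items/s. That matches the ~11 min and ~31 min above, compared with ~20 min when the index already exists before loading (test 1).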
   
So my results:

   - set up the schema and index before loading any data
   - define the property types up front (without a schema, everything is 
   stored as String)
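The post does not say which tool was used to load the CSV, so the following is only a plain-Python illustration of the type issue from test 2, not OrientDB's importer. A CSV file carries no type information, so every field arrives as a string unless something converts it. The sample data and the `read_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample in the same shape as the post's file: person_id first, 5 columns.
SAMPLE = "1001,a,b,c,d\n1002,e,f,g,h\n"

def read_rows(text):
    """Yield rows with person_id converted to int; other columns stay strings."""
    for row in csv.reader(io.StringIO(text)):
        person_id, *rest = row
        yield [int(person_id), *rest]

raw = next(csv.reader(io.StringIO(SAMPLE)))  # untyped first row
typed = list(read_rows(SAMPLE))              # rows with an integer person_id

print(type(raw[0]).__name__)       # str: what a schema-less load sees
print(type(typed[0][0]).__name__)  # int
```

Converting (or declaring) the type before the data goes in avoids ending up with String values that a later, differently typed index rejects.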

If I missed anything, just add it. And if you have an idea why test 2 fails 
and test 3 is so slow, please leave a comment too.

Thanks.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.