I thought this might be interesting to some. I decided to compare the use 
of two indexing options: UNIQUE_HASH_INDEX and UNIQUE. The migration script 
performs a query that inserts iff the record doesn't already exist, in 
which case it updates it. For example,

UPDATE User
SET createdAt="2013-11-17T13:15:10.181Z",
updatedAt="2014-04-10T11:54:18.699Z",
objectId="qCGmZB1KSx",
ACL={"*":{"read":true},"qCGmZB1KSx":{"read":true,"write":true}}
UPSERT WHERE objectId="qCGmZB1KSx"

The rows have much more data than that, making them heavy documents.

First, I set up two database classes and their indexes:

CREATE CLASS ParseObject ABSTRACT
CREATE PROPERTY ParseObject.objectId STRING

CREATE CLASS User EXTENDS ParseObject
CREATE INDEX User.objectId UNIQUE_HASH_INDEX

CREATE CLASS User2 EXTENDS ParseObject
CREATE INDEX User2.objectId UNIQUE

I then ran a migration script that imported >1M user rows, 1000 at a time, 
timing the database transactions, and logged at the following batch numbers:
1, 2, 3, 10, 50, 100, 500, 1000, fin
(Looking back, I should have modified the script to do this automatically, 
as this didn't work out so well.)

The results were very close, but *UNIQUE_HASH_INDEX* won. After batch 500 
(~500k records)...
UNIQUE_HASH_INDEX took 0.377s on average to UPSERT 1000 documents
UNIQUE took 0.386s on average to UPSERT 1000 documents



I accidentally closed my computer, so the second batch started after ~800k 
records were inserted, but the difference only grew. By the end,
UNIQUE_HASH_INDEX took 0.400s on average to UPSERT 1000 documents
UNIQUE took 0.460s on average to UPSERT 1000 documents
After over 1M rows. Thats a difference of *60 milliseconds per 1000 finds 
and inserts*, on average.

This is very interesting to me. Of course, this isn't a billion or even 10 
million records, but it's enough for me to choose UNIQUE_HASH_INDEX for 
non-range indices.

Below is the full dump, if you're interested.

=================
> =================
> =================
> UNIQUE_HASH_INDEX
> =================
> =================
> =================
>
> ========== [0.351s]
> BATCH #1 [0s]
> ========== [0s]
> SUBTOTAL: 1000 (1000 CREATED, 0 UPDATED, 0 TIMEOUTS, 0 ERRORS) [0s]
> TOTAL TIME: 2.647s (0.351s DB, 2.296s LATENCY) [0s]
> AVG TIME: 2.647s/batch (0.351s DB, 2.296s LATENCY) [0s]
> ========== [0.334s]
> BATCH #2 [0s]
> ========== [0s]
> SUBTOTAL: 2000 (2000 CREATED, 0 UPDATED, 0 TIMEOUTS, 0 ERRORS) [0s]
> TOTAL TIME: 4.91s (0.685s DB, 4.225s LATENCY) [0s]
> AVG TIME: 2.455s/batch (0.3425s DB, 2.1125s LATENCY) [0s]
> ========== [0.356s]
> BATCH #3 [0s]
> ========== [0s]
> SUBTOTAL: 3000 (3000 CREATED, 0 UPDATED, 0 TIMEOUTS, 0 ERRORS) [0s]
> TOTAL TIME: 6.797s (1.041s DB, 5.756s LATENCY) [0s]
> AVG TIME: 2.2656666666666667s/batch (0.347s DB, 1.9186666666666667s 
> LATENCY) [0.001s]
> ========== [0.33s]
> BATCH #10 [0s]
> ========== [0s]
> SUBTOTAL: 9999 (9999 CREATED, 0 UPDATED, 0 TIMEOUTS, 1 ERRORS) [0s]
> TOTAL TIME: 24.919s (3.3569999999999998s DB, 21.562s LATENCY) [0s]
> AVG TIME: 2.4919000000000002s/batch (0.3357s DB, 2.1562s LATENCY) [0s]
> ========== [0.354s]
> BATCH #50 [0s]
> ========== [0s]
> SUBTOTAL: 49998 (49998 CREATED, 0 UPDATED, 0 TIMEOUTS, 2 ERRORS) [0s]
> TOTAL TIME: 132.25s (17.090999999999998s DB, 115.159s LATENCY) [0s]
> AVG TIME: 2.645s/batch (0.34181999999999996s DB, 2.3031800000000002s 
> LATENCY) [0s]
> ========== [0.385s]
> BATCH #100 [0s]
> ========== [0s]
> SUBTOTAL: 99995 (99995 CREATED, 0 UPDATED, 0 TIMEOUTS, 5 ERRORS) [0s]
> TOTAL TIME: 289.742s (35.352s DB, 254.39000000000001s LATENCY) [0s]
> AVG TIME: 2.8974200000000003s/batch (0.35351999999999995s DB, 
> 2.5439000000000003s LATENCY) [0s]
> ========== [0.371s]
> BATCH #500 [0s]
> ========== [0s]
> SUBTOTAL: 499777 (499777 CREATED, 0 UPDATED, 1 TIMEOUTS, 223 ERRORS) [0s]
> TOTAL TIME: 1472.485s (188.42100000000016s DB, 1284.0639999999999s 
> LATENCY) [0s]
> AVG TIME: 2.9449699999999996s/batch (0.37684200000000034s DB, 
> 2.5681279999999997s LATENCY) [0s]
> ========== [0.024s]
> BATCH #765 [0s]
> ========== [0s]
> SUBTOTAL: 764707 (764707 CREATED, 0 UPDATED, 1 TIMEOUTS, 293 ERRORS) [0s]
> TOTAL TIME: 2251.022s (288.5290000000002s DB, 1962.4929999999997s LATENCY) 
> [0s]
> AVG TIME: 2.9425124183006535s/batch (0.3771620915032683s DB, 
> 2.565350326797385s LATENCY) [0s]
>
> *** computer locked, script restarted
>
> ========== [0.139s]
> BATCH #556 [0s]
> ========== [0s]
> SUBTOTAL: 555489 (555489 CREATED, 0 UPDATED, 1 TIMEOUTS, 161 ERRORS) [0s]
> TOTAL TIME: 2434.714s (222.20600000000007s DB, 2212.508s LATENCY) [0s]
> AVG TIME: 4.378982014388489s/batch (0.3996510791366908s DB, 
> 3.9793309352517983s LATENCY) [0s]
> Grabbing the next batch of Parse _User updated after 
> 2014-07-08T20:09:24.682Z [0s]
> Found 626 updated objects [1.889s]
>
>
>
>
>
> ======
> ======
> ======
> UNIQUE
> ======
> ======
> ======
>
> ========== [0.334s]
> BATCH #1 [0s]
> ========== [0s]
> SUBTOTAL: 1000 (1000 CREATED, 0 UPDATED, 0 TIMEOUTS, 0 ERRORS) [0s]
> TOTAL TIME: 1.794s (0.334s DB, 1.46s LATENCY) [0s]
> AVG TIME: 1.794s/batch (0.334s DB, 1.46s LATENCY) [0s]
> ========== [0.345s]
> BATCH #2 [0s]
> ========== [0s]
> SUBTOTAL: 2000 (2000 CREATED, 0 UPDATED, 0 TIMEOUTS, 0 ERRORS) [0s]
> TOTAL TIME: 3.987s (0.6779999999999999s DB, 3.309s LATENCY) [0s]
> AVG TIME: 1.9935s/batch (0.33899999999999997s DB, 1.6545s LATENCY) [0s]
> ========== [0.35s]
> BATCH #3 [0s]
> ========== [0s]
> SUBTOTAL: 3000 (3000 CREATED, 0 UPDATED, 0 TIMEOUTS, 0 ERRORS) [0.001s]
> TOTAL TIME: 6.057s (1.028s DB, 5.029s LATENCY) [0s]
> AVG TIME: 2.019s/batch (0.3426666666666667s DB, 1.6763333333333332s 
> LATENCY) [0s]
> ========== [0.336s]
> BATCH #10 [0s]
> ========== [0s]
> SUBTOTAL: 9999 (9999 CREATED, 0 UPDATED, 0 TIMEOUTS, 1 ERRORS) [0s]
> TOTAL TIME: 23.79s (3.3780000000000006s DB, 20.412s LATENCY) [0s]
> AVG TIME: 2.379s/batch (0.33780000000000004s DB, 2.0412s LATENCY) [0s]
>
> *** oops, i forgot to check at 50 and 100
>
> ========== [0.397s]
> BATCH #332 [0s]
> ========== [0s]
> SUBTOTAL: 331978 (331978 CREATED, 0 UPDATED, 0 TIMEOUTS, 22 ERRORS) [0s]
> TOTAL TIME: 1006.782s (127.84200000000014s DB, 878.9399999999999s LATENCY) 
> [0s]
> AVG TIME: 3.0324759036144577s/batch (0.3850662650602414s DB, 
> 2.6474096385542167s LATENCY) [0s]
>
> ========== [0.365s]
> BATCH #500 [0s]
> ========== [0s]
> SUBTOTAL: 499777 (499777 CREATED, 0 UPDATED, 2 TIMEOUTS, 223 ERRORS) [0s]
> TOTAL TIME: 1552.584s (193.07500000000002s DB, 1359.509s LATENCY) [0s]
> AVG TIME: 3.105168s/batch (0.38615000000000005s DB, 2.719018s LATENCY) [0s]
> ========== [0.106s]
> BATCH #765 [0s]
> ========== [0s]
> SUBTOTAL: 764706 (764706 CREATED, 0 UPDATED, 3 TIMEOUTS, 293 ERRORS) [0s]
> TOTAL TIME: 2354.76s (295.81599999999963s DB, 2058.9440000000004s LATENCY) 
> [0s]
> AVG TIME: 3.0781176470588236s/batch (0.3866875816993459s DB, 
> 2.691430065359478s LATENCY) [0s]
>
> *** restarted script like previous test
>
> ========== [0.567s]
> BATCH #527 [0s]
> ========== [0s]
> SUBTOTAL: 526624 (526624 CREATED, 0 UPDATED, 0 TIMEOUTS, 155 ERRORS) [0s]
> TOTAL TIME: 2245.327s (242.64300000000034s DB, 2002.684s LATENCY) [0s]
> AVG TIME: 4.260582542694498s/batch (0.460423149905124s DB, 
> 3.8001593927893738s LATENCY) [0s]

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to