Re: [orientdb] Slow performance when building a weighted table out of edge-vertex relationships

Luca Garulli Fri, 24 Apr 2015 09:31:52 -0700

The bottleneck there is the 11k records that match the condition:

timestamp < date('2015-04-24 09:40:00')


Best Regards,

Luca Garulli
CEO at Orient Technologies LTD
the Company behind OrientDB
http://about.me/luca.garulli


On 24 April 2015 at 17:02, Jean-Sebastien Lemay <
[email protected]> wrote:

> Understood about X2 --> Y2. However, I wanted to confirm whether the
> bottleneck was due to the system navigating the edge 6000 times to retrieve
> the "name" field from the same vertex.
>
> I'm afraid hooks are out of the question, since my next query is to
> determine, for a specific user, which tags he used the most. I cannot start
> creating a table for each user for each day, month, year, etc., especially
> since I mean the past [x] months from this instant, not a calendar month.
>
> Can you confirm if my query is slow due to the repeated re-fetching of the
> "name" field of my Tag vertex?
> On Apr 24, 2015 10:39 PM, "Luca Garulli" <[email protected]> wrote:
>
>> Hi Jean-Sebastien,
>> If you have X records in your resultset and execution time is Y, then
>> with X2 records it will be about Y2. There is no magic here.
>>
>> The most common solution, instead, is to pre-aggregate the weights in a tree
>> (year->month->day->hour->minute->second),
>> so every query takes constant time, regardless of the number of events.
>>
>> You can do this by writing a Hook that automatically updates counters
>> upon create/update/delete.
>>
>> In this way the query above would be super fast.
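
A minimal sketch of the pre-aggregation idea described above, written in Python rather than as an OrientDB hook. It is an illustration under assumptions, not OrientDB API: the class TagCounters and its methods are hypothetical names, and it keeps a single minute-level layer of counters instead of the full year->...->second tree. The record() call stands in for what a hook would do on each edge creation. A sliding "past [x] minutes from this instant" window is answered by summing the buckets it covers, so the cost depends on the number of buckets in the window, not on the number of events.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


class TagCounters:
    """Hypothetical per-minute usage counters, updated on every write."""

    def __init__(self):
        # minute-truncated datetime -> Counter mapping tag name -> number of uses
        self.buckets = defaultdict(Counter)

    def record(self, tag, ts):
        """Count one use of `tag` at time `ts` (the job a hook would do)."""
        self.buckets[ts.replace(second=0, microsecond=0)][tag] += 1

    def top_tags(self, since, until):
        """Sum the minute buckets in [since, until), most used tag first."""
        total = Counter()
        bucket = since.replace(second=0, microsecond=0)
        while bucket < until:
            # .get() avoids creating empty buckets in the defaultdict
            total.update(self.buckets.get(bucket, {}))
            bucket += timedelta(minutes=1)
        return total.most_common()
```

Window boundaries are only accurate to the minute in this sketch. Coarser levels (hour, day, month) could be added on top so that long windows touch fewer buckets, which is the point of the tree.
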
>>
>> --
>> Best Regards,
>>
>> Luca Garulli
>> CEO at Orient Technologies LTD
>> the Company behind OrientDB
>> http://about.me/luca.garulli
>>
>>
>> On 24 April 2015 at 13:13, Jean-Sebastien Lemay <
>> [email protected]> wrote:
>>
>>> To validate if OrientDB is the right fit for my project, I've created a
>>> simple scenario:
>>>
>>>    - Vertex classes (2):
>>>       - *User*: represents a user
>>>       - *Tag*: represents a tag that a user can interact with (e.g.
>>>       post a comment using that tag). Contains a *name* property
>>>    - Edge classes (1):
>>>       - *UserUsedTag*: connects a user to a tag. Contains a *timestamp*
>>>       property (which is indexed with NOTUNIQUE)
>>>
>>> To sum it up:
>>> *[V:User] ---[E:UserUsedTag]--> [V:Tag]*
>>>
>>> I'm trying to build a query that will let me know which tags have been
>>> the most popular in the past [x] minutes/hours/days/months...
>>> As such, here is an example of the query I've got right now:
>>> SELECT inV().name as name, COUNT(in) AS weight
>>> FROM UserUsedTag
>>> WHERE timestamp < date('2015-04-24 09:40:00')
>>> GROUP BY in
>>> ORDER BY weight DESC
>>>
>>> The query works, and I get a proper result set:
>>>
>>> name      weight
>>> baseball  6117
>>> soccer    5003
>>>
>>> My problem is the performance:
>>> *Query executed in 0.311 sec. Returned 2 record(s)*
>>> If it takes 1/3 of a second to sift through ~11,000 results, I can only
>>> imagine how crippled the performance will be if I am dealing with millions
>>> of edges, which I expect to end up with. In fact, as I add new edges, it
>>> seems like the query time increases linearly.
>>>
>>> Here is the EXPLAIN for the above query:
>>> METADATA
>>> @version                         0
>>>
>>> PROPERTIES
>>> resultSize                       2
>>> fullySortedByIndex               false
>>> documentAnalyzedCompatibleClass  11120
>>> recordReads                      11120
>>> fetchingFromTargetElapsed        313
>>> indexIsUsedInOrderBy             false
>>> compositeIndexUsed               1
>>> current                          #14:11120
>>> documentReads                    11120
>>> projectionElapsed                6
>>> limit                            -1
>>> orderByElapsed                   0
>>> evaluated                        11120
>>> groupByElapsed                   0
>>> user                             #5:0
>>> elapsed                          353.77362
>>> resultType                       collection
>>> involvedIndexes                  ["UserUsedTag.timestamp"]
>>>
>>> It seems that
>>> 'fetchingFromTargetElapsed' is the biggest bottleneck here. Is this due to
>>> the nature of my query? Perhaps my query is not optimal? I am new to graph
>>> DBs so I'd like to know if there is any way I can rephrase my query to end
>>> up with the same result.
>>>
>>> The problem for me is that with a relational DB I would probably get
>>> better performance: I would query my "link" table and then, for each of
>>> the two (2) final rows, run one (1) query to retrieve the name of the
>>> corresponding Tag via the foreign key. Is there any way I can split my
>>> query similarly with OrientDB, instead of fetching the same Tag name
>>> multiple times?
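
The "split" described above can be sketched language-neutrally in Python: aggregate on the tag's record id first, then resolve each name once per group instead of once per edge. This is only an illustration of the idea. The function top_tags, the (tag_id, timestamp) edge tuples and the tag_names lookup are hypothetical stand-ins, not OrientDB API, and the thread does not establish that the name lookup is what makes the query slow (the reply at the top attributes the cost to the 11k matching records).

```python
from collections import Counter


def top_tags(edges, tag_names, before):
    """Rank tags by number of uses before a cutoff.

    edges:     iterable of (tag_id, timestamp) pairs, one per UserUsedTag edge
    tag_names: dict mapping tag_id -> tag name (stand-in for the Tag vertices)
    before:    cutoff; only edges with timestamp < before are counted
    """
    # Step 1: group and count on the tag id only, without touching any vertex.
    counts = Counter(tag_id for tag_id, ts in edges if ts < before)
    # Step 2: resolve each name once per group, not once per edge.
    return [(tag_names[tag_id], weight) for tag_id, weight in counts.most_common()]
```

With two distinct tags, step 2 performs two name lookups regardless of how many edges matched in step 1.
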
>>>
>>>  --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>

