Re: [orientdb] Slow performance when building a weighted table out of edge-vertex relationships

Jean-Sebastien Lemay Mon, 27 Apr 2015 08:24:30 -0700

Hi, thanks for looking into this situation, it's really appreciated!

I repopulated my database and ran your query:
SELECT inV().name AS name, COUNT(*) AS weight FROM (
  SELECT expand(outE("UserUsedTag")) 
  FROM user
  WHERE username = "john.doe"
) 
WHERE timestamp > date('2015-04-27 15:13:00')
AND timestamp < date('2015-04-27 15:17:00')
GROUP BY in
ORDER BY weight DESC
Query executed in 0.474 sec. Returned 2 record(s)
baseball 5013
soccer 4987


Here is what the EXPLAIN returns:
METADATAPROPERTIES
@version
resultSize
current
recordReads
rangeQueryConvertedInBetween
limit
fetchingFromTargetElapsed
projectionElapsed
documentReads
orderByElapsed
evaluated
groupByElapsed
user
elapsed
resultType
expandElapsed
02#14:24599 
<http://172.16.11.2:2480/studio/index.html#/database/TestOrientDb/browse/edit/14:24599>
100001-12727100000100000#5:0 
<http://172.16.11.2:2480/studio/index.html#/database/TestOrientDb/browse/edit/5:0>
445.29556collection3
My schema is as follows:

<https://lh3.googleusercontent.com/-CtHv7envQbM/VT5T0kmvKbI/AAAAAAAAANs/xXG5AoAUKoU/s1600/schema_user.jpg>
(User extends V, UNIQUE_HASH_INDEX on User.username)

<https://lh3.googleusercontent.com/-8COWToHTE7Y/VT5UAs0WZBI/AAAAAAAAAN0/1vnRx2jtBaU/s1600/schema_tag.jpg>
(Tag extends V, UNIQUE_HASH_INDEX on Tag.name)

<https://lh3.googleusercontent.com/-Ma25A-g4A6k/VT5UIcKtBpI/AAAAAAAAAN8/nxipEy7_0fU/s1600/schema_user_used_tag.jpg>

(UserUsedTag extends E, NOTUNIQUE index on UserUsedTag.timestamp)


Let me know if you can determine what causes my query to take almost half a 
second. Hopefully we uncover something I'm doing wrong -- the database is 
great beyond this particular issue.
Best regards,
Jean



On Monday, April 27, 2015 at 6:21:04 PM UTC+8, Izzet Pembeci wrote:
>
> This is surprizing indeed. Can't you put the timestamp where clause just 
> before outer group by so that you eliminate one subquery:
>
> SELECT inV().name AS name, COUNT(*) AS weight FROM (
>
>   SELECT expand(outE("UserUsedTag")) 
>   FROM user
>   WHERE username = "john.doe"
>  ) 
>
> WHERE timestamp > date('2015-04-25 05:00:22')
>  AND timestamp < date('2015-04-25 05:01:10')
> GROUP BY in
> ORDER BY weight DESC
>
> What does the EXPLAIN return for this latest one and can you share your 
> index definitions (may be there is a problem with those)?
>
> iZzeT
>
>
> On Sunday, April 26, 2015 at 4:58:11 PM UTC+3, Jean-Sebastien Lemay wrote:
>>
>> Hi Luca,
>>
>> Thanks for the advice. However in your query I am unable to filter by 
>> timestamp on the edge. Instead, I ended up with:
>> SELECT inV().name AS name, COUNT(*) AS weight FROM (
>>
>>  SELECT FROM (
>>   SELECT expand(outE("UserUsedTag")) 
>>   FROM user
>>  WHERE username = "john.doe"
>>  ) WHERE timestamp > date('2015-04-25 05:00:22')
>>  AND timestamp < date('2015-04-25 05:01:10')
>> )
>> GROUP BY in
>> ORDER BY weight DESC
>> soccer 5020
>> baseball 4939
>> Query executed in 0.468 sec. Returned 2 record(s)
>>
>> The above query seems to take into account all indexes (index on 
>> User.username and UserUsedTag.timestamp).
>>
>> Still, 468ms for ~10k records. Am I correct then that OrientDB in the end 
>> is a tradeoff, i.e. it's a lot more developer friendly and has tons of 
>> features, but at the cost of rougly ~40x the performance of a traditional 
>> relational DB such as PostgreSQL?
>>
>>
>>
>> On Sunday, April 26, 2015 at 6:33:38 PM UTC+8, l.garulli wrote:
>>>
>>> On 25 April 2015 at 16:41, Jean-Sebastien Lemay <[email protected]
>>> > wrote:
>>>
>>>> Hi there,
>>>>
>>>
>>> Hi Jean-Sebastien
>>>  
>>>
>>>>
>>>> OK -- I'm a bit surprised, in a way; this is a straightforward SELECT 
>>>> statement.
>>>>
>>>> I decided to run another experiment. My scenario: determine which tags 
>>>> are a given user's favourite.
>>>>
>>>> My query in OrientDB:
>>>> SELECT in.name AS name, COUNT(in.name) AS weight 
>>>> FROM UserUsedTag
>>>> WHERE "john.doe" IN out.username
>>>> AND timestamp < date()
>>>> GROUP BY in.name
>>>> ORDER BY weight DESC
>>>> Query executed in 0.487 sec. Returned 2 record(s)
>>>> soccer 5042
>>>> baseball 4958
>>>>
>>>
>>> This is not the most efficient way to do that, because the index on 
>>> username wouldn't be used. With a Graph Database, you should lookup at the 
>>> stating vertex and then cross relationships. Try this:
>>>
>>> select name, count(*) as weight
>>> from (
>>>   select expand( in() ) from User where username = "john.doe"
>>> ) group by name, order by weight
>>>
>>> Assure to have an index on User.username. I'd suggest a Unique index of 
>>> type HASH_INDEX.
>>>
>>> Lvc@
>>>
>>>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [orientdb] Slow performance when building a weighted table out of edge-vertex relationships

Reply via email to