Hi Jean-Sebastien,
If you have X records in your result set and the execution time is Y, then with
2X records it will be about 2Y. There is no magic here.

The most common solution is to pre-aggregate the weight in a tree
(year -> month -> day -> hour -> minute -> second),
so every query takes constant time, regardless of the number of events.

You can do this by writing a Hook that automatically updates the counters upon
create/update/delete.

In this way the query above would be super fast.
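To make the idea concrete, here is a minimal in-memory sketch of such a counter tree in plain Java (no OrientDB API; the class name, bucket layout, and method names are all hypothetical). The Hook would call something like recordEvent() for every created edge, and ranking a time window then reads a handful of pre-computed counters instead of scanning every edge:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one counter per (tag, time bucket), maintained
// incrementally as events arrive, from coarse (year) to fine (second).
public class TagCounters {
    private static final String[] BUCKETS = {
        "yyyy", "yyyy-MM", "yyyy-MM-dd", "yyyy-MM-dd HH",
        "yyyy-MM-dd HH:mm", "yyyy-MM-dd HH:mm:ss"
    };

    private final Map<String, Long> counters = new HashMap<>();

    // Called from the create Hook: bump every granularity for this tag.
    public void recordEvent(String tag, LocalDateTime ts) {
        for (String pattern : BUCKETS) {
            String key = tag + "|" + ts.format(DateTimeFormatter.ofPattern(pattern));
            counters.merge(key, 1L, Long::sum);
        }
    }

    // Constant-time lookup: how often was this tag used in this bucket?
    public long count(String tag, String bucket) {
        return counters.getOrDefault(tag + "|" + bucket, 0L);
    }

    public static void main(String[] args) {
        TagCounters c = new TagCounters();
        LocalDateTime t = LocalDateTime.of(2015, 4, 24, 9, 39, 12);
        c.recordEvent("soccer", t);
        c.recordEvent("soccer", t.plusSeconds(30));
        c.recordEvent("baseball", t);
        System.out.println(c.count("soccer", "2015-04-24 09:39")); // minute bucket
        System.out.println(c.count("soccer", "2015-04"));          // month bucket
    }
}
```

In OrientDB each counter would live in a document (e.g. a hypothetical TagCounterMinute class keyed by tag + bucket), so "most popular tags in the past [x] minutes" touches at most [x] counter records per tag, independent of how many edges exist.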

-- 
Best Regards,

Luca Garulli
CEO at Orient Technologies LTD
the Company behind OrientDB
http://about.me/luca.garulli


On 24 April 2015 at 13:13, Jean-Sebastien Lemay <
[email protected]> wrote:

> To validate if OrientDB is the right fit for my project, I've created a
> simple scenario:
>
>    - Vertex classes (2):
>       - *User*: represents a user
>       - *Tag*: represents a tag that a user can interact with (e.g. post
>       a comment using that tag). Contains a *name* property
>
>    - Edge classes (1):
>       - *UserUsedTag*: connects a user to a tag. Contains a *timestamp*
>       property (which is indexed with NOTUNIQUE)
>
> To sum it up:
> *[V:User] ---[E:UserUsedTag]--> [V:Tag]*
>
> I'm trying to build a query that will let me know which tags have been the
> most popular in the past [x] minutes/hours/days/months...
> As such, here is an example of the query I've got right now:
> SELECT inV().name as name, COUNT(in) AS weight
> FROM UserUsedTag
> WHERE timestamp < date('2015-04-24 09:40:00')
> GROUP BY in
> ORDER BY weight DESC
>
> The query works, and I get a proper result set:
> name       weight
> baseball   6117
> soccer     5003
>
> My problem is the performance:
> *Query executed in 0.311 sec. Returned 2 record(s)*
> If it takes 1/3 of a second to sift through ~11,000 results, I can only
> imagine how crippled the performance will be if I am dealing with millions
> of edges, which I expect to end up with. In fact, as I add new edges, it
> seems like the query time increases linearly.
>
> Here is the EXPLAIN for the above query:
> @version: 0
> resultSize: 2
> fullySortedByIndex: false
> documentAnalyzedCompatibleClass: 11120
> recordReads: 11120
> fetchingFromTargetElapsed: 313
> indexIsUsedInOrderBy: false
> compositeIndexUsed: 1
> current: #14:11120
> documentReads: 11120
> projectionElapsed: 6
> limit: -1
> orderByElapsed: 0
> evaluated: 11120
> groupByElapsed: 0
> user: #5:0
> elapsed: 353.77362
> resultType: collection
> involvedIndexes: ["UserUsedTag.timestamp"]
>
> It seems that 'fetchingFromTargetElapsed' is the biggest bottleneck here. Is
> this due to the nature of my query? Perhaps my query is not optimal? I am new
> to graph DBs, so I'd like to know if there is any way I can rephrase my query
> to end up with the same result.
>
> The problem for me is that with a relational DB I would probably get better
> performance: I would query my "link" table first and then, for each of the
> final two (2) rows, perform one (1) query to retrieve the name of the
> corresponding Tag via the foreign key. Is there any way I can split my query
> similarly with OrientDB, instead of fetching the same Tag name multiple
> times?
>
>  --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
