Re: [orientdb] Re: New to OrientDB: Object DB, Queries, Sorting and Performance

Luca Garulli Fri, 31 Jul 2015 06:33:12 -0700

Hi Sebastian,
Sorry for only seeing this now; I hope it's not too late.

Starting from your last query (about 7 seconds):


select from RecordImpl
where
  type contains [#46:0]
and (
  relationNode.relations
  contains (
    predicate contains [#34:0] and subject contains [#30:18]
  )
)

I see the bottleneck is the expression *relationNode.relations contains (
predicate contains [#34:0] and subject contains [#30:18] )*. In fact, with
such an expression OrientDB does a full scan of many records. You can check
this by prefixing the query with *EXPLAIN*:

*explain* select from RecordImpl where type contains [#46:0] and
( relationNode.relations contains ( predicate contains [#34:0] and subject
contains [#30:18] ) )

The secret to fast queries, in any DBMS, is using indexes as much as you
can. When you use the dot notation (.), OrientDB can't use the indexes.
Looking at the original query:

select from RecordImpl
where
  type.uniqueKey = 'TOME'
and
  relationNode.relations
  contains (
   (predicate.uniqueKey = 'IS_PUB_PLACE_OF')
    and
    subject.relationNodeContainers contains (uniqueKey = 'MILANO')
  )

You have 3 conditions to match. If you used the Graph API you would have
bidirectional edges, so you could start from any point in the graph and
traverse it in any direction. For example, you could look up all the places
of type "IS_PUB_PLACE_OF" and start traversing the graph from there, matching
the other conditions. Or you could do the same starting from "MILANO".
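
To make the idea concrete, here is a minimal sketch in plain Java. It is a
toy in-memory model, not the OrientDB Graph API: the Relation class, the
method names and the sample data are all invented for illustration, since
the real schema is not known here. It only contrasts testing every relation
against all three conditions with starting from one selective node
("MILANO") and following its incoming edges backwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT the OrientDB API. All names are hypothetical.
final class TraversalSketch {

    // One relation (triple) attached to a record: record --predicate--> subject.
    static final class Relation {
        final String record, recordType, predicate, subject;

        Relation(String record, String recordType, String predicate, String subject) {
            this.record = record;
            this.recordType = recordType;
            this.predicate = predicate;
            this.subject = subject;
        }
    }

    // Strategy 1: full scan. Every relation is tested against all three conditions.
    static List<String> fullScan(List<Relation> all, String type, String predicate, String subject) {
        List<String> hits = new ArrayList<>();
        for (Relation r : all) {
            if (r.recordType.equals(type) && r.predicate.equals(predicate) && r.subject.equals(subject)) {
                hits.add(r.record);
            }
        }
        return hits;
    }

    // Reverse lookup: subject key -> relations pointing at it (its "incoming edges").
    static Map<String, List<Relation>> indexBySubject(List<Relation> all) {
        Map<String, List<Relation>> index = new HashMap<>();
        for (Relation r : all) {
            index.computeIfAbsent(r.subject, k -> new ArrayList<>()).add(r);
        }
        return index;
    }

    // Strategy 2: start at the subject node and walk its incoming edges back,
    // checking the two remaining conditions only on that small candidate set.
    static List<String> fromSubject(Map<String, List<Relation>> bySubject,
                                    String type, String predicate, String subject) {
        List<String> hits = new ArrayList<>();
        for (Relation r : bySubject.getOrDefault(subject, Collections.emptyList())) {
            if (r.predicate.equals(predicate) && r.recordType.equals(type)) {
                hits.add(r.record);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Relation> data = Arrays.asList(
            new Relation("book-1", "TOME", "IS_PUB_PLACE_OF", "MILANO"),
            new Relation("book-2", "TOME", "IS_PUB_PLACE_OF", "ROMA"),
            new Relation("map-1", "MAP", "IS_PUB_PLACE_OF", "MILANO"),
            new Relation("book-3", "TOME", "IS_AUTHOR_OF", "MILANO"));

        // Both strategies return the same records; only the work done differs.
        System.out.println(fullScan(data, "TOME", "IS_PUB_PLACE_OF", "MILANO"));
        System.out.println(fromSubject(indexBySubject(data), "TOME", "IS_PUB_PLACE_OF", "MILANO"));
    }
}
```

In this toy model the second strategy touches only the relations that point
at "MILANO", which is the effect bidirectional edges are meant to give you.
Whether it pays off on your data depends on how selective the starting
point is.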

To help you further, I'd need the schema of the entities involved in this query.

Best Regards,

Luca Garulli
Founder & CEO
OrientDB <http://orientdb.com/>


On 31 July 2015 at 12:46, Zapp El <[email protected]> wrote:

>
> So, we officially gave up on OrientDB.
>
> Performance is just too poor for our use case. We tried a couple more
> things, but none of them helped. Even with a very small amount of data
> (2 GB, ~4,859,173 records), performance is abysmal.
>
> That is really a pity. I very much like the basic concept of OrientDB's
> ObjectDB.
>
> Since I work with several hundred GB of index data in Lucene and Solr
> on a daily basis and have never had a problem achieving decent performance,
> my only guess right now is that OrientDB's ObjectDB suffers from poorly
> written query optimization.
>
> And BTW, we really hoped that one of the developers would chime in here.
> In the end, we are potential customers, but as long as we can't get a
> basic proof of concept going, or at least get confirmation that our data
> isn't a complete mismatch for OrientDB, we won't buy any licences.
>
> Best regards,
>
> Sebastian
>
>
>
>
> On Tuesday, July 28, 2015 at 7:13:06 PM UTC+2, Zapp El wrote:
>>
>> Hello Community,
>> hello OrientDB developers!
>>
>> I work in public services, and currently we're evaluating different
>> technologies (OrientDB, Hibernate/postgres, Fedora4 and neo4j) in order to
>> find the best possible backend solution for our data.
>> Over the course of the last month we developed a fairly straightforward
>> Java class model that we would like to use regardless of the underlying
>> technology.
>>
>> In future applications we're going to have more than 1.5 million objects
>> to persist, manage and retrieve.
>>
>> Handling Java objects directly seems so much more intuitive and flexible
>> than mapping them with JPA, so we were eager to try something new,
>> such as OrientDB's Object Database functionality.
>>
>> But somehow we can't figure out how to get a decent performance out of
>> our experimental setup.
>>
>> So far we've persisted 95,601 of our objects (books and other
>> media), resulting in
>> 46,995,663 ORecords (see screen-shot) and about 18.8 GB of data on our
>> NAS.
>> Our test system:
>>
>> - Virtual Machine on VMware
>> - SUSE 10 OS
>> - 1 TB of NAS.
>> - Java 1.8
>> - OrientDB PE Version 2.0.12
>> - QuadCore CPU
>>
>>
>>
>> Select a book with a specific relation (like a triple):
>>
>> select from RecordImpl
>> where
>>   type.uniqueKey = 'TOME'
>> and
>>   relationNode.relations
>>   contains (
>>    (predicate.uniqueKey = 'IS_PUB_PLACE_OF')
>>     and
>>     subject.relationNodeContainers contains (uniqueKey = 'MILANO')
>>   )
>> Query executed in 9.26 sec. Returned 20 record(s)
>>
>> 9.26 sec. How can we speed up this query? We have indexes on all the
>> uniqueKeys.
>>
>>
>>
>> We managed to accelerate this query a little by rewriting the statement
>> like this:
>>
>> select from RecordImpl
>> where
>>   type contains [#46:0]
>> and (
>>   relationNode.relations
>>   contains (
>>     predicate contains [#34:0] and subject contains [#30:18]
>>   )
>> )
>> Query executed in 7.122 sec. Returned 20 record(s)
>>
>> 7.122 sec., sadly not acceptable. And that is one of the simpler
>> questions we'd like to get answered in a decent time.
>>
>>
>>
>> Now this one with a simple order by:
>>
>> select from RecordImpl
>> where
>>   type contains [#46:0]
>> and (
>>   relationNode.relations
>>   contains (
>>     predicate contains [#34:0] and subject contains [#30:18]
>>   )
>> )
>> order by sortIndex desc
>> Query executed in 133.423 sec. Returned 20 record(s)
>>
>>
>>
>> So, any ideas how we could speed up our queries? What are we doing wrong?
>>
>>
>> Best regards & thanks,
>>
>> Sebastian
>>
>>
>> Edit: Added number of cores (4) at sys specs
>>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
