Re: [orientdb] Re: New to OrientDB: Object DB, Queries, Sorting and Performance

Zapp El Fri, 31 Jul 2015 10:18:17 -0700

Hi Luca,

thanks for your response.


Yeah, it's kinda too late now. The Hibernate/postgres guys won. This time, at
least. We had to decide today because of a current project.
But for future projects, the jury is still out. We still want to, and have to,
build a large, flexible network of related information for our Library.

Regarding your response, I'm confused and terrified at the same time. But 
at least now I know that I know nothing about graph databases. (lol)

Since I had zero experience with graph databases the whole concept with 
these links and navigating through the data with "java-like access-paths" 
(like accessing java-class properties) felt super natural to me. 
I was able to write queries in a few days without much learning. 

I experimented a little with TRAVERSE and I was able to build a query which
performs way better than the others:
select from (
    traverse * from (
        select from RelationImpl
        where predicate.uniqueKey = 'IS_PUB_PLACE_OF'
          and subject.relationNodeContainers contains (uniqueKey = 'MILANO')
    ) while $depth <= 3
) where @class = 'RecordImpl' and type.uniqueKey = 'TOME'
order by sortIndex asc

Less than 2 seconds with order by, compared to 14 seconds for the first 
query I've posted. 
TBH, I have no idea what kind of black magic I've done there. 
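
To see where the time goes, the same query can be prefixed with EXPLAIN, as
Luca suggests below for the earlier query. A sketch, assuming EXPLAIN accepts
the nested form unchanged (the query is the one above, with balanced
parentheses):

```sql
explain select from (
    traverse * from (
        select from RelationImpl
        where predicate.uniqueKey = 'IS_PUB_PLACE_OF'
          and subject.relationNodeContainers contains (uniqueKey = 'MILANO')
    ) while $depth <= 3
) where @class = 'RecordImpl' and type.uniqueKey = 'TOME'
order by sortIndex asc
```

Comparing that output with the EXPLAIN of the first query should show whether
the gain comes from an index or simply from reading far fewer records, since
the traversal starts at the matching relations instead of at every RecordImpl.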

Anyways, we have to move on for now.  

Best regards & thanks again,

Sebastian



On Friday, July 31, 2015 at 3:32:41 PM UTC+2, l.garulli wrote:
>
> Hi Sebastian,
> Sorry to have seen this now, I hope it's not too late.
>
> Starting from your last query (about 7 seconds):
>
> select from RecordImpl
> where 
>   type contains [#46:0]
> and (
>   relationNode.relations 
>   contains (
>     predicate contains [#34:0] and subject contains [#30:18]
>   ) 
> )
>
> I see the bottleneck is the expression: *relationNode.relations contains 
> ( predicate contains [#34:0] and subject contains [#30:18] )*. In fact, 
> with such an expression OrientDB does a full scan of many records. You can 
> check this by prefixing *EXPLAIN* to the query:
>
> *explain* select from RecordImpl where type contains [#46:0] and 
> ( relationNode.relations contains ( predicate contains [#34:0] and subject 
> contains [#30:18] ) )
>
> The secret to fast queries, in any DBMS, is using indexes as much as you 
> can. When you use the dot notation (.), OrientDB can't use the indexes. 
> Looking at the original query:
>
> select from RecordImpl
> where 
>   type.uniqueKey = 'TOME'
> and 
>   relationNode.relations 
>   contains (
>    (predicate.uniqueKey = 'IS_PUB_PLACE_OF') 
>     and 
>     subject.relationNodeContainers contains (uniqueKey = 'MILANO')
>   )
>
> You have 3 conditions to match. If you used the Graph API you'd have 
> bidirectional edges, so you could start from any point in the graph and 
> cross it in any direction. For example, you could look up all the places of 
> type "IS_PUB_PLACE_OF" and start crossing the graph, matching the other 
> conditions. Or you could do the same starting from "MILANO".
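>
> A rough sketch of the "start from MILANO" idea: since you mention having 
> indexes on all the uniqueKeys, the starting record could be fetched from 
> the index itself rather than through dot notation. The index name below is 
> hypothetical (the real one depends on the class that owns uniqueKey), and 
> the `--` line is only an annotation, not part of the query:
>
> ```sql
> -- hypothetical index name, replace it with the actual one
> select expand(rid) from index:RelationNodeContainerImpl.uniqueKey where key = 'MILANO'
> ```
>
> From the returned record you would then follow the links (or edges) 
> towards the books, instead of scanning every RecordImpl.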
>
> To help you more I'd need the schema of the entities involved in this 
> query.
>
> Best Regards,
>
> Luca Garulli
> Founder & CEO
> OrientDB <http://orientdb.com/>
>
>
> On 31 July 2015 at 12:46, Zapp El <[email protected]> wrote:
>
>>
>> So, we officially gave up on OrientDB.
>>
>> Performance is just too bad for our use case. We tried a couple more 
>> things, but none of them helped. Even with a very small amount of data 
>> (2 GB, ~4,859,173 records), performance is abysmal.
>>
>> That is really a pity. I really like the basic concept of OrientDB's 
>> ObjectDB.
>>
>> Since I work with several hundred GB of index data in Lucene and Solr 
>> on a daily basis and have never had a problem achieving decent 
>> performance, my only guess right now is that OrientDB's ObjectDB suffers 
>> from a poorly written query optimizer.
>>
>> And BTW, we really hoped that one of the developers would chime in here. 
>> In the end, we are potential customers, but as long as we can't get a 
>> basic proof of concept going, or at least get confirmation that our data 
>> isn't a complete mismatch for OrientDB, we won't buy any licences.
>>
>> Best regards,
>>
>> Sebastian
>>
>>
>>
>>
>> On Tuesday, July 28, 2015 at 7:13:06 PM UTC+2, Zapp El wrote:
>>>
>>> Hello Community, 
>>> hello OrientDB developers!
>>>
>>> I work in public services, and currently we're evaluating different 
>>> technologies (OrientDB, Hibernate/postgres, Fedora4 and neo4j) in order to 
>>> find the best possible backend-solution for our data. 
>>> Over the course of the last month we developed a fairly straight-forward 
>>> Java-Class-Model that we like to use regardless of the underlying 
>>> technology. 
>>>
>>> In future applications we're going to have more than 1.5 million objects 
>>> to persist, manage and retrieve.
>>>
>>> Handling Java objects directly seems so much more intuitive and flexible 
>>> than mapping them with JPA, so we were eager to try something new, 
>>> such as OrientDB's Object Database functionality.
>>>
>>> But somehow we can't figure out how to get a decent performance out of 
>>> our experimental setup. 
>>>
>>> So far we've persisted about 95,601 of our objects (books and other 
>>> media), resulting in 46,995,663 ORecords (see screenshot) and about 
>>> 18.8 GB of data on our NAS.
>>>
>>> Our test system:
>>>
>>> - Virtual Machine on VMware 
>>> - SUSE 10 OS
>>> - 1 TB of NAS. 
>>> - Java 1.8 
>>> - OrientDB PE Version 2.0.12
>>> - QuadCore CPU
>>>
>>>
>>>
>>> Select a book with a specific relation (like a triple): 
>>>
>>> select from RecordImpl
>>> where 
>>>   type.uniqueKey = 'TOME'
>>> and 
>>>   relationNode.relations 
>>>   contains (
>>>    (predicate.uniqueKey = 'IS_PUB_PLACE_OF') 
>>>     and 
>>>     subject.relationNodeContainers contains (uniqueKey = 'MILANO')
>>>   )
>>> Query executed in 9.26 sec. Returned 20 record(s)      
>>>
>>> 9.26 sec. How can we accelerate this query? We have indexes on all 
>>> the uniqueKeys.
>>>
>>>
>>>
>>> We managed to accelerate this query a little by rewriting the statement 
>>> like this:
>>>
>>> select from RecordImpl
>>> where 
>>>   type contains [#46:0]
>>> and (
>>>   relationNode.relations 
>>>   contains (
>>>     predicate contains [#34:0] and subject contains [#30:18]
>>>   ) 
>>> )
>>> Query executed in 7.122 sec. Returned 20 record(s) 
>>>
>>> 7.122 sec., sadly not acceptable. And that is one of the simpler 
>>> questions we'd like to get answered in a decent time.
>>>
>>>
>>>
>>> Now this one with a simple order by: 
>>>
>>> select from RecordImpl
>>> where 
>>>   type contains [#46:0]
>>> and (
>>>   relationNode.relations 
>>>   contains (
>>>     predicate contains [#34:0] and subject contains [#30:18]
>>>   ) 
>>> )
>>> order by sortIndex desc
>>> Query executed in 133.423 sec. Returned 20 record(s) 
>>>
>>>
>>>
>>> So, any ideas how we could accelerate our queries? What are we doing wrong?
>>>
>>>
>>> Best regards & thanks,
>>>
>>> Sebastian 
>>>              
>>>
>>> Edit: Added number of cores (4) at sys specs
>>>
>> -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
