Re: [orientdb] Embedded List Query Performance In OrientDB

Andrey Lomakin Fri, 11 Apr 2014 03:16:24 -0700

Hi,
Could you try now?


On Tue, Apr 8, 2014 at 5:13 PM, Wise Jack <[email protected]> wrote:

> Hi, Andrey:
>
> Sure. I'll send you a sample document from the database. I can't send the
> whole database since it's too large.
>
> This is a sample record. I'm migrating a chemical compounds database from
> MySQL to OrientDB.
> --------------------------------------------------
> ODocument - Class: Compound   id: #11:5111   v.1
> --------------------------------------------------
>       iupac_cas_name : chloro(trifluoro)methane
>          create_date : Sat Jan 17 00:00:00 CST 1970
> iupac_traditional_name : chloro(trifluoro)methane
> cactvs_hbond_acceptor : 3
>      component_count : 1
>   cactvs_tauto_count : 1
>      nonstandardbond : null
>     molecular_weight : 104.45891
>      coordinate_type : 1
> 5
> 255
>  monoisotopic_weight : 103.964066
>       iupac_inchikey : AFYPFACVUDMOHA-UHFFFAOYSA-N
>           exact_mass : 103.964066
>               xlogp3 : 2.0
>           iupac_name : chloro(trifluoro)methane
>   openeye_iso_smiles : C(F)(F)(F)Cl
> compound_canonicalized : 1
>  isotopic_atom_count : 0
>      cactvs_subskeys :
> AAADcQAAAYAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQIAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
> atom_udef_stereo_count : 0
>    cactvs_complexity : 28
> iupac_systematic_name : chloranyl-tris(fluoranyl)methane
> bond_udef_stereo_count : 0
> bond_def_stereo_count : 0
>   cactvs_hbond_donor : 0
>      bondannotations : undefined
>          cactvs_tpsa : 0
>                  cas : [75-72-9, 185009-43-2
> 75-72-9, 50815-73-1, 000075-72-9, 185009-43-2, 4-01-00-00034 (Beilstein
> Handbook Reference)]
>   openeye_can_smiles : C(F)(F)(F)Cl
>     heavy_atom_count : 5
>   iupac_openeye_name : chloro(trifluoro)methane
>          iupac_inchi : InChI=1S/CClF3/c2-1(3,4)5
>          modify_date : Sat Jan 17 00:00:00 CST 1970
>    molecular_formula : CClF3
>         total_charge : 0
>         compound_cid : 6392
> atom_def_stereo_count : 0
> cactvs_rotatable_bond : 0
>
> The embedded list field is the cas field.
>
> The schema of the Compound class is attached.
>
> On Tuesday, April 8, 2014 9:09:30 PM UTC+8, Andrey Lomakin wrote:
>
>> Could you provide a database sample?
>>
>>
>> On Tue, Apr 8, 2014 at 8:51 AM, Wise Jack <[email protected]> wrote:
>>
>>> Hi, Andrey.
>>>
>>> Thanks for your reply. The memory information is as below:
>>>
>>> [root@root ~]# cat /proc/meminfo
>>> MemTotal:        8063160 kB
>>> MemFree:          228968 kB
>>>
>>> As you can see from the explain output:
>>>
>>>      "involvedIndexes":["ClassA.fieldA"],
>>>      "current":"#11:960477",
>>>      "fetchingFromTargetElapsed":160596,
>>>      "documentReads":959211,
>>>
>>> Even though the database can see the index, it still iterates over all
>>> the documents in the database. I think that is the reason for the slowness.
>>>
>>> The same data in MySQL (using an index on fieldA) returns results in
>>> 0.015 seconds, so I don't think the data is at fault. Maybe there is a
>>> better way to create the index, or to query through it, for an embedded
>>> list in OrientDB.
>>>
>>> On Monday, April 7, 2014 5:25:27 PM UTC+8, Andrey Lomakin wrote:
>>>
>>>> Yes, that is too slow.
>>>> How much RAM do you have?
>>>>
>>>>
>>>> On Mon, Apr 7, 2014 at 5:33 AM, Wise Jack <[email protected]> wrote:
>>>>
>>>>> I'm testing OrientDB as the storage database for a knowledge base.
>>>>>
>>>>> The data looks something like this:
>>>>>
>>>>> [
>>>>>     {
>>>>>         fieldA: ['a','b','c']
>>>>>     },
>>>>>     {
>>>>>         fieldA: ['c','d','e']
>>>>>     },
>>>>> ]
>>>>>
>>>>>
>>>>> and the query is something like this:
>>>>>
>>>>> select from ClassA where 'c' in fieldA
>>>>>
>>>>>
>>>>> The query is very slow. The explain output of the query is below:
>>>>>
>>>>> {
>>>>>     "@type":"d","@version":0,
>>>>>      "involvedIndexes":["ClassA.fieldA"],
>>>>>      "current":"#11:960477",
>>>>>      "fetchingFromTargetElapsed":160596,
>>>>>      "documentReads":959211,
>>>>>      "documentAnalyzedCompatibleClass":959211,
>>>>>      "recordReads":959211,
>>>>>      "elapsed":160596.25,
>>>>>      "resultType":"collection",
>>>>>      "resultSize":1,
>>>>>      "@fieldTypes":"involvedIndexes=e,fetchingFromTargetElapsed=l,documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"
>>>>>  }
>>>>>
>>>>> As you can see, even though OrientDB used the fieldA index, it still takes 16
>>>>> seconds to query a million records, which is unacceptable.
>>>>>
>>>>> Is there any good way to make this query faster?
>>>>>
>>>>> https://stackoverflow.com/questions/22896528/embedded-list-query-performance-in-orientdb
>>>>>
>>>>> --
>>>>>
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "OrientDB" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey Lomakin.
>>>>
>>>> Orient Technologies
>>>> the Company behind OrientDB
>>>>
>>
>>
>>



-- 
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
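
The explain output quoted in the thread lists ClassA.fieldA under involvedIndexes, yet it reports 959211 documentReads for a resultSize of 1. A minimal Python sketch of that comparison follows. The function name looks_like_full_scan and the ratio threshold of 100 are illustrative assumptions, not part of OrientDB.

```python
import json

# Values copied from the explain output quoted in the thread.
EXPLAIN_JSON = """
{
  "involvedIndexes": ["ClassA.fieldA"],
  "fetchingFromTargetElapsed": 160596,
  "documentReads": 959211,
  "recordReads": 959211,
  "elapsed": 160596.25,
  "resultSize": 1
}
"""


def looks_like_full_scan(explain, ratio_threshold=100):
    """Hypothetical heuristic: flag a query that reads many documents
    for each result it returns, even if an index is listed."""
    reads = explain.get("documentReads", 0)
    results = max(explain.get("resultSize", 0), 1)  # avoid division by zero
    return reads / results >= ratio_threshold


explain = json.loads(EXPLAIN_JSON)
reads_per_result = explain["documentReads"] / max(explain["resultSize"], 1)
full_scan = looks_like_full_scan(explain)

print("indexes listed:", explain["involvedIndexes"])
print("documents read per result:", reads_per_result)
print("looks like a full scan:", full_scan)
```

A query that is really served from an index would normally read roughly as many documents as it returns, so a large ratio like this one is a reasonable signal that the index was not used for the lookup.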

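
The thread contrasts a lookup that walks every document with one that goes through an index on the elements of the list. The sketch below illustrates that difference in plain Python, using the two sample documents from the original question. It is not OrientDB's implementation; the record IDs and the helper names scan_contains and build_element_index are hypothetical.

```python
from collections import defaultdict

# The two sample documents from the question, keyed by made-up record IDs.
records = {
    "#11:0": {"fieldA": ["a", "b", "c"]},
    "#11:1": {"fieldA": ["c", "d", "e"]},
}


def scan_contains(records, field, value):
    """Full scan: inspect every document and test list membership."""
    return [rid for rid, doc in records.items() if value in doc.get(field, [])]


def build_element_index(records, field):
    """Build an index with one entry per distinct list element,
    mapping each element to the record IDs whose list contains it."""
    index = defaultdict(list)
    for rid, doc in records.items():
        for element in set(doc.get(field, [])):  # set() avoids duplicate IDs
            index[element].append(rid)
    return index


index = build_element_index(records, "fieldA")

# Both approaches return the same records; the index lookup touches
# only the entry for 'c' instead of every document.
by_scan = scan_contains(records, "fieldA", "c")
by_index = index.get("c", [])
print(by_scan)
print(by_index)
```

The scan does work proportional to the number of documents on every query, while the index pays that cost once at build time and then answers each lookup from a single entry.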