Hi, Andrey:
Sure. I'll send you a sample document of the database, I can't send the
whole database to you since it's too large:
This is a sample record of the database, I'm immigrating a chemical
compounds database from MySQL to OrientDB.
--------------------------------------------------
ODocument - Class: Compound id: #11:5111 v.1
--------------------------------------------------
iupac_cas_name : chloro(trifluoro)methane
create_date : Sat Jan 17 00:00:00 CST 1970
iupac_traditional_name : chloro(trifluoro)methane
cactvs_hbond_acceptor : 3
component_count : 1
cactvs_tauto_count : 1
nonstandardbond : null
molecular_weight : 104.45891
coordinate_type : 1
5
255
monoisotopic_weight : 103.964066
iupac_inchikey : AFYPFACVUDMOHA-UHFFFAOYSA-N
exact_mass : 103.964066
xlogp3 : 2.0
iupac_name : chloro(trifluoro)methane
openeye_iso_smiles : C(F)(F)(F)Cl
compound_canonicalized : 1
isotopic_atom_count : 0
cactvs_subskeys :
AAADcQAAAYAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQIAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
atom_udef_stereo_count : 0
cactvs_complexity : 28
iupac_systematic_name : chloranyl-tris(fluoranyl)methane
bond_udef_stereo_count : 0
bond_def_stereo_count : 0
cactvs_hbond_donor : 0
bondannotations : undefined
cactvs_tpsa : 0
cas : [75-72-9, 185009-43-2
75-72-9, 50815-73-1, 000075-72-9, 185009-43-2, 4-01-00-00034 (Beilstein
Handbook Reference)]
openeye_can_smiles : C(F)(F)(F)Cl
heavy_atom_count : 5
iupac_openeye_name : chloro(trifluoro)methane
iupac_inchi : InChI=1S/CClF3/c2-1(3,4)5
modify_date : Sat Jan 17 00:00:00 CST 1970
molecular_formula : CClF3
total_charge : 0
compound_cid : 6392
atom_def_stereo_count : 0
cactvs_rotatable_bond : 0
The embedded list field is the CAS field.
The schema of Class Compound is as the attachment.
On Tuesday, April 8, 2014 9:09:30 PM UTC+8, Andrey Lomakin wrote:
>
> Could you provide database sample ?
>
>
> On Tue, Apr 8, 2014 at 8:51 AM, Wise Jack <[email protected]<javascript:>
> > wrote:
>
>> Hi, Andrey.
>>
>> Thanks for your reply. The memory information is as below:
>>
>> [root@root ~]# cat /proc/meminfo
>> MemTotal: 8063160 kB
>> MemFree: 228968 kB
>>
>> As you can see
>>
>> "involvedIndexes":["ClassA.fieldA"],
>> "current":"#11:960477",
>> "fetchingFromTargetElapsed":160596,
>> "documentReads":959211,
>>
>> Even the database can see the index, but it still iterate all the
>> documents in the database, I think that's the reason for the slow.
>>
>> The same data in mysql(that using fieldA's index), can return data in
>> 0.015second, so I think this is not the fault of the data, maybe there is a
>> better way for creating index or querying using index for embedded list of
>> OrientDB.
>>
>> On Monday, April 7, 2014 5:25:27 PM UTC+8, Andrey Lomakin wrote:
>>
>>> Yes too slow.
>>> What amount of RAM do you have ?-
>>>
>>>
>>> On Mon, Apr 7, 2014 at 5:33 AM, Wise Jack <[email protected]> wrote:
>>>
>>>> I'm testing orientdb for a storage database of a knowledge base.
>>>>
>>>> The database can be something like this:
>>>>
>>>> [
>>>> {
>>>> fieldA: ['a','b','c']
>>>> },
>>>> {
>>>> fieldA: ['c','d','e']
>>>> },
>>>> ]
>>>>
>>>>
>>>> and the query is something like this:
>>>>
>>>> select from ClassA where 'c' in fieldA
>>>>
>>>>
>>>> The query is very very slow, the explain of the query is as below
>>>>
>>>> {
>>>> "@type":"d","@version":0,
>>>> "involvedIndexes":["ClassA.fieldA"],
>>>> "current":"#11:960477",
>>>> "fetchingFromTargetElapsed":160596,
>>>> "documentReads":959211,
>>>> "documentAnalyzedCompatibleClass":959211,
>>>> "recordReads":959211,
>>>> "elapsed":160596.25,
>>>> "resultType":"collection",
>>>> "resultSize":1,
>>>>
>>>> "@fieldTypes":"involvedIndexes=e,fetchingFromTargetElapsed=l,documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"
>>>> }
>>>>
>>>> As you can see, even OrientDB used the fieldA index, it still costs 16
>>>> seconds to query a million records, it is unacceptable.
>>>>
>>>> Is there any good way to make this query faster?
>>>>
>>>> https://stackoverflow.com/questions/22896528/embedded-
>>>> list-query-performance-in-orientdb
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey Lomakin.
>>>
>>> Orient Technologies
>>> the Company behind OrientDB
>>>
>>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
>
>
--
---
You received this message because you are subscribed to the Google Groups
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.
OrientDB console v.1.7-rc2 (build UNKNOWN@r; 2014-03-25 15:54:54+0100)
www.orientechnologies.com
Type 'help' to display all the commands supported.
Installing extensions for GREMLIN language v.2.5.0-SNAPSHOT
orientdb> connect remote:localhost/compounds admin admin
Connecting to database [remote:localhost/compounds] with user 'admin'...OK
orientdb {compounds}> desc Compound
Class................: Compound
Super class..........: V
Default cluster......: compound (id=11)
Supported cluster ids: [11]
Properties:
-------------------------------+-------------+-------------------------------+-----------+----------+----------+-----------+-----------+----------+
NAME | TYPE | LINKED TYPE/CLASS |
MANDATORY | READONLY | NOT NULL | MIN | MAX | COLLATE |
-------------------------------+-------------+-------------------------------+-----------+----------+----------+-----------+-----------+----------+
iupac_inchikey | STRING | null |
false | false | false | | | default |
cactvs_hbond_donor | INTEGER | null |
false | false | false | | | default |
openeye_iso_smiles | STRING | null |
false | false | false | | | default |
total_charge | INTEGER | null |
false | false | false | | | default |
bond_def_stereo_count | INTEGER | null |
false | false | false | | | default |
cactvs_tauto_count | INTEGER | null |
false | false | false | | | default |
cactvs_complexity | INTEGER | null |
false | false | false | | | default |
iupac_openeye_name | STRING | null |
false | false | false | | | default |
bond_udef_stereo_count | INTEGER | null |
false | false | false | | | default |
atom_def_stereo_count | INTEGER | null |
false | false | false | | | default |
atom_udef_stereo_count | INTEGER | null |
false | false | false | | | default |
nonstandardbond | STRING | null |
false | false | false | | | default |
compound_cid | LONG | null |
false | false | false | | | default |
heavy_atom_count | INTEGER | null |
false | false | false | | | default |
coordinate_type | STRING | null |
false | false | false | | | default |
cactvs_subskeys | STRING | null |
false | false | false | | | default |
molecular_weight | DOUBLE | null |
false | false | false | | | default |
component_count | INTEGER | null |
false | false | false | | | default |
modify_date | DATE | null |
false | false | false | | | default |
cactvs_rotatable_bond | INTEGER | null |
false | false | false | | | default |
molecular_formula | STRING | null |
false | false | false | | | default |
bondannotations | STRING | null |
false | false | false | | | default |
exact_mass | DOUBLE | null |
false | false | false | | | default |
isotopic_atom_count | INTEGER | null |
false | false | false | | | default |
cactvs_tpsa | INTEGER | null |
false | false | false | | | default |
iupac_cas_name | STRING | null |
false | false | false | | | default |
cactvs_hbond_acceptor | INTEGER | null |
false | false | false | | | default |
monoisotopic_weight | DOUBLE | null |
false | false | false | | | default |
openeye_can_smiles | STRING | null |
false | false | false | | | default |
iupac_traditional_name | STRING | null |
false | false | false | | | default |
cas | EMBEDDEDLIST| STRING |
false | false | false | | | default |
compound_canonoicalized | LONG | null |
false | false | false | | | default |
iupac_name | STRING | null |
false | false | false | | | default |
iupac_inchi | STRING | null |
false | false | false | | | default |
create_date | DATE | null |
false | false | false | | | default |
iupac_systematic_name | STRING | null |
false | false | false | | | default |
xlogp3 | DOUBLE | null |
false | false | false | | | default |
-------------------------------+-------------+-------------------------------+-----------+----------+----------+-----------+-----------+----------+
Indexes (4 altogether):
-------------------------------+----------------+
NAME | PROPERTIES |
-------------------------------+----------------+
Compound.iupac_traditional_name| iupac_traditional_name|
Compound.iupac_name | iupac_name |
Compound.cas | cas |
Compound.compound_cid | compound_cid |
-------------------------------+----------------+