Hi Greg,

Thanks for the examples. I will give it a try.

JW

___________________
JW Feng, Ph.D.
Denali Therapeutics Inc.
201 Gateway Blvd. South San Francisco, CA 94080 | (650) 270-0628

On Thu, Oct 22, 2015 at 2:06 AM, Greg Landrum <[email protected]>
wrote:

> Hi JW,
>
> On Thu, Oct 22, 2015 at 12:47 AM, JW Feng <[email protected]> wrote:
>
>>
>> I read a post (link below) about SD tag reordering, written by Matthew with
>> a reply from Greg, and I have a follow-up question. I would like to preserve the
>> ordering of SD tags as they appear in the input SD file. I tried getting
>> the list of SD tags by mol.GetPropNames() and setting the order with
>> sd_writer.SetProps() but that didn't work. Turns out mol.GetPropNames()
>> returns a list in alphabetical order instead of order of appearance.
>>
>
> I would say instead that they appear in an unspecified,
> implementation-dependent order. This may be alphabetic, but it's certainly
> not guaranteed to be so.
>
>
>> Is there a way to preserve SD tag orders?
>>
>
> There is currently no way to do this automatically. I have always thought
> of those properties as being unordered, so the RDKit doesn't maintain
> any record of the order in which properties are added to a molecule.
>
> As long as you have the original SDMolSupplier, you can pretty easily get
> the ordered list of property names from that:
>
> In [22]: suppl = Chem.SDMolSupplier('tmp.sdf')
>
> In [23]: m = suppl[0]
>
> In [25]: list(m.GetPropNames())   # <- here's the non-ordered list
> Out[25]:
> ['PUBCHEM_ATOM_DEF_STEREO_COUNT',
>  'PUBCHEM_ATOM_UDEF_STEREO_COUNT',
>  'PUBCHEM_BONDANNOTATIONS',
>  'PUBCHEM_BOND_DEF_STEREO_COUNT',
>  'PUBCHEM_BOND_UDEF_STEREO_COUNT',
>  'PUBCHEM_CACTVS_COMPLEXITY',
>  'PUBCHEM_CACTVS_HBOND_ACCEPTOR',
>  'PUBCHEM_CACTVS_HBOND_DONOR',
>  'PUBCHEM_CACTVS_ROTATABLE_BOND',
>  'PUBCHEM_CACTVS_SUBSKEYS',
>  'PUBCHEM_CACTVS_TAUTO_COUNT',
>  'PUBCHEM_CACTVS_TPSA',
>  'PUBCHEM_COMPONENT_COUNT',
>  'PUBCHEM_COMPOUND_CANONICALIZED',
>  'PUBCHEM_COMPOUND_CID',
>  'PUBCHEM_COORDINATE_TYPE',
>  'PUBCHEM_EXACT_MASS',
>  'PUBCHEM_HEAVY_ATOM_COUNT',
>  'PUBCHEM_ISOTOPIC_ATOM_COUNT',
>  'PUBCHEM_IUPAC_CAS_NAME',
>  'PUBCHEM_IUPAC_INCHI',
>  'PUBCHEM_IUPAC_INCHIKEY',
>  'PUBCHEM_IUPAC_NAME',
>  'PUBCHEM_IUPAC_OPENEYE_NAME',
>  'PUBCHEM_IUPAC_SYSTEMATIC_NAME',
>  'PUBCHEM_IUPAC_TRADITIONAL_NAME',
>  'PUBCHEM_MOLECULAR_FORMULA',
>  'PUBCHEM_MOLECULAR_WEIGHT',
>  'PUBCHEM_MONOISOTOPIC_WEIGHT',
>  'PUBCHEM_OPENEYE_CAN_SMILES',
>  'PUBCHEM_OPENEYE_ISO_SMILES',
>  'PUBCHEM_TOTAL_CHARGE',
>  'PUBCHEM_XLOGP3_AA']
>
> In [26]: txt = suppl.GetItemText(0)
>
> In [27]: pns = re.findall(r'> *<(\w+)>',txt)    # <- this gives you the list in order
>
> In [28]: pns
> Out[28]:
> ['PUBCHEM_COMPOUND_CID',
>  'PUBCHEM_COMPOUND_CANONICALIZED',
>  'PUBCHEM_CACTVS_COMPLEXITY',
>  'PUBCHEM_CACTVS_HBOND_ACCEPTOR',
>  'PUBCHEM_CACTVS_HBOND_DONOR',
>  'PUBCHEM_CACTVS_ROTATABLE_BOND',
>  'PUBCHEM_CACTVS_SUBSKEYS',
>  'PUBCHEM_IUPAC_OPENEYE_NAME',
>  'PUBCHEM_IUPAC_CAS_NAME',
>  'PUBCHEM_IUPAC_NAME',
>  'PUBCHEM_IUPAC_SYSTEMATIC_NAME',
>  'PUBCHEM_IUPAC_TRADITIONAL_NAME',
>  'PUBCHEM_IUPAC_INCHI',
>  'PUBCHEM_IUPAC_INCHIKEY',
>  'PUBCHEM_XLOGP3_AA',
>  'PUBCHEM_EXACT_MASS',
>  'PUBCHEM_MOLECULAR_FORMULA',
>  'PUBCHEM_MOLECULAR_WEIGHT',
>  'PUBCHEM_OPENEYE_CAN_SMILES',
>  'PUBCHEM_OPENEYE_ISO_SMILES',
>  'PUBCHEM_CACTVS_TPSA',
>  'PUBCHEM_MONOISOTOPIC_WEIGHT',
>  'PUBCHEM_TOTAL_CHARGE',
>  'PUBCHEM_HEAVY_ATOM_COUNT',
>  'PUBCHEM_ATOM_DEF_STEREO_COUNT',
>  'PUBCHEM_ATOM_UDEF_STEREO_COUNT',
>  'PUBCHEM_BOND_DEF_STEREO_COUNT',
>  'PUBCHEM_BOND_UDEF_STEREO_COUNT',
>  'PUBCHEM_ISOTOPIC_ATOM_COUNT',
>  'PUBCHEM_COMPONENT_COUNT',
>  'PUBCHEM_CACTVS_TAUTO_COUNT',
>  'PUBCHEM_COORDINATE_TYPE',
>  'PUBCHEM_BONDANNOTATIONS']
>
> If you pass that list of property names to the SDWriter's SetProps()
> method, it will write things out in the input order.
>
> I hope this helps,
> -greg
>
>
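For anyone who finds this thread later, here is a rough sketch of how Greg's suggestion might be put together end to end. It is not tested against any particular RDKit release. The function names ordered_prop_names and copy_sdf_preserving_tag_order are made up for illustration, and the sketch assumes every record in the file carries the same tags as the first one.

```python
import re

# Greg's pattern from the session above: matches SD data-header lines such as
# "> <PUBCHEM_COMPOUND_CID>" and captures the tag name.
SD_TAG_PATTERN = re.compile(r'> *<(\w+)>')


def ordered_prop_names(sd_record_text):
    """Return the SD tag names in the order they appear in one record's text."""
    return SD_TAG_PATTERN.findall(sd_record_text)


def copy_sdf_preserving_tag_order(in_path, out_path):
    """Hypothetical helper: rewrite an SD file, keeping the first record's tag order."""
    # Imported here so ordered_prop_names() can be used without RDKit installed.
    from rdkit import Chem

    suppl = Chem.SDMolSupplier(in_path)
    writer = Chem.SDWriter(out_path)
    # Assumption: every record uses the same tags as the first one.
    writer.SetProps(ordered_prop_names(suppl.GetItemText(0)))
    for idx in range(len(suppl)):
        mol = suppl[idx]
        if mol is not None:  # skip records RDKit could not parse
            writer.write(mol)
    writer.close()
```

The tag list is handed to the writer through SetProps(), the method JW mentions in his question. If the records in a file do not all share the same tags, the list would probably need to be rebuilt per record or merged across records; the sketch does not attempt that.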
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
