Hi Hitesh,

The V2000 molfile format has a feature that can be used to set a simple 
text value for an atom by adding "V  " lines to the molfile. The RDKit 
molfile *reader* supports this feature as seen below (I have seen this 
feature used to e.g. tag reactive centers in a molecule when doing RDKit 
reaction-based enumeration).

>>> from rdkit import Chem
>>> molfile_with_values = "".join(open("C:/temp/cns-with-values.mol").readlines())
>>> print molfile_with_values

   -ISIS-  07041519212D

   3  2  0  0  0  0  0  0  0  0999 V2000
     0.0958   -2.6833    0.0000 C   0  0  0  0  0  0  0  0  0  0  0 0
     0.8083   -2.2708    0.0000 N   0  0  0  0  0  0  0  0  0  0  0 0
     1.5208   -2.6792    0.0000 S   0  0  0  0  0  0  0  0  0  0  0 0
   1  2  1  0  0  0  0
   2  3  1  0  0  0  0
V    1 Carbs
V    3 Sulfuric
M  END

>>> m = Chem.MolFromMolBlock(molfile_with_values)
>>> m.GetAtoms()[0].GetProp('molFileValue')
'Carbs'
>>> m.GetAtoms()[1].GetProp('molFileValue')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'molFileValue'
>>> m.GetAtoms()[2].GetProp('molFileValue')
'Sulfuric'
>>>

As you can see, the "V  " lines in the molfile are put into RDKit atom 
"molFileValue" properties.
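
To make the line layout concrete, here is a small sketch that pulls the
"V  " lines out of a mol block with plain string handling, independent of
RDKit. The function name read_atom_values is made up for illustration, and
the column layout ("V  ", a 3-character atom number, a space, then the text)
is an assumption read off the example above, not taken from the RDKit source.

```python
def read_atom_values(mol_block):
    """Return {1-based atom number: text} for the "V  " lines of a V2000 mol block."""
    values = {}
    # Skip the three header lines and the counts line, which are free-form
    # or fixed-width and must not be mistaken for value lines.
    for line in mol_block.splitlines()[4:]:
        if line.startswith("M  END"):
            break
        if line.startswith("V  "):
            # Assumed layout: "V  " + 3-character atom number + " " + text.
            atom_number = int(line[3:6])
            values[atom_number] = line[7:].rstrip()
    return values


example_block = "\n".join([
    "",
    "  -ISIS-  07041519212D",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0958   -2.6833    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.8083   -2.2708    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5208   -2.6792    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "V    1 Carbs",
    "V    3 Sulfuric",
    "M  END",
])
atom_values = read_atom_values(example_block)
```

For the example molfile this gives a value for atoms 1 and 3 and nothing for
atom 2, which matches the KeyError seen for the nitrogen above.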

Unfortunately, the atom values are not written when RDKit outputs a molfile:

>>> print Chem.MolToMolBlock(m)

      RDKit          2D

   3  2  0  0  0  0  0  0  0  0999 V2000
     0.0958   -2.6833    0.0000 C   0  0  0  0  0  0  0  0  0  0  0 0
     0.8083   -2.2708    0.0000 N   0  0  0  0  0  0  0  0  0  0  0 0
     1.5208   -2.6792    0.0000 S   0  0  0  0  0  0  0  0  0  0  0 0
   1  2  1  0
   2  3  1  0
M  END

>>>

But, it is fairly easy to add them with this function:

def MolToMolBlock_WithAtomValues(mol):
    mol_block = Chem.MolToMolBlock(mol).split("\n")
    # Drop the "M  END" line and the empty string left by the trailing newline.
    mol_block = mol_block[:-2]
    # Add a "V  " line for every atom that has a value (atom numbers are 1-based).
    for atom in mol.GetAtoms():
        if atom.HasProp("molFileValue"):
            mol_block.append("V  %3d %s" % (atom.GetIdx() + 1,
                                            atom.GetProp("molFileValue")))
    mol_block.append("M  END")
    return "\n".join(mol_block)

This lets you persist atom text values. Disclaimer: I have no idea if 
this will break in the presence of other property lines, e.g. "M CHG" 
etc., but ... it's a start.
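
A slightly more defensive variant could look for the "M  END" line instead
of assuming it is the second-to-last list entry. The sketch below works on
the mol block text only, so it can be tried without RDKit. The name
add_atom_values and the dict argument are hypothetical, and placing the
"V  " lines after any "M  ..." property lines is an assumption that has not
been checked against the specification.

```python
def add_atom_values(mol_block, atom_values):
    """Insert "V  " lines just before the last "M  END" line of a mol block.

    atom_values maps 1-based atom numbers to text values (hypothetical
    interface, chosen for this sketch).
    """
    lines = mol_block.rstrip("\n").split("\n")
    end_index = None
    for i, line in enumerate(lines):
        if line.startswith("M  END"):
            end_index = i
    if end_index is None:
        raise ValueError("no 'M  END' line found in mol block")
    value_lines = ["V  %3d %s" % (number, atom_values[number])
                   for number in sorted(atom_values)]
    return "\n".join(lines[:end_index] + value_lines + lines[end_index:]) + "\n"


block_with_charge = "\n".join([
    "",
    "     RDKit",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.0000    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  CHG  1   2   1",
    "M  END",
    "",
])
result = add_atom_values(block_with_charge, {1: "C-atom", 3: "Here is an S-atom"})
```

With RDKit at hand, the dict could be filled from the atoms that have a
"molFileValue" property, using HasProp and GetProp as in the function above.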

As an example, let's first create the "CNS" molecule without atom values.

>>> m = Chem.MolFromSmiles("CNS")
>>> print MolToMolBlock_WithAtomValues(m)

      RDKit

   3  2  0  0  0  0  0  0  0  0999 V2000
     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0 0
     0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0 0
     0.0000    0.0000    0.0000 S   0  0  0  0  0  0  0  0  0  0  0 0
   1  2  1  0
   2  3  1  0
M  END
>>>

Add two atom values and use the new function to persist the atom values 
to the V2000 molfile output:

>>> m.GetAtoms()[0].SetProp("molFileValue", "C-atom")
>>> m.GetAtoms()[2].SetProp("molFileValue", "Here is an S-atom")
>>> print MolToMolBlock_WithAtomValues(m)

      RDKit

   3  2  0  0  0  0  0  0  0  0999 V2000
     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0 0
     0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0 0
     0.0000    0.0000    0.0000 S   0  0  0  0  0  0  0  0  0  0  0 0
   1  2  1  0
   2  3  1  0
V    1 C-atom
V    3 Here is an S-atom
M  END
>>>

Check that the output can be read back in:

>>> molblock_test = MolToMolBlock_WithAtomValues(m)
>>> m_test = Chem.MolFromMolBlock(molblock_test)
>>> m_test.GetAtoms()[0].GetProp("molFileValue")
'C-atom'
>>> m_test.GetAtoms()[1].GetProp("molFileValue")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'molFileValue'
>>> m_test.GetAtoms()[2].GetProp("molFileValue")
'Here is an S-atom'
>>>

If you have multiple properties, they would have to be encoded into the 
single text value, e.g. as key-value pairs. In principle the MDL molfile 
specification limits the text value to somewhere around 70-80 characters 
(72?), but I would guess that RDKit accepts longer strings; I have not 
tried.
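
One possible encoding, sketched below, joins "key=value" items with ";" and
refuses anything that would exceed the limit. Everything here is an
assumption made for illustration: the function names, the separators, and
the 72-character limit, which is only the guess mentioned above.

```python
MAX_VALUE_LENGTH = 72  # assumption: the exact limit is not confirmed


def encode_props(props):
    """Pack a dict of string properties into one "key=value;key=value" string."""
    parts = []
    for key in sorted(props):
        value = str(props[key])
        # The separators (and newlines) cannot appear inside keys or values.
        if any(ch in key or ch in value for ch in ";=\n"):
            raise ValueError("key or value contains a reserved character")
        parts.append("%s=%s" % (key, value))
    text = ";".join(parts)
    if len(text) > MAX_VALUE_LENGTH:
        raise ValueError("encoded value is longer than %d characters"
                         % MAX_VALUE_LENGTH)
    return text


def decode_props(text):
    """Reverse encode_props; all values come back as strings."""
    return dict(item.split("=", 1) for item in text.split(";") if item)


encoded = encode_props({"role": "donor", "site": "1"})
decoded = decode_props(encoded)
```

The encoded string would then be stored with
atom.SetProp("molFileValue", encoded) and decoded again after reading the
molfile back in.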

A more generic solution would be to map RDKit atom and bond properties 
to molfile S-group data - but that's a bit more involved and is not 
supported at the moment.

Cheers
-- Jan Holst Jensen

On 2015-07-04 18:28, Greg Landrum wrote:
> Hi,
 >
 > On Friday, July 3, 2015, Hitesh Patel <[email protected]> wrote:
 >
 > Hi Greg,
 >
 > As a first priority, I will use the mol2 format. As shown in the mol2
 > format explanation, we can set user-specified atom attributes. I copied
 > the text below for your convenience. See the bold text.
 >
 >
 > The rdkit does not yet have a mol2 writer, so that isn't an option.
 >
 >
 > As a second priority, I can use mol files. There I have to set the
 > properties block:
 >
 > * |M ALS| - atom list and exclusive list
 > * |M APO| - Rgroup attachment point
 > * |M CHG| - charge
 > * .....
 >
 > But, I am not sure, whether the user defined property block is
 > allowed or not.
 >
 >
 > M CHG and M ALS are already used by the rdkit when atoms have charges
 > or there are list queries. M APO is not used, but that's because I
 > have never managed to figure out how to adapt the MDL R Group idea to
 > something sensible in the context of the rdkit.
 >
 > Which one looks feasible??
 >
 >
 > I'm still not sure of what kind of custom properties you are looking
 > to write.
 >
 > -greg
 >
 >
 >
 >
 >
 > On Fri, Jul 3, 2015 at 4:45 PM Greg Landrum <[email protected]> wrote:
 >
 > Hitesh,
 >
 > It is certainly possible to set atom properties. I don't think any of
 > the output formats the rdkit can generate really support atom
 > properties though. What format did you envision writing and how would
 > the atom properties be encoded?
 >
 > -greg
 >
 >
 > On Friday, July 3, 2015, Hitesh Patel <[email protected]> wrote:
 >
 > Hi Josh, Thanks for your quick reply. But, sorry, I want to set atom
 > properties, not molecule properties. Like,
 >
 > atom = m.GetAtomWithIdx(5)
 > atom.SetProp('my_property', 'value_of_my_property')
 >
 > I want to save this property associated with each atom.
 >
 > Regards,
 >
 > Hitesh Patel
 >
 > On Fri, Jul 3, 2015 at 3:41 PM, Campbell J.E. <[email protected]>
 > wrote:
 >
 > Hi Hitesh
 >
 >
 >
 > I use the PropertyMol object to save molecules with properties,
 > setting a property for a molecule is fairly simple,
 >
 >
 >
 > m.SetProp("_Name", "mol_name")
 >
 >
 >
 > for m in mol_lst:
 >     pm = AllChem.PropertyMol(m)
 >     pm.SetProp("_Name", name)
 >     pm.SetProp("_Energy", None)
 >     dump_list.append(pm)
 >
 > cPickle.dump(dump_list, open(p_name, "wb"))
 >
 >
 >
 > Then something like this will allow you to act on the molecules
 > again.
 >
 >
 >
 > mol_list = cPickle.load( open(p_name, "rb" ) )
 >
 >
 >
 > Hope this helps.
 >
 >
 >
 > Josh Campbell
 >
 >
 >
 > *From:* Hitesh Patel [mailto:[email protected]]
 > *Sent:* 03 July 2015 14:21
 > *To:* [email protected]
 > *Subject:* [Rdkit-discuss] Save files with new atom properties and read again
 >
 >
 >
 > Hi there,
 >
 > I am new to RDKit.
 >
 > Is there a way to set a custom property on each atom, save that to
 > a file in any format, and read it back in again?
 >
 >
 > --
 >
 > Regards,
 >
 > Dr. Hitesh Patel Post-Doctoral Fellow, Technische Universität
 > Dortmund, Chemische Biologie, Otto-Hahn-Straße 6, 44227, Dortmund,
 > Germany
 >


