Hey Gustavo,

Thank you very much for your script!
I should specify that I am working with many SDF files, each of
which consists of one 3D structure of a ligand (I don't see any
difference from PDB here, so if I could apply it to PDB files directly
that would be even better!).
Anyway, I've just tried to adapt your script to my case:

# I simplified the function to compute only the 4 properties required
# for the Lipinski calculations.
# I also set Source to the name of the particular SDF file (see below).
from rdkit import Chem
from rdkit.Chem import Descriptors, PandasTools, rdMolDescriptors

def load_sdf_file(file, key):
    """
    Reads molecules from an SDF file, computes the four Lipinski
    properties, and assigns a source field
    """
    df = PandasTools.LoadSDF(file)
    df['Source'] = key
    df['LogP'] = df['ROMol'].apply(Chem.Descriptors.MolLogP)
    df['MolWt'] = df['ROMol'].apply(Chem.Descriptors.MolWt)
    df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
    df['LipinskyHBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
    df = df[['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD']]
    return df


import glob
import os

pwd = os.getcwd()
filles='sdf'
results='results'
# set directory to analyse
data = os.path.join(pwd,filles)
# set directory with outputs
results = os.path.join(pwd,results)

# go to the folder with all SDF files
os.chdir(data)
# list of SDF file names in that folder
dirlist = glob.glob('*.sdf')

# loop over each SDF and pass it to the function
for sdf in dirlist:
    sdf_name=sdf.rsplit( ".", 1 )[ 0 ]
    key = f'{sdf_name}'
    df = load_sdf_file(sdf,key)
    print(f'{sdf_name}.sdf has been processed')

The problem is that df always ends up holding only the last processed
file, whereas I need to append the result of each processed SDF file.
I also got an error on one of the SDF files, which interrupted the
script:

Traceback (most recent call last):
  File "./lipinski2.py", line 67, in <module>
    df = load_sdf_file(sdf,key)
  File "./lipinski2.py", line 26, in load_sdf_file
    df['LogP']   = df['ROMol'].apply(Chem.Descriptors.MolLogP)
  File "/Users/gleb/opt/miniconda3/envs/my-rdkit-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/gleb/opt/miniconda3/envs/my-rdkit-env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'ROMol'

Probably an additional if statement is needed to skip the file when
an SDF is "broken"...
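
One possible way to handle both points, as a minimal sketch rather than a tested solution: collect one DataFrame per file in a list, join them with pd.concat at the end, and catch the KeyError so that a file without a 'ROMol' column (presumably one in which no molecule could be parsed) is skipped instead of stopping the run. The helper name collect_sdf_tables and its loader argument are hypothetical, introduced only for this illustration; the loader is expected to behave like load_sdf_file(file, key) above.

```python
import os

import pandas as pd


def collect_sdf_tables(sdf_paths, loader):
    """Run loader(path, key) on every file and stack the results.

    Hypothetical helper: `loader` is assumed to return a DataFrame and
    to raise KeyError when the expected 'ROMol' column is missing.
    Returns the combined DataFrame and the list of skipped files.
    """
    frames = []   # one DataFrame per successfully processed file
    skipped = []  # files that could not be processed
    for path in sdf_paths:
        # file name without directory and extension, used as the Source key
        key = os.path.splitext(os.path.basename(path))[0]
        try:
            frames.append(loader(path, key))
        except KeyError as err:
            skipped.append(path)
            print(f'{path} skipped: missing column {err}')
            continue
        print(f'{path} has been processed')
    if not frames:
        # nothing could be read: return an empty table
        return pd.DataFrame(), skipped
    # stack all per-file tables into a single DataFrame
    return pd.concat(frames, ignore_index=True), skipped
```

With the script above this would be called as all_df, skipped = collect_sdf_tables(dirlist, load_sdf_file), after which all_df holds one row per molecule across all files and skipped lists the files worth inspecting by hand.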

Tue, Dec 1, 2020 at 19:07, Gustavo Seabra <gustavo.sea...@gmail.com>:
>
> Hi Jeff,
>
>
>
> There are a lot of people here with way more experience than me, so this may
> not be the optimal solution... But here is what I would do in this case:
>
>
>
> from rdkit import Chem, DataStructs
> from rdkit.Chem import Draw, PandasTools, Descriptors, rdMolDescriptors
> from IPython.display import HTML
>
> def load_sdf_file(file,source,id_column):
>     """
>     Reads molecules from an SDF file keeping only molecules
>     with valid SMILES, and assign a source field
>     """
>     df = PandasTools.LoadSDF(file)
>     df['Source'] = source
>     df['ID'] = df[id_column]
>     df['SMILES'] = df['ROMol'].apply(Chem.MolToSmiles)
>     df['LogP']   = df['ROMol'].apply(Chem.Descriptors.MolLogP)
>     df['MolWt']  = df['ROMol'].apply(Chem.Descriptors.MolWt)
>     df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
>     df['LipinskyHBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
>
>     df = df[['Source','ID','SMILES','LogP','MolWt','LipinskyHBA','LipinskyHBD','ROMol']]
>     return df
>
> df = load_sdf_file("chembl-26_phase-1.sdf","ChEMBL_Phase-1","ID")
> df.head() # Should show the top of the DataFrame, with the properties and the structures.
>
>
>
>
>
> All the best,
>
> --
>
> Gustavo Seabra
>
>
>
> -----Original Message-----
> From: Jeff Saxon <jmsstarli...@gmail.com>
> Sent: Tuesday, December 1, 2020 7:35 AM
> To: rdkit-discuss@lists.sourceforge.net
> Subject: [Rdkit-discuss] Applying Lipinsky filter on ligand data set
>
>
>
> Dear All,
>
>
>
> I've just started working with RDKit, focusing on applying the Lipinski
> rule to my set of ligands. Basically, I take the 3D coordinates of each
> ligand file (in SDF format) and then calculate the 4 required properties
> for it. Here is my code:
>
> # make a list of all .sdf files present in the data folder:
> dirlist = [os.path.basename(p) for p in glob.glob('data' + '/*.sdf')]
>
> # create an empty data frame with 5 columns:
> # name of the file, value of variable p, value of ac, value of don, value of wt
> df = pd.DataFrame(columns=["key", "p", "ac", "don", "wt"])
>
> # for each sdf file get its name and calculate 4 different properties: p, ac, don, wt
> for sdf in dirlist:
>     sdf_name=sdf.rsplit( ".", 1 )[ 0 ]
>     key = f'{sdf_name}'
>     mol = open(sdf,'rb')
>     m = Chem.ForwardSDMolSupplier(mol)
>     for conf in m:
>         if conf is None: continue
>         p = MolLogP(conf) # coeff conc-perm
>         ac = CalcNumLipinskiHBA(conf)#
>         don = CalcNumLipinskiHBD(conf)
>         wt = MolWt(conf)
>         #two=AllChem.Compute2DCoords(conf)
>         Draw.MolToFile(conf,results+f'/{key}.png')
>         #df[key] = [p, ac, don, wt]
>
>
>
> Could you suggest how I can summarize the calculations for each ligand in a
> pandas-like DataFrame and then apply the Lipinski filter to it?
>
> Is it possible to convert the 3D coordinates to 2D so that I can draw the
> molecule (at present it makes a sketch based on the 3D coordinates taken
> directly from the SDF)?
>
>
>
>
>
> _______________________________________________
>
> Rdkit-discuss mailing list
>
> Rdkit-discuss@lists.sourceforge.net
>
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
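
For the filtering step asked about in the original question, a minimal sketch could look like the following, assuming a DataFrame with the LogP, MolWt, LipinskyHBA and LipinskyHBD columns produced by load_sdf_file. The thresholds are the usual rule-of-five limits (molecular weight at most 500, LogP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The function names and the max_violations parameter are hypothetical, and the demo values at the bottom are made up purely for illustration.

```python
import pandas as pd


def count_lipinski_violations(df):
    """Return, per row, how many rule-of-five limits are exceeded."""
    return (
        (df['MolWt'] > 500).astype(int)
        + (df['LogP'] > 5).astype(int)
        + (df['LipinskyHBD'] > 5).astype(int)
        + (df['LipinskyHBA'] > 10).astype(int)
    )


def apply_lipinski_filter(df, max_violations=0):
    """Keep rows with at most `max_violations` violations.

    max_violations is an illustrative knob: 0 is the strict reading,
    1 allows a single violation.
    """
    out = df.copy()
    out['Violations'] = count_lipinski_violations(out)
    return out[out['Violations'] <= max_violations]


if __name__ == '__main__':
    # Made-up numbers, only to show the call; not real ligand data.
    demo = pd.DataFrame({
        'Source': ['ligand_a', 'ligand_b'],
        'LogP': [2.0, 6.5],
        'MolWt': [300.0, 650.0],
        'LipinskyHBA': [4, 12],
        'LipinskyHBD': [1, 6],
    })
    print(apply_lipinski_filter(demo))
```

Keeping the violation count as its own column makes it easy to switch between the strict filter and the more lenient "one violation allowed" variant without recomputing the descriptors.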

