Hey Gustavo,

Thank you very much for your script! I should specify that I am working with many SDF files, each of which consists of one 3D structure of a ligand (I don't see any difference from PDB here, so if I could apply it to PDB files directly, that would be even better!). Anyway, I've just tried to adapt your script for my case:

# I simplified the function to compute only the 4 properties required for the Lipinski calculations.
# I also replaced Source with the name of the particular SDF file (see below).
def load_sdf_file(file, key):
    """
    Reads molecules from an SDF file keeping only molecules
    with valid SMILES, and assign a source field
    """
    df = PandasTools.LoadSDF(file)
    df['Source'] = key
    df['LogP'] = df['ROMol'].apply(Chem.Descriptors.MolLogP)
    df['MolWt'] = df['ROMol'].apply(Chem.Descriptors.MolWt)
    df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
    df['LipinskyHBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
    df = df[['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD']]
    return df

pwd = os.getcwd()
filles='sdf'
results='results'
# set directory to analyse
data = os.path.join(pwd,filles)
# set directory with outputs
results = os.path.join(pwd,results)
# go to the folder with all SDF files
os.chdir(data)
# loop over each SDF and pass it to the function
for sdf in dirlist:
    sdf_name=sdf.rsplit( ".", 1 )[ 0 ]
    key = f'{sdf_name}'
    df = load_sdf_file(sdf,key)
    print(f'{sdf_name}.sdf has been processed')

The problem is that df always holds only the last processed file, whereas I need to append the results of each processed SDF file. Also, I got an error on one of the SDF files, which interrupted the script:

Traceback (most recent call last):
  File "./lipinski2.py", line 67, in <module>
    df = load_sdf_file(sdf,key)
  File "./lipinski2.py", line 26, in load_sdf_file
    df['LogP'] = df['ROMol'].apply(Chem.Descriptors.MolLogP)
  File "/Users/gleb/opt/miniconda3/envs/my-rdkit-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/gleb/opt/miniconda3/envs/my-rdkit-env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'ROMol'

Probably some additional IF statement is required to skip the file in the case of a "broken" SDF...

On Tue, Dec 1, 2020 at 19:07, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
> Hi Jeff,
>
> There's a lot of people here with way more experience than me, so this may not be the optimal solution... But here is what I would do in this case:
>
> from rdkit import Chem, DataStructs
> from rdkit.Chem import Draw, PandasTools, Descriptors, rdMolDescriptors
> from IPython.display import HTML
>
> def load_sdf_file(file,source,id_column):
>     """
>     Reads molecules from an SDF file keeping only molecules
>     with valid SMILES, and assign a source field
>     """
>     df = PandasTools.LoadSDF(file)
>     df['Source'] = source
>     df['ID'] = df[id_column]
>     df['SMILES'] = df['ROMol'].apply(Chem.MolToSmiles)
>     df['LogP'] = df['ROMol'].apply(Chem.Descriptors.MolLogP)
>     df['MolWt'] = df['ROMol'].apply(Chem.Descriptors.MolWt)
>     df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
>     df['LipinskyHBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
>
>     df = df[['Source','ID','SMILES','LogP','MolWt','LipinskyHBA','LipinskyHBD','ROMol']]
>     return df
>
> df = load_sdf_file("chembl-26_phase-1.sdf","ChEMBL_Phase-1","ID")
> df.head() # Should show the top of the DataFrame, with the properties and the structures.
>
> All the best,
> --
> Gustavo Seabra
>
> -----Original Message-----
> From: Jeff Saxon <jmsstarli...@gmail.com>
> Sent: Tuesday, December 1, 2020 7:35 AM
> To: rdkit-discuss@lists.sourceforge.net
> Subject: [Rdkit-discuss] Applying Lipinsky filter on ligand data set
>
> Dear All,
>
> I've just started working with RDKit, focusing on applying the Lipinski rule to my set of ligands.
> Basically, I take the 3D coordinates of each ligand file (in SDF format) and then calculate the 4 required properties for it. Here is my code:
>
> # make a list of all .sdf files present in data folder:
> dirlist = [os.path.basename(p) for p in glob.glob('data' + '/*.sdf')]
>
> # create empty data file with 5 columns:
> # name of the file, value of variable p, value of ac, value of don, value of wt
> df = pd.DataFrame(columns=["key", "p", "ac", "don", "wt"])
>
> # for each sdf file get its name and calculate 4 different
> # properties: p, ac, don, wt
> for sdf in dirlist:
>     sdf_name=sdf.rsplit( ".", 1 )[ 0 ]
>     key = f'{sdf_name}'
>     mol = open(sdf,'rb')
>     m = Chem.ForwardSDMolSupplier(mol)
>     for conf in m:
>         if conf is None: continue
>         p = MolLogP(conf) # coeff conc-perm
>         ac = CalcNumLipinskiHBA(conf)
>         don = CalcNumLipinskiHBD(conf)
>         wt = MolWt(conf)
>         #two=AllChem.Compute2DCoords(conf)
>         Draw.MolToFile(conf,results+f'/{key}.png')
>         #df[key] = [p, ac, don, wt]
>
> Could you suggest how I can summarize the calculations for each ligand in a pandas-like DataFrame and then apply the Lipinski filter to it?
>
> Is it possible to convert the 3D coordinates to 2D so that I can draw the molecule (at present it makes a sketch based on the 3D coordinates taken directly from the SDF)?
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
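
For the two problems raised in the reply at the top of this thread (df holding only the last file, and the KeyError: 'ROMol' that stops the loop), one possible approach is sketched here. It is only a sketch under assumptions, not a tested solution for the actual data set. The names COLUMNS, summarize, lipinski_filter and max_violations are made up for illustration. It assumes that a missing 'ROMol' column means RDKit could not parse any record in that file, and it uses the usual rule-of-five cutoffs (MolWt <= 500, LogP <= 5, HBD <= 5, HBA <= 10).

```python
import os

import pandas as pd

# Column names follow the ones used earlier in this thread.
COLUMNS = ['Source', 'LogP', 'MolWt', 'LipinskyHBA', 'LipinskyHBD']


def load_sdf_file(file, key):
    """Return one row per molecule in `file`, or None if nothing was parsed."""
    # RDKit is imported lazily so the helpers below can be tried without it.
    from rdkit.Chem import Descriptors, PandasTools, rdMolDescriptors

    df = PandasTools.LoadSDF(file)
    # Assumption: no 'ROMol' column means no record in the file could be read.
    if 'ROMol' not in df.columns:
        return None
    df['Source'] = key
    df['LogP'] = df['ROMol'].apply(Descriptors.MolLogP)
    df['MolWt'] = df['ROMol'].apply(Descriptors.MolWt)
    df['LipinskyHBA'] = df['ROMol'].apply(rdMolDescriptors.CalcNumLipinskiHBA)
    df['LipinskyHBD'] = df['ROMol'].apply(rdMolDescriptors.CalcNumLipinskiHBD)
    return df[COLUMNS]


def summarize(paths, loader=load_sdf_file):
    """Load every file, skip the unreadable ones, and stack the rest."""
    frames, skipped = [], []
    for path in paths:
        key = os.path.splitext(os.path.basename(path))[0]
        df = loader(path, key)
        if df is None or df.empty:
            skipped.append(path)  # remember "broken" files instead of crashing
            continue
        frames.append(df)  # keep each per-file result
    if not frames:
        return pd.DataFrame(columns=COLUMNS), skipped
    # A single concat at the end replaces overwriting df on each iteration.
    return pd.concat(frames, ignore_index=True), skipped


def lipinski_filter(df, max_violations=0):
    """Keep rows that break at most `max_violations` rule-of-five criteria."""
    violations = (
        (df['MolWt'] > 500).astype(int)
        + (df['LogP'] > 5).astype(int)
        + (df['LipinskyHBD'] > 5).astype(int)
        + (df['LipinskyHBA'] > 10).astype(int)
    )
    return df[violations <= max_violations]
```

With the layout described above, something like `table, skipped = summarize(sorted(glob.glob('sdf/*.sdf')))` followed by `lipinski_filter(table)` would give one combined table plus a list of the files that were skipped (this needs `import glob`). Collecting the frames in a list and concatenating once is generally cheaper than growing a DataFrame inside the loop.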