> Dear Francesco,
>
> To read a SMILES file, you can use:
>
> suppl = Chem.SmilesMolSupplier('file_name.smi', delimiter='\t',
>                                titleLine=False)
> mols = [mol for mol in suppl if mol]
>
> If it's a .txt file, just change .smi to .txt and the supplier should
> still work. Set delimiter to whatever actually separates the columns
> in your file.
>
> As for viewing all the bits, I wonder why you need that? The "..." in
> your output is only how NumPy abbreviates the display of arrays with
> more than 1000 elements; the vector still holds all 1024 bits. If you
> just want an intuition about the result (e.g. how many bits are 1 and
> how many are 0), you may use print(len(vector[vector == 1])) to get
> the number of bits with value 1, and print(len(vector[vector == 0]))
> to get the number of bits with value 0. You can also use
> list(fp.GetOnBits()) to get a list of the non-zero bit positions.
> However, if you want to print all the bits, you may try:
> print(list(vector))
>
> For writing the results to a file, assuming the required file type is
> csv or txt, you may use:
>
> np.savetxt("File_Name.csv", vector.reshape(1, len(vector)),
>            delimiter=",", fmt="%d")
>
> Here fmt="%d" writes the bits as integers; without it np.savetxt uses
> floating-point notation. If you want a txt file, just change .csv to
> .txt in the file name.
>
> Note that a syntax error means the Python parser couldn't understand
> the code, usually because of typos, missing or extra parentheses,
> incorrect indentation, etc. In your case the syntax error was due to
> the missing colon ':' after the 'for mol in list' statement.
>
> Here is one way to read mols from a txt file containing SMILES,
> calculate the fps and write them to a csv or txt file:
>
> # Imports needed by the code below:
> import numpy as np
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> # Read mols from a SMILES file and put them in a list:
> suppl = Chem.SmilesMolSupplier('SMILES_File.txt', delimiter='\t',
>                                titleLine=False)
> mols = [mol for mol in suppl if mol]
>
> # For each molecule, calculate the fp and append it to a list
> # (Results) for writing:
> Results = []
> for mol in mols:
>     fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
>     vector = np.array(fp)
>     Results.append(vector)
>
> # Write the results to a .csv or .txt file, one row per molecule
> # (fmt="%d" writes the bits as integers):
> np.savetxt("file_name.csv", Results, delimiter=",", fmt="%d")
>
> I hope this works for you.
> Best regards,
> Omar
>
> On Thu, Mar 12, 2020 at 3:23 PM Francesco Coppola <coppolafrancesco1...@gmail.com> wrote:
>
> Hello everyone,
>
> Before describing my new problem, I wanted to thank everyone who
> helped me in the previous discussion. Really, thank you, I never
> expected so much collaboration. I followed the advice and also started
> studying Python (I started online courses). Now I would like to
> explain what I want to do and ask you whether it is possible with
> RDKit.
>
> Basically, I want to understand how to get fingerprints, as bits of 1
> and 0, from SMILES contained in a file (.txt, .smi or .sdf, it doesn't
> matter which). For the moment I am able to do it with a single SMILES,
> but I can't get the complete sequence, since the maximum number of
> bits that I can display is 1000. Is it possible to change that? Let me
> explain:
>
> (base) C:\Users\HP>conda activate py37_rdkit
>
> (py37_rdkit) C:\Users\HP>python
> Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import rdkit
> >>> from rdkit import Chem
> >>> from rdkit.Chem import Draw
> >>> from rdkit.Chem import Descriptors
> >>> from rdkit.Chem import AllChem
> >>> from rdkit import DataStructs
> >>> from __future__ import print_function
> >>>
> >>> import numpy as np
> >>> info = {}
> >>> mol = Chem.MolFromSmiles('CCC')
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024, bitInfo=info)
> >>> vector = np.array(fp)
> >>> vector
> array([0, 0, 0, ..., 0, 0, 0])
> >>>
>
> Is there a way to view all the bits? The only way I know is to lower
> the value of nBits to 1000 (which, however, I would not want to do).
> And in fact:
>
> >>> mol = Chem.MolFromSmiles('CCC')
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1000, bitInfo=info)
> >>> vector = np.array(fp)
> >>> vector
> array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> >>>
>
> But like I said, I would like to keep nBits = 1024. This is not the
> main problem, though, because I would like the result to be written to
> a file automatically. Is that possible? And above all, is it possible
> to do it for a file that has a SMILES and a name on each line? For
> example, a .txt file like this:
>
> CC 1257
> CCCC 544235
> CCCCCC 9850982
> CCCCCCC 894983
>
> To do this I guess I have to use the list function like:
>
> >>> list = [r'C:\Users\HP\Desktop\Python_ex\smile_molecules.txt'] #which is the location of the file.
> >>> for mol in list #Here give me error SyntaxError: invalid syntax
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024, bitInfo=info) #here maybe mol became list ?
> >>> vector = np.array(fp)
> >>> vector
>
> But obviously it doesn't work. I hope you can help me. I don't know if
> what I want to do is possible. If you know of some similar work, I
> would be really glad to read it, and maybe I can use it as a guide.
>
> Good day, and thank you very much for your availability and
> collaboration.
>
> Best regards,
> Francesco Coppola
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
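
On the question of viewing all 1024 bits without lowering nBits: the "..." appears to come from NumPy's print options rather than from RDKit, since by default NumPy abbreviates the printed form of arrays with more than 1000 elements. Below is a minimal sketch of lifting that limit, assuming only NumPy. The array is a stand-in for np.array(fp), and the on-bit positions in it are made up for illustration.

```python
import sys
import numpy as np

# Stand-in for np.array(fp): 1024 zeros with a few arbitrary (made-up) on bits.
vector = np.zeros(1024, dtype=int)
vector[[80, 294, 1000]] = 1

# With default print options, arrays of more than 1000 elements are abbreviated.
short_text = repr(vector)

# Lift the threshold temporarily so that every element is shown.
with np.printoptions(threshold=sys.maxsize):
    full_text = repr(vector)

# A compact alternative: one character per bit, easy to print or write to a file.
bit_string = "".join(str(bit) for bit in vector)

print(short_text)
print(len(bit_string), "bits,", bit_string.count("1"), "set")
```

If the full display is wanted for the whole session, np.set_printoptions(threshold=sys.maxsize) should have the same effect without the with block.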
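
For the file with a SMILES and a name on each line, one possible way to keep the names next to the bits is sketched below using only the standard library. Everything named here (parse_smiles_lines, placeholder_fingerprint, write_rows) is hypothetical and for illustration only. In particular, placeholder_fingerprint is not a chemical fingerprint; it is there only so that the sketch runs without RDKit installed.

```python
import csv
import io

def parse_smiles_lines(lines):
    """Return (smiles, name) pairs from whitespace-separated lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or incomplete lines
            records.append((fields[0], fields[1]))
    return records

def placeholder_fingerprint(smiles, n_bits=8):
    """Hypothetical stand-in, NOT a chemical fingerprint: sets one bit from the string length."""
    bits = [0] * n_bits
    bits[len(smiles) % n_bits] = 1
    return bits

def write_rows(out, records, fingerprint):
    """Write one CSV row per molecule: name, SMILES, then the bits as integers."""
    writer = csv.writer(out)
    for smiles, name in records:  # note the colon that ends the for statement
        writer.writerow([name, smiles] + [int(b) for b in fingerprint(smiles)])

# The four example lines from the message above.
example_lines = ["CC 1257", "CCCC 544235", "CCCCCC 9850982", "CCCCCCC 894983"]
records = parse_smiles_lines(example_lines)

# io.StringIO stands in for open("file_name.csv", "w", newline="").
buffer = io.StringIO()
write_rows(buffer, records, placeholder_fingerprint)
print(buffer.getvalue())
```

With RDKit available, the placeholder would presumably be replaced by a function that calls Chem.MolFromSmiles on the SMILES string and then AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024), as used earlier in the thread, converting the result with np.array(fp).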