>
> Dear Francesco,
> To read a SMILES file, you can use:
>
> suppl = Chem.SmilesMolSupplier('file_name.smi', delimiter='\t', titleLine=False)
> mols = [mol for mol in suppl if mol]
>
> If it's a .txt file, just change the .smi to .txt and the supplier should
> still work.
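Here is a minimal sketch of what the supplier does with a two-column file. It assumes tab-separated columns and no header line, and the file name `example.smi` and its contents are made up for illustration. The second column becomes each molecule's `_Name` property, so the names in the file are kept:

```python
import os
import tempfile

from rdkit import Chem

# Hypothetical input: SMILES<TAB>name, one molecule per line, no header line
text = "CC\t1257\nCCCC\t544235\nCCCCCC\t9850982\nCCCCCCC\t894983\n"

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.smi")
with open(path, "w") as fh:
    fh.write(text)

# titleLine=False: the first line is data, not a header
suppl = Chem.SmilesMolSupplier(path, delimiter="\t", titleLine=False)

# The supplier yields None for lines it cannot parse, so filter those out
mols = [mol for mol in suppl if mol is not None]

for mol in mols:
    # The name column is stored in the "_Name" property of each molecule
    print(mol.GetProp("_Name"), mol.GetNumAtoms())
```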
> As for viewing all the bits, I wonder why you need that? If you want to
> view them to get an intuition about the results (e.g. how many bits are 1
> and how many are 0), you may use
> print(len(vector[vector == 1])) to count the bits with value 1, and
> print(len(vector[vector == 0])) to count the bits with value 0. You can also
> use list(fp.GetOnBits()) to get the indices of the bits that are set.
> However, if you want to print all the bits, you may try: print(list(vector))
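The 1000-bit limit mentioned in the question comes from NumPy rather than RDKit. When an array has more than 1000 elements (NumPy's default print threshold), its printed form is summarized with "...". The small sketch below uses a made-up 1024-element vector (not a real fingerprint) to show the difference and how `np.printoptions` lifts the limit temporarily:

```python
import sys

import numpy as np

# Made-up 1024-bit vector with three bits set (not an RDKit result)
vector = np.zeros(1024, dtype=int)
vector[[8, 30, 500]] = 1

# With the default threshold (1000), a 1024-element array is abbreviated
summary = np.array2string(vector)
print("..." in summary)  # True

# Raise the threshold only inside this block so every element is shown
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(vector)
print("..." in full)  # False

# Counting or locating the set bits does not require printing anything
print(np.count_nonzero(vector))  # 3
print(np.flatnonzero(vector).tolist())  # [8, 30, 500]
```

Calling `np.set_printoptions(threshold=sys.maxsize)` instead would change the setting for the rest of the session rather than for one block.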
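> For writing the results to a file, assuming the required file type is csv
> or txt, you may use:
>
> np.savetxt("File_Name.csv", vector.reshape(1, len(vector)), fmt='%d', delimiter=",")
>
> If you want a txt file, just change .csv to .txt in the file name. The
> fmt='%d' argument writes the bits as integers; without it, savetxt uses its
> default float format ('%.18e') and each bit comes out as
> 0.000000000000000000e+00 or 1.000000000000000000e+00.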
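The sketch below shows a single-row `np.savetxt` call. It writes to an in-memory buffer instead of a real file so the output can be inspected, and it uses a made-up 8-bit vector in place of a fingerprint. The `fmt="%d"` argument keeps the bits as integers, and reading the row back with `np.loadtxt` returns the same values:

```python
import io

import numpy as np

# Made-up 8-bit vector standing in for a fingerprint
vector = np.array([0, 1, 0, 0, 1, 0, 0, 0])

buf = io.StringIO()
# reshape(1, -1) turns the 1-D vector into a single row of the output
np.savetxt(buf, vector.reshape(1, -1), fmt="%d", delimiter=",")
print(buf.getvalue())  # 0,1,0,0,1,0,0,0

# Reading the row back gives the same bits
restored = np.loadtxt(io.StringIO(buf.getvalue()), delimiter=",", dtype=int)
print((restored == vector).all())  # True
```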
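> Note that a syntax error means that the Python parser couldn't understand
> the code, usually because of typos, missing or extra parentheses, incorrect
> indentation, etc. In your case the syntax error was due to the missing colon
> ':' after the 'for mol in list' statement. Even with the colon, though, that
> loop would go over a list containing only the file path, so mol would be the
> path string rather than a molecule; the file has to be read with a supplier
> first, as in the code below.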
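This is a parse-time error: Python rejects the code before any of it runs. One way to see this is to pass both versions of the loop header to the built-in `compile()`, which only parses the source and does not execute it. `parses` is a hypothetical helper name used for illustration:

```python
def parses(source):
    """Return True if Python can parse `source`, False on a SyntaxError."""
    try:
        # compile() parses the code without running it, so `mols` need not exist
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(parses("for mol in mols\n    print(mol)\n"))   # False: missing colon
print(parses("for mol in mols:\n    print(mol)\n"))  # True
```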
> Here is one way to read molecules from a .txt file containing SMILES,
> calculate their fingerprints and write them to a .csv or .txt file:



> # Imports needed by the code below:
> import numpy as np
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> # Read mols from a SMILES file and put them in a list:
> suppl = Chem.SmilesMolSupplier('SMILES_File.txt', delimiter='\t', titleLine=False)
> mols = [mol for mol in suppl if mol]
>
> # For each molecule, calculate the fingerprint and append it to a list
> # (Results) for writing:
> Results = []
> for mol in mols:
>     fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
>     vector = np.array(fp)
>     Results.append(vector)
>
> # Write the results to a .csv or .txt file, one molecule per row
> # (fmt='%d' writes the bits as integers instead of floats):
> np.savetxt("file_name.csv", Results, fmt='%d', delimiter=",")
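The input file in the question also has a name on each line. A variant of the code above could keep that name as the first column of the output. This is a sketch under several assumptions: the input is tab-separated with no header line, the input file is made up and written to a temporary directory, and Python's `csv` module is used instead of `np.savetxt` because it writes a text name and integer bits in the same row without extra format strings. The output file name `fingerprints.csv` and the `bit0`, `bit1`, ... column headers are hypothetical:

```python
import csv
import os
import tempfile

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

N_BITS = 1024

# Hypothetical input file: SMILES<TAB>name, no header line
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "SMILES_File.txt")
out_path = os.path.join(tmpdir, "fingerprints.csv")
with open(in_path, "w") as fh:
    fh.write("CC\t1257\nCCCC\t544235\nCCCCCC\t9850982\nCCCCCCC\t894983\n")

suppl = Chem.SmilesMolSupplier(in_path, delimiter="\t", titleLine=False)

with open(out_path, "w", newline="") as out:
    writer = csv.writer(out)
    # Header row: the name column followed by one column per bit
    writer.writerow(["name"] + [f"bit{i}" for i in range(N_BITS)])
    for mol in suppl:
        if mol is None:  # skip lines RDKit could not parse
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=N_BITS)
        bits = np.array(fp).tolist()  # N_BITS Python ints, each 0 or 1
        writer.writerow([mol.GetProp("_Name")] + bits)

print(out_path)
```

Each row of the resulting CSV file starts with the molecule's name, followed by its 1024 bits.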
>
> I hope this works for you.
> Best regards,
> Omar
>

On Thu, Mar 12, 2020 at 3:23 PM Francesco Coppola <coppolafrancesco1...@gmail.com> wrote:

> Hello everyone,
> Before describing my new problem, I wanted to thank everyone who helped me
> in the previous discussion. Thank you very much; I never expected so much
> collaboration. I followed your advice and have also started studying
> Python (I started some online courses). Now I would like to explain what I
> want to do and ask whether it is possible with RDKit.
>
> Basically, I now want to understand how to get fingerprints, as bits of 1
> and 0, from SMILES contained in a file (.txt, .smi or .sdf, it does not
> matter). For the moment I can do it for a single SMILES, but I can't see the
> complete sequence, since the maximum number of bits I can display is 1000.
> Is it possible to change that? Let me explain:
>
> (base) C:\Users\HP>conda activate py37_rdkit
>
> (py37_rdkit) C:\Users\HP>python
> Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import rdkit
> >>> from rdkit import Chem
> >>> from rdkit.Chem import Draw
> >>> from rdkit.Chem import Descriptors
> >>> from rdkit.Chem import AllChem
> >>> from rdkit import DataStructs
> >>> from __future__ import print_function
> >>>
> >>> import numpy as np
> >>> info = {}
> >>> mol = Chem.MolFromSmiles('CCC')
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024, bitInfo=info)
> >>> vector = np.array(fp)
> >>> vector
> array([0, 0, 0, ..., 0, 0, 0])
> >>>
>
> Is there a way to view all the bits? The only way I know is to lower the
> value of nBits to 1000 (which, however, I would not want to do). And in
> fact:
>
> >>> mol = Chem.MolFromSmiles('CCC')
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1000, bitInfo=info)
> >>> vector = np.array(fp)
> >>> vector
> array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>        0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> >>>
>
> As I said, I would like to keep nBits = 1024. But that is not the main
> problem: I would like the bits to be written to a file automatically. Is
> that possible? And above all, is it possible to do this for a file that has
> a SMILES string and a name on each line? For example, a .txt file like:
>
> CC                1257
> CCCC           544235
> CCCCCC      9850982
> CCCCCCC   894983
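If the columns in such a file are separated by a varying number of spaces, as in the example above, one alternative that avoids the supplier is to split each line on whitespace yourself and build each molecule with `Chem.MolFromSmiles`. The minimal sketch below gives the file contents inline as a string for illustration, and `read_smiles_with_names` is a hypothetical helper:

```python
from rdkit import Chem

# Same layout as the example above: SMILES, then a name, separated by spaces
text = """CC                1257
CCCC           544235
CCCCCC      9850982
CCCCCCC   894983
"""


def read_smiles_with_names(lines):
    """Yield (name, mol) pairs; str.split() treats any run of spaces/tabs as one separator."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or incomplete lines
            continue
        mol = Chem.MolFromSmiles(fields[0])
        if mol is None:  # RDKit could not parse this SMILES
            continue
        yield fields[1], mol


pairs = list(read_smiles_with_names(text.splitlines()))
for name, mol in pairs:
    print(name, Chem.MolToSmiles(mol))
```

For a real file, the helper can be given the open file object directly, e.g. `read_smiles_with_names(open(path))`, since iterating over a file yields its lines.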
>
> To do this, I guess I have to use a list, like:
>
> >>> list = [r'C:\Users\HP\Desktop\Python_ex\smile_molecules.txt'] # which is the location of the file
> >>>      for mol in list              # Here I get the error: SyntaxError: invalid syntax
>
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024, bitInfo=info) # maybe mol became the list here?
>
> >>> vector = np.array(fp)
>
> >>> vector
>
> But obviously it doesn't work. I hope you can help me. I don't know if
> what I want to do is possible. If you know of any similar work, I would be
> glad to read it and maybe use it as a guide.
>
> Good day and thank you very much for your availability and collaboration.
>
> Best regards,
> Francesco Coppola
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>