Hi Francesco,

Which format do you want to save the clusters in?
If you want to save them for later use in Python (i.e. save them and load
them again), you can use the pickle library:

import pickle

# To save the clusters (pickle is a binary format, hence "wb")
with open("Clusters1.pkl", "wb") as F:
    pickle.dump(clusters, F)

# To load the clusters
with open("Clusters1.pkl", "rb") as F:
    clusters1 = pickle.load(F)

If you want to save them as text, you can convert them to a string and
write them to a .txt file (note that this is not directly reversible):

with open('Clusters.txt', 'w') as the_file: the_file.write(str(clusters))
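
Although the text file is not loaded back automatically, the string written
by str(clusters) is a valid Python literal, so ast.literal_eval from the
standard library can usually parse it back into nested tuples. A minimal
sketch, assuming a small stand-in for the clusters tuple and a hypothetical
path in a temporary directory (neither comes from your session):

```python
import ast
import os
import tempfile

# Stand-in for the Butina output: a tuple of tuples of molecule indices.
clusters = ((17, 15, 46), (13, 4, 8), (91, 53), (7,))

# Hypothetical file location, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "Clusters.txt")

# Write the string representation, as in the one-liner above.
with open(path, "w") as the_file:
    the_file.write(str(clusters))

# Parse the text back into Python objects without executing arbitrary code.
with open(path) as the_file:
    restored = ast.literal_eval(the_file.read())

print(restored == clusters)
```

This only works while the clusters contain plain Python literals (ints,
tuples, lists); for anything else pickle is the safer choice.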

If you want one cluster per row (.csv file), you may try:

import csv
# newline="" lets the csv module control the line endings itself
with open("Clusters.csv", "w", newline="") as F:
    csv.writer(F, delimiter=",").writerows(clusters)
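
If you later need to read such a file back, keep in mind that CSV stores
everything as text, so the indices have to be converted back to int. A
minimal sketch, again assuming a small stand-in for the clusters tuple and a
hypothetical temporary file path:

```python
import csv
import os
import tempfile

# Stand-in for the Butina output; rows may have different lengths.
clusters = ((17, 15, 46), (13, 4, 8), (91, 53), (7,))

# Hypothetical file location, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "Clusters.csv")

# One cluster per row.
with open(path, "w", newline="") as F:
    csv.writer(F, delimiter=",").writerows(clusters)

# Read the rows back and convert each field from str to int.
with open(path, newline="") as F:
    restored = tuple(tuple(int(x) for x in row) for row in csv.reader(F))

print(restored == clusters)
```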

I hope this works for you.
Best regards,
Omar

On Mon, Mar 23, 2020 at 4:31 PM Francesco Coppola <
coppolafrancesco1...@gmail.com> wrote:

> Hello everyone,
>
> I have a small problem with saving a job. With the fingerprints of a
> database of molecules, I made the clusters. It works, I see them, but *how
> can I save it*?
>
> >>> from rdkit import Chem
> >>> from rdkit.Chem import AllChem
> >>> def ClusterFps(fps, cutoff=0.2):
> ...     from rdkit import DataStructs
> ...     from rdkit.ML.Cluster import Butina
> ...     dists=[]
> ...     nfps=len(fps)
> ...     for i in range(1, nfps):
> ...             sims=DataStructs.BulkTanimotoSimilarity(fps[i], fps [:i])
> ...             dists.extend([1-x for x in sims])
> ...     cs=Butina.ClusterData(dists, nfps, cutoff, isDistData=True)
> ...     return cs
> ...
> >>> ms = [x for x in
> Chem.SDMolSupplier(r'C:\Users\HP\100.sdf',removeHs=False)]
> >>> len(ms)
> 100
> >>> fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,1024) for x in ms]
> >>> clusters=ClusterFps(fps,cutoff=0.4)
> >>> print(clusters[1])
> (13, 4, 8)
> >>> print(clusters)
> ((17, 15, 46), (13, 4, 8), (91, 53), (78, 76), (64, 42), (59, 58),
> (44, 43), (39, 38), (31, 30), (25, 24), (7,), (99,), (98,), (97,), (96,),
> (95,), (94,), (93,), (92,), (90,), (89,), (88,), (87,), (86,), (85,),
> (84,), (83,), (82,), (81,), (80,), (79,), (77,), (75,), (74,), (73,),
> (72,), (71,), (70,), (69,), (68,), (67,), (66,), (65,), (63,), (62,),
> (61,), (60,), (57,), (56,), (55,), (54,), (52,), (51,), (50,), (49,),
> (48,), (47,), (45,), (41,), (40,), (37,), (36,), (35,), (34,), (33,),
> (32,), (29,), (28,), (27,), (26,), (23,), (22,), (21,), (20,), (19,),
> (18,), (16,), (14,), (12,), (11,), (10,), (9,), (6,), (5,), (3,), (2,),
> (1,), (0,))
>
> *If I try to use:*
>
> >>> np.savetxt("DB_Clusters", clusters, delimiter="     ")
> Traceback (most recent call last):
>   File
> "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line
> 1447, in savetxt
>     v = format % tuple(row) + newline
> TypeError: must be real number, not tuple
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<__array_function__ internals>", line 6, in savetxt
>   File
> "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line
> 1451, in savetxt
>     % (str(X.dtype), format))
> TypeError: Mismatch between array dtype ('object') and format specifier
> ('%.18e')
>
> *Then I thought I hadn't imported Numpy, but the problem was not resolved.*
> >>> import numpy as np
> >>> np.savetxt("DB_Clu.txt", clusters, delimiter="      ")
> Traceback (most recent call last):
>   File
> "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line
> 1447, in savetxt
>     v = format % tuple(row) + newline
> TypeError: must be real number, not tuple
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<__array_function__ internals>", line 6, in savetxt
>   File
> "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line
> 1451, in savetxt
>     % (str(X.dtype), format))
> TypeError: Mismatch between array dtype ('object') and format specifier
> ('%.18e')
>
>
> *Is the problem that I can't use this function to save clusters? How can I
> save the results with the clusters?*
>
> Sorry for the trouble,
>
> Best regards,
> Francesco
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
