Hi Francesco, what format do you want to save the clusters in? If you want to save them for later use in Python (i.e. save them now and load them back later), you can use the pickle module:

    import pickle

    # To save the clusters
    with open("Clusters1.txt", "wb") as F:
        pickle.dump(clusters, F)

    # To load the clusters
    with open("Clusters1.txt", "rb") as F:
        clusters1 = pickle.load(F)

If you want to save them as text, you can just convert them to a string and write that to a .txt file (reading it back is not direct; the string has to be parsed again, e.g. with ast.literal_eval):

    with open('Clusters.txt', 'w') as the_file:
        the_file.write(str(clusters))

If you want each cluster on its own row (.csv file), you can try:

    import csv

    # newline="" lets the csv module control the line endings itself
    with open("Clusters.txt", "w", newline="") as F:
        csv.writer(F, delimiter=",", lineterminator="\n").writerows(clusters)

I hope this works for you.

Best regards,
Omar

On Mon, Mar 23, 2020 at 4:31 PM Francesco Coppola <coppolafrancesco1...@gmail.com> wrote:

> Hello everyone,
>
> I have a small problem with saving a job. With the fingerprints of a
> database of molecules, I made the clusters. It works, I see them, but *how
> can I save it*?
>
> >>> from rdkit import Chem
> >>> from rdkit.Chem import AllChem
> >>> def ClusterFps(fps, cutoff=0.2):
> ...     from rdkit import DataStructs
> ...     from rdkit.ML.Cluster import Butina
> ...     dists=[]
> ...     nfps=len(fps)
> ...     for i in range(1, nfps):
> ...         sims=DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
> ...         dists.extend([1-x for x in sims])
> ...     cs=Butina.ClusterData(dists, nfps, cutoff, isDistData=True)
> ...     return cs
> ...
> >>> ms = [x for x in Chem.SDMolSupplier(r'C:\Users\HP\100.sdf',removeHs=False)]
> >>> len(ms)
> 100
> >>> fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,1024) for x in ms]
> >>> clusters=ClusterFps(fps,cutoff=0.4)
> >>> print(clusters[1])
> (13, 4, 8)
> >>> print(clusters)
> ((17, 15, 46), (13, 4, 8), (91, 53), (78, 76), (64, 42), (59, 58),
> (44, 43), (39, 38), (31, 30), (25, 24), (7,), (99,), (98,), (97,),
> (96,), (95,), (94,), (93,), (92,), (90,), (89,), (88,), (87,), (86,),
> (85,), (84,), (83,), (82,), (81,), (80,), (79,), (77,), (75,), (74,),
> (73,), (72,), (71,), (70,), (69,), (68,), (67,), (66,), (65,), (63,),
> (62,), (61,), (60,), (57,), (56,), (55,), (54,), (52,), (51,), (50,),
> (49,), (48,), (47,), (45,), (41,), (40,), (37,), (36,), (35,), (34,),
> (33,), (32,), (29,), (28,), (27,), (26,), (23,), (22,), (21,), (20,),
> (19,), (18,), (16,), (14,), (12,), (11,), (10,), (9,), (6,), (5,),
> (3,), (2,), (1,), (0,))
>
> *If I try to use:*
>
> >>> np.savetxt("DB_Clusters", clusters, delimiter=" ")
> Traceback (most recent call last):
>   File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1447, in savetxt
>     v = format % tuple(row) + newline
> TypeError: must be real number, not tuple
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<__array_function__ internals>", line 6, in savetxt
>   File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1451, in savetxt
>     % (str(X.dtype), format))
> TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e')
>
> *Then I thought I hadn't imported Numpy, but the problem was not resolved.*
>
> >>> import numpy as np
> >>> np.savetxt("DB_Clu.txt", clusters, delimiter=" ")
> Traceback (most recent call last):
>   File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1447, in savetxt
>     v = format % tuple(row) + newline
> TypeError: must be real number, not tuple
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<__array_function__ internals>", line 6, in savetxt
>   File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1451, in savetxt
>     % (str(X.dtype), format))
> TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e')
>
> *The problem is that I can't use this function to save clusters? How can I
> save the results with the clusters?*
>
> Sorry for the trouble,
>
> Best regards,
> Francesco
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
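
A note on the np.savetxt error quoted above: np.savetxt expects a regular numeric array, but the clusters are tuples of different lengths, so NumPy ends up with an array of dtype object (as the traceback shows) and the default '%.18e' format cannot be applied to a tuple. Below is a minimal sketch of a csv round trip that copes with rows of unequal length. The helper names save_clusters and load_clusters, the file name, and the shortened sample data are only illustrative assumptions; they are not part of RDKit.

```python
import csv
import os
import tempfile


def save_clusters(clusters, path):
    """Write one cluster per row; rows may have different lengths."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(clusters)


def load_clusters(path):
    """Read the rows back as a tuple of tuples of ints."""
    with open(path, newline="") as handle:
        return tuple(
            tuple(int(idx) for idx in row)
            for row in csv.reader(handle)
            if row  # skip any empty lines
        )


# Sample data: a few of the clusters printed in the quoted message.
clusters = ((17, 15, 46), (13, 4, 8), (91, 53), (7,))

# Hypothetical file name, placed in a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "clusters.csv")
save_clusters(clusters, path)
restored = load_clusters(path)
print(restored == clusters)
```

The values come back from the csv reader as strings, which is why load_clusters converts them to int before rebuilding the tuples.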