Hi Francesco,

np.savetxt() expects a 1D or 2D array of numbers, while your clusters are a tuple of tuples of different lengths, which NumPy can only hold as an array of objects (note the 'object' dtype in your traceback); hence the error.

You don't actually need NumPy to save the clusters in a human-readable text format. You could store them as JSON:

import json

with open("c:/temp/clusters.json", "w") as hnd:
    json.dump(clusters, hnd)

And then restore it as a tuple of tuples, as it originally was:

clusters = None
with open("c:/temp/clusters.json", "r") as hnd:
    clusters = tuple(map(tuple, json.load(hnd)))
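In case the tuple(map(tuple, ...)) step looks odd: JSON has no tuple type, so json.load() hands back a list of lists, and the conversion puts the original shape back. A minimal sketch of the round trip, assuming a small made-up clusters value and an in-memory buffer standing in for the file:

```python
import io
import json

# Small made-up example with clusters of different sizes
clusters = ((17, 15, 46), (13, 4, 8), (91, 53), (7,))

# io.StringIO stands in for the file on disk
buf = io.StringIO()
json.dump(clusters, buf)

buf.seek(0)
loaded = json.load(buf)               # JSON arrays come back as Python lists
restored = tuple(map(tuple, loaded))  # convert back to a tuple of tuples

print(loaded)                # [[17, 15, 46], [13, 4, 8], [91, 53], [7]]
print(restored == clusters)  # True
```

If a list of lists is good enough for what you do next, you can skip the conversion.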

You might also store the array in its string representation...

from ast import literal_eval

with open("c:/temp/clusters.txt", "w") as hnd:
    hnd.write(str(clusters) + "\n")

...and then restore it using ast.literal_eval():

clusters = None
with open("c:/temp/clusters.txt", "r") as hnd:
    clusters = literal_eval(hnd.read())
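If what you were really after with np.savetxt() is a plain delimited text file, another option is to write one cluster per line yourself. This is only a sketch: save_clusters() and load_clusters() are hypothetical helper names of mine, not RDKit or NumPy functions, and I use a temporary directory and a small made-up clusters value here.

```python
import os
import tempfile

def save_clusters(clusters, path, delimiter="\t"):
    """Write one cluster per line, member indices separated by delimiter."""
    with open(path, "w") as hnd:
        for cluster in clusters:
            hnd.write(delimiter.join(str(idx) for idx in cluster) + "\n")

def load_clusters(path, delimiter="\t"):
    """Read the file back into a tuple of tuples of ints (blank lines skipped)."""
    with open(path, "r") as hnd:
        return tuple(
            tuple(int(tok) for tok in line.rstrip("\n").split(delimiter))
            for line in hnd
            if line.strip()
        )

# Small made-up example with clusters of different sizes
clusters = ((17, 15, 46), (13, 4, 8), (91, 53), (7,))

path = os.path.join(tempfile.mkdtemp(), "clusters.txt")
save_clusters(clusters, path)
restored = load_clusters(path)

print(restored == clusters)  # True
```

The resulting file opens fine in a text editor or a spreadsheet, at the price of a few more lines of code than the JSON version.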

HTH, cheers,
p.

On 23/03/2020 13:29, Francesco Coppola wrote:
Hello everyone,

I have a small problem saving my results. I clustered a database of molecules using their fingerprints. It works and I can see the clusters, but *how can I save them*?

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> def ClusterFps(fps, cutoff=0.2):
...     from rdkit import DataStructs
...     from rdkit.ML.Cluster import Butina
...     dists=[]
...     nfps=len(fps)
...     for i in range(1, nfps):
...         sims=DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
...         dists.extend([1-x for x in sims])
...     cs=Butina.ClusterData(dists, nfps, cutoff, isDistData=True)
...     return cs
...
>>> ms = [x for x in Chem.SDMolSupplier(r'C:\Users\HP\100.sdf',removeHs=False)]
>>> len(ms)
100
>>> fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,1024) for x in ms]
>>> clusters=ClusterFps(fps,cutoff=0.4)
>>> print(clusters[1])
(13, 4, 8)
>>> print(clusters)
((17, 15, 46), (13, 4, 8), (91, 53), (78, 76), (64, 42), (59, 58), (44, 43), (39, 38), (31, 30), (25, 24), (7,), (99,), (98,), (97,), (96,), (95,), (94,), (93,), (92,), (90,), (89,), (88,), (87,), (86,), (85,), (84,), (83,), (82,), (81,), (80,), (79,), (77,), (75,), (74,), (73,), (72,), (71,), (70,), (69,), (68,), (67,), (66,), (65,), (63,), (62,), (61,), (60,), (57,), (56,), (55,), (54,), (52,), (51,), (50,), (49,), (48,), (47,), (45,), (41,), (40,), (37,), (36,), (35,), (34,), (33,), (32,), (29,), (28,), (27,), (26,), (23,), (22,), (21,), (20,), (19,), (18,), (16,), (14,), (12,), (11,), (10,), (9,), (6,), (5,), (3,), (2,), (1,), (0,))

*If I try to use:*

>>> np.savetxt("DB_Clusters", clusters, delimiter="     ")
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1447, in savetxt
    v = format % tuple(row) + newline
TypeError: must be real number, not tuple

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in savetxt
  File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1451, in savetxt
    % (str(X.dtype), format))
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e')

*Then I thought I hadn't imported NumPy, but importing it did not solve the problem.*
>>> import numpy as np
>>> np.savetxt("DB_Clu.txt", clusters, delimiter="      ")
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1447, in savetxt
    v = format % tuple(row) + newline
TypeError: must be real number, not tuple

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in savetxt
  File "C:\Anaconda3\envs\py37_rdkit\lib\site-packages\numpy\lib\npyio.py", line 1451, in savetxt
    % (str(X.dtype), format))
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e')

*Is the problem that I can't use this function to save clusters?
How can I save the clustering results?*

Sorry for the trouble,

Best regards,
Francesco


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss