[Rdkit-discuss] ANN: chemfp 4.1

2023-05-17 Thread Andrew Dalke
Hi everyone,

 I've just released chemfp 4.1. To install the pre-compiled package for 
Linux-based OSes do:

  python -m pip install chemfp -i https://chemfp.com/packages/

For a detailed description of what's new, see:

  https://chemfp.readthedocs.io/en/latest/whats_new_in_41.html

As a summary, the new features in this release include:

- Supports RDKit 2023.03.1 and Python 3.8 through 3.11

- Interprets input SMILES as CXSMILES by default, with an option to turn that 
off

- Can save/load similarity search results to a NumPy file in a form compatible 
with SciPy compressed sparse matrices

- Implements Butina clustering, with several variations.

While building the similarity matrix may take an hour, the result can be saved 
to an npz file that the Butina implementation can use as input. This can be 
useful when tuning the Butina parameters because the NxN matrix can be 
constructed once, at the lowest reasonable threshold, while the Butina 
clustering can use a higher threshold. It takes only a few seconds to cluster 
ChEMBL at a threshold of 0.6.

- Sphere exclusion ("spherex") has been parallelized, with new options for 
specifying directed sphere exclusion ranking and a new output format compatible 
with the Butina output

- The new "chemfp csv2fps" tool for generating fingerprints from CSV files 
containing identifiers and molecules.

- The new "chemfp translate" tool for structure file format conversion.
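
To illustrate the build-once, cluster-many-times workflow described in the Butina item above, here is a minimal pure-Python sketch. It does not use chemfp's API or its npz format. The `butina` function, the `pairs` list, and the scores are hypothetical stand-ins, and the algorithm is the plain Taylor-Butina scheme (sort by neighbor count, then assign neighbors to the first available center), not necessarily any of the variations chemfp implements.

```python
def butina(n, pairs, threshold):
    """Cluster items 0..n-1 using only pairs whose score is >= threshold.

    Returns a list of (center, members) tuples, in the order the
    centers were picked.
    """
    # Build neighbor sets at the requested threshold.
    neighbors = [set() for _ in range(n)]
    for i, j, score in pairs:
        if score >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Visit items from most to fewest neighbors; ties go to the lowest index.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = set()
    clusters = []
    for center in order:
        if center in assigned:
            continue
        # The center claims every neighbor not already in a cluster.
        members = sorted(neighbors[center] - assigned)
        assigned.add(center)
        assigned.update(members)
        clusters.append((center, members))
    return clusters


# Hypothetical similarity pairs, computed once at the lowest threshold (0.4).
pairs = [
    (0, 1, 0.90), (0, 2, 0.70), (1, 2, 0.65),
    (2, 3, 0.50), (3, 4, 0.80), (4, 5, 0.45),
]

# Reuse the same pairs at several clustering thresholds.
for threshold in (0.4, 0.6):
    print(threshold, butina(6, pairs, threshold))
```

The expensive step (computing `pairs`) happens once. Each call to `butina` only filters and walks the stored pairs, which is why re-clustering at a higher threshold is cheap.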

These are available for no cost under the Chemfp Base License Agreement at 
https://chemfp.com/BaseLicense.txt .

For other licensing options, including a no-cost license key for academic use, 
see https://chemfp.com/license/ .

Best regards,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Can a bond index be associated with order in explicit SMILES?

2023-05-17 Thread Andrew Dalke
On May 17, 2023, at 02:31, Vincent Scalfani wrote:
> I thought that this might also be the case for bond indices, but that does 
> not appear to be correct (see example below). Is it possible to get a bond 
> index in the order of the SMILES? 

This may help you understand why that's a difficult question.

What does the bond index mean in something like

 C12.OC23.C3.C1

? Does the bond for closure 1 come first in the bond list, because that's where 
it starts, or is it last, because that's where it ends? It looks like you think 
it should be the closure position.

Here's your SMILES labelled by atom index:

                                    1 1 1  1  1 1  1   1    1 1
  atoms  0 1 2  3 4  5   6   7 8 9  0 1 2  3  4 5  6   7    8 9
         | | |  | |  |   |   | | |  | | |  |  | |  |   |    | |
  SMILES C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1

I used the program at the end of this email to print the information in bond 
list order:

In bondlist order
 i Bnd# a1 ~ a2   frag
 0   0   0 -  1   C-C
 1   1   1 -  2   C-c
 2   2   2 :  3   c:c
 3   3   3 :  4   c:c
 4   4   4 :  5  c:[nH]
 5   5   5 :  6  [nH]:c
 6   6   6 -  7   c-C
 7   7   7 -  8   C-C
 8   8   8 -  9   C-C
 9   9   9 - 10   C-C
10  10  10 - 11   C-C
11  11  11 - 12   C-C
12  12  12 - 13   C-c
13  13  13 : 14   c:c
14  14  14 : 15   c:c
15  15  15 : 16  c:[nH]
16  16  16 : 17  [nH]:c
17  17  12 - 18   C-C
18  18  18 - 19   C-C
19  19   6 :  2   c:c
20  20  19 -  9   C-C
21  21  17 : 13   c:c


If you step through them you'll see that the closure bonds (2-6, 9-19, and 
13-17) are added to the bond list at the end, after the bonds which make up 
the spanning tree.

It appears that closure bonds have their begin and end atom indices with the 
larger one first, which makes it possible to tell that a given bond is a 
closure bond.
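
As a minimal sketch of that check, the snippet below flags closure bonds by testing begin > end. It assumes the pattern seen in this one table holds generally, which is an observation here and not documented RDKit behavior. To stay self-contained it works on (begin, end) pairs copied from the table instead of calling RDKit. The `bonds` list and `closure_bond_indices` name are only for illustration.

```python
# (begin, end) atom indices in bond-list order, copied from the table above.
bonds = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8),
    (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 14), (14, 15),
    (15, 16), (16, 17), (12, 18), (18, 19), (6, 2), (19, 9), (17, 13),
]


def closure_bond_indices(bonds):
    """Return bond-list positions where the begin atom index exceeds the end.

    Assumption: spanning-tree bonds always run from the earlier atom to the
    later one, so begin > end marks a ring-closure bond.
    """
    return [i for i, (begin, end) in enumerate(bonds) if begin > end]


closures = closure_bond_indices(bonds)
print(closures)
print([bonds[i] for i in closures])
```

With a real molecule the same test would be `b.GetBeginAtomIdx() > b.GetEndAtomIdx()` for each bond `b`.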

In principle then it should be possible to reorder the bonds to get the order 
you want.

This proved trickier than I could manage in the time I have.

Perhaps the better question is, why do you need the bond indices in a specific 
order?

Cheers,


Andrew
da...@dalkescientific.com


from rdkit import Chem

bond_symbols = {
   Chem.BondType.SINGLE: "-",
   Chem.BondType.DOUBLE: "=",
   Chem.BondType.TRIPLE: "#",
   Chem.BondType.AROMATIC: ":",
}

smi = "CCc1cc[nH]c1CCC1CCC(CC1)c1cc[nH]c1"
#smi = "[C@@](F)(Cl)(Br)O"
mol1 = Chem.MolFromSmiles(smi)
smi_explicit = Chem.MolToSmiles(mol1, allBondsExplicit=True)
mol2 = Chem.MolFromSmiles(smi_explicit)

def show(bonds):
   # One row per bond: list position, bond index, begin/end atoms, fragment
   print(" i Bnd# a1 ~ a2   frag")
   for i, b in enumerate(bonds):
      a1, a2 = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
      symbol = bond_symbols[b.GetBondType()]
      # SMILES for just these two atoms, rooted at the bond's begin atom
      s = Chem.MolFragmentToSmiles(mol2, atomsToUse=[a1, a2], rootedAtAtom=a1,
                                   allBondsExplicit=True)
      print(f"{i:2d}  {b.GetIdx():2d}  {a1:2d} {symbol} {a2:2d} {s.center(8)}")

print(smi_explicit)
print("In bondlist order")
show(mol2.GetBonds())


