On Jan 11, 2018, at 12:04, Wandré <wandrevel...@gmail.com> wrote:
> Thanks for the link. It is very interesting. I will read very carefully.
> So, as input on ChemFP, I have to put a file with all molecules in 1 SDF?

Chemfp works with fingerprint files; in your case, that means chemfp's 
text-based "FPS" format. You'll need to use 'rdkit2fps' to convert your InChI 
structures into fingerprints.

Here's an example file, where I follow the Open Babel convention of allowing an 
identifier after the InChI string:

% cat examples.inchi
InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H phenol
InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H benzene
InChI=1S/CH4/h1H4/i1D4 deuterated methane

You could also use an SDF or SMILES file.
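
If you want to check such a file before handing it to rdkit2fps, here is a 
minimal Python sketch of how each line splits into an InChI string and an 
optional identifier. It assumes the InChI string itself contains no whitespace, 
so everything after the first run of whitespace is the identifier. The function 
name is made up for illustration and is not part of chemfp or RDKit.

```python
def read_inchi_records(text):
    """Split 'InChI<whitespace>identifier' lines into (inchi, identifier) pairs.

    Hypothetical helper for illustration only. The identifier is None when
    the line contains nothing after the InChI string.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once, on the first run of whitespace, so identifiers such as
        # "deuterated methane" keep their internal space.
        parts = line.split(None, 1)
        inchi = parts[0]
        identifier = parts[1] if len(parts) > 1 else None
        records.append((inchi, identifier))
    return records


example = """\
InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H phenol
InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H benzene
InChI=1S/CH4/h1H4/i1D4 deuterated methane
"""

records = read_inchi_records(example)
for inchi, identifier in records:
    print(identifier, inchi)
```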

Next, I generate AtomPair fingerprints. The output goes to "examples.fps", 
which I'll then display.

% rdkit2fps --pairs examples.inchi -o examples.fps
% cat examples.fps
#FPS1
#num_bits=2048
#type=RDKit-AtomPair/2 fpSize=2048 minLength=1 maxLength=30
#software=RDKit/2016.09.3 chemfp/3.1
#source=examples.inchi
#date=2018-01-11T14:38:57
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001000000000000000000000000000000000000310000000003000000000000000000000000000000000000000000007003000000000000000000000300000000000000000000000000000000000000073000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        phenol
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000300000000000000000000000000000000000000000000000000000007000000000000000000000000000000000000000000000000000000000000000070000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        benzene
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000000000000000000000000000000000000000000000070000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        deuterated methane
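
In the file itself, each record is a single line: the hex-encoded fingerprint, 
a tab, then the identifier. The email above shows the identifier wrapped onto 
its own line. As a rough illustration of the layout, here is a small Python 
sketch that reads FPS text without chemfp and computes a Tanimoto score. The 
function names and the 16-bit toy fingerprints are invented for this example; 
for real work, use chemfp's own readers.

```python
def parse_fps(text):
    """Return (metadata, records) from FPS-formatted text.

    metadata maps header keys to string values; records is a list of
    (identifier, fingerprint_bytes). Illustrative helper, not chemfp's API.
    """
    metadata, records = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header lines look like "#key=value"; "#FPS1" has no "=".
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key] = value
            continue
        # Data lines: hex fingerprint, tab, identifier.
        fields = line.split("\t")
        records.append((fields[1], bytes.fromhex(fields[0])))
    return metadata, records


def popcount(fp):
    """Number of bits set in a fingerprint given as bytes."""
    return bin(int.from_bytes(fp, "big")).count("1")


def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    both = popcount(bytes(a & b for a, b in zip(fp1, fp2)))
    either = popcount(bytes(a | b for a, b in zip(fp1, fp2)))
    return both / either if either else 0.0


# Toy 16-bit fingerprints, made up for this example.
toy_fps = "#FPS1\n#num_bits=16\n0300\tfirst\n0700\tsecond\n"

metadata, fp_records = parse_fps(toy_fps)
score = tanimoto(fp_records[0][1], fp_records[1][1])
print(metadata["num_bits"], [name for name, _ in fp_records], score)
```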


Finally, I run the clustering program, with a low similarity threshold 
(-t 0.3) so that it produces something other than the trivial result of three 
singletons.

% python taylor_butina.py -t 0.3 examples.fps
0 true singletons
=>

1 false singletons
=> deuterated methane

1 clusters
phenol has 1 other members
=> benzene
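
To make the terms in that output concrete, here is a minimal pure-Python sketch 
of Taylor-Butina clustering. It is not the code in taylor_butina.py and does 
not use chemfp; the function names, the integer toy fingerprints, and the 
tie-breaking by input order are all assumptions made for illustration. A "true 
singleton" has no neighbors at or above the threshold. A "false singleton" has 
neighbors, but all of them were already claimed by other clusters.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    either = bin(a | b).count("1")
    return bin(a & b).count("1") / either if either else 0.0


def taylor_butina(fps, threshold):
    """Cluster {id: int_fingerprint}.

    Returns (true_singletons, false_singletons, clusters), where clusters
    is a list of (centroid_id, [member_ids]). Illustrative sketch only.
    """
    ids = list(fps)
    # Neighbors: every other id whose similarity is at or above the threshold.
    neighbors = {
        i: {j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in ids
    }
    # Candidates with the most neighbors come first; ties keep input order.
    order = sorted(ids, key=lambda i: -len(neighbors[i]))

    assigned = set()
    true_singletons, false_singletons, clusters = [], [], []
    for center in order:
        if center in assigned:
            continue  # already a member of an earlier cluster
        assigned.add(center)
        members = sorted(neighbors[center] - assigned)
        assigned.update(members)
        if members:
            clusters.append((center, members))
        elif neighbors[center]:
            false_singletons.append(center)  # neighbors all taken
        else:
            true_singletons.append(center)   # no neighbors at all
    return true_singletons, false_singletons, clusters


# Toy fingerprints, made up so that A-B, B-C and C-D are neighbors at 0.5.
toy = {
    "A": 0b0000000111,
    "B": 0b0000001111,
    "C": 0b0000011110,
    "D": 0b0000111100,
    "E": 0b1100000000,
}
result = taylor_butina(toy, 0.5)
print(result)
```

With this toy data, B becomes the centroid of a cluster containing A and C. D 
is a false singleton because its only neighbor, C, already belongs to B's 
cluster, and E is a true singleton.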

This output format is rather ad hoc. I need to figure out what format people 
want from a clustering tool, preferably one that other tools can import without 
further conversion.

I'll be glad to hear any suggestions.

Cheers,


                                Andrew
                                da...@dalkescientific.com


