Dear RDKitters,

To those of you who are interested in Matched Pairs analysis:

mmpdb, a library for doing large-scale MMP analysis, has been updated to
version 2.2. This update introduces three new options that help reduce
the size of the generated database.

These are:

--min-heavies-per-const-frag: During fragmentation, double and triple cuts
where one of the constant parts has fewer heavy atoms than this number will
be discarded. This removes a lot of pseudo-multiple cuts where one of the
constant parts is very small, such as a lone halogen or a methoxy group.

--smallest-transformation-only: If set during indexing, only the
smallest transformation per pair will be indexed. For example, if a
transformation is p-F-phenyl >> p-Cl-phenyl, only the F>>Cl transformation
will be indexed. Note that the phenyl ring is still encoded as part
of the environment fingerprint.

--max-radius: The maximum radius up to which environments are enumerated
can now be set on the command line during indexing.
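
To illustrate the --min-heavies-per-const-frag rule described above, here
is a minimal Python sketch. It is not mmpdb's actual code: the function
name keep_fragmentation and the plain list of heavy-atom counts are
hypothetical, and it assumes that single cuts are left untouched, since
only double and triple cuts are mentioned above.

```python
def keep_fragmentation(constant_heavy_counts, min_heavies_per_const_frag):
    """Decide whether a fragmentation survives the heavy-atom filter.

    constant_heavy_counts: heavy-atom count of each constant part
        (one entry for a single cut, two for a double cut, three for
        a triple cut). This input format is an assumption for the sketch.
    min_heavies_per_const_frag: the threshold given on the command line.
    """
    num_cuts = len(constant_heavy_counts)
    if num_cuts < 2:
        # Assumption: the filter only applies to double and triple cuts.
        return True
    # Discard the cut if any constant part is smaller than the threshold.
    return all(n >= min_heavies_per_const_frag for n in constant_heavy_counts)


# A double cut where one constant part is a lone halogen (1 heavy atom).
print(keep_fragmentation([1, 12], 3))
# A double cut where one constant part is a methoxy group (2 heavy atoms).
print(keep_fragmentation([2, 15], 3))
# A double cut with two reasonably sized constant parts.
print(keep_fragmentation([6, 9], 3))
```

With a threshold of 3, the first two cuts are dropped and the third one
is kept.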

In my tests, using --min-heavies-per-const-frag 3 together with
--smallest-transformation-only reduces the size of the database by ~70%. A
smaller --max-radius can reduce the database size considerably further.
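
As a rough sketch of how these options might be combined, the Python
snippet below only builds and prints the two command lines; it does not
run them. The file names (example.smi, example.fragments, example.mmpdb),
the helper name build_mmpdb_commands and the --max-radius value of 3 are
placeholders of mine. The "mmpdb fragment" and "mmpdb index" subcommands
and their -o output option are taken from the mmpdb README; please check
"mmpdb fragment --help" and "mmpdb index --help" for the exact usage in
your installed version.

```python
import shlex


def build_mmpdb_commands(smiles_file, min_heavies=3, max_radius=None):
    """Build (fragment_cmd, index_cmd) argument lists for mmpdb.

    The file names below are hypothetical placeholders.
    """
    fragments_file = "example.fragments"
    database_file = "example.mmpdb"

    # Fragmentation step: drop multi-cuts with very small constant parts.
    fragment_cmd = [
        "mmpdb", "fragment", smiles_file,
        "--min-heavies-per-const-frag", str(min_heavies),
        "-o", fragments_file,
    ]

    # Indexing step: keep only the smallest transformation per pair.
    index_cmd = [
        "mmpdb", "index", fragments_file,
        "--smallest-transformation-only",
        "-o", database_file,
    ]
    if max_radius is not None:
        # Optionally limit the environment radius during indexing.
        index_cmd += ["--max-radius", str(max_radius)]

    return fragment_cmd, index_cmd


for cmd in build_mmpdb_commands("example.smi", min_heavies=3, max_radius=3):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or each list can be passed
to subprocess.run if you prefer to drive mmpdb from a script.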

The latest version of mmpdb is now available at
https://github.com/rdkit/mmpdb. If you are interested, please try it
out and let me know whether you find it useful or run into any problems.

Best regards,
Christian

*Dr. Christian Kramer*

Computer-Aided Drug Design (CADD)


F. Hoffmann-La Roche Ltd

Pharma Research and Early Development
Bldg. 092/8.56 C

CH-4070 Basel


Phone +41 61 682 2471

mailto: christian.kra...@roche.com
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
