Thanks for the explanation, Greg.

Dave

On Thu, Sep 25, 2014 at 10:03 AM, Greg Landrum <[email protected]>
wrote:

> Hi Dave,
>
> On Thu, Sep 25, 2014 at 9:53 AM, Dave Wood <[email protected]> wrote:
>
>> Hi All,
>>
>> I have been generating explicit bit strings of RDKit fingerprints and was
>> surprised by this result. Is this the expected behaviour? From the
>> documentation it looks like the default length should be 2048.
>>
>> Interestingly if I call the RDKFingerprint method directly the lengths
>> are all the expected 2048.
>>
>
> I wouldn't really recommend using the functionality in the FingerprintMols
> module. It probably should be deprecated. It is much safer (and not much
> more difficult) to call the fingerprint functions directly.
>
> Here's what's going on: The function
> FingerprintMols.FingerprintsFromMols() uses
> FingerprintMols.FingerprintMol() internally. This sets fingerprint options
> using the class FingerprintMols.FingerprinterDetails, which includes a
> default value for the tgtDensity argument of 0.3. Setting tgtDensity with
> the RDKit fingerprint can result in the fingerprinter folding the
> fingerprint to achieve a particular target density of on bits:
>
> In [6]: m = Chem.MolFromSmiles("CCCCNC(=O)[C@@H]1CCCN(C(=O)CCC(C)C)C1")
>
> In [7]: Chem.RDKFingerprint(m).GetNumBits()
> Out[7]: 2048
>
> In [8]: Chem.RDKFingerprint(m,tgtDensity=0.3).GetNumBits()
> Out[8]: 512
>
> This explains the differences you are seeing.
>
> -greg
>
>
>
>> In [1]: smi1 = "CCCCNC(=O)[C@@H]1CCCN(C(=O)CCC(C)C)C1"
>>
>> In [2]: smi2 = "COC(=O)c1cccc(CN2C(=O)N[C@@](C)(c3ccc4c(c3)OCCO4)C2=O)c1"
>>
>> In [3]: smi3 = "CN(C)[C@@H](Cc1ccccc1)C(=O)NNC(=O)c1ccccc1O"
>>
>> In [4]: from rdkit import Chem
>>
>> In [5]: mols = [("mol1", Chem.MolFromSmiles(smi1)), ("mol2",
>> Chem.MolFromSmiles(smi2)), ("mol3", Chem.MolFromSmiles(smi3))]
>>
>> In [6]: from rdkit.Chem.Fingerprints import FingerprintMols
>>
>>
>> In [7]: print [ len(fp[1].ToBitString()) for fp in
>> FingerprintMols.FingerprintsFromMols(mols) ]
>> [512, 2048, 1024]
>>
>> In [8]: from rdkit.Chem.rdmolops import RDKFingerprint
>>
>> In [9]: print [ len(RDKFingerprint(mol[1]).ToBitString()) for mol in mols
>> ]
>> [2048, 2048, 2048]
>>
>> I can use the RDKFingerprint method as a solution, but I thought it was
>> worth mentioning.
>>
>> Dave
>>
>>
>> ------------------------------------------------------------------------------
>> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Rdkit-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
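
The folding behaviour Greg describes can be illustrated without RDKit. What follows is a minimal pure-Python sketch of the idea, not RDKit's actual implementation: the helper names fold_once and fold_to_density, the min_size value of 64, and the toy bit lists are all assumptions made for illustration. A fingerprint is halved (upper half ORed onto lower half) for as long as the fraction of on bits stays below the target density, so sparser fingerprints end up shorter.

```python
def fold_once(bits):
    """OR the upper half of a bit list onto the lower half, halving its length."""
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


def fold_to_density(bits, tgt_density=0.3, min_size=64):
    """Fold until the on-bit fraction reaches tgt_density, or until another
    fold would make the list shorter than min_size (hypothetical default)."""
    while sum(bits) / len(bits) < tgt_density and len(bits) >= 2 * min_size:
        bits = fold_once(bits)
    return bits


# Toy 2048-bit "fingerprints" with 200, 400 and 700 on bits (made-up data,
# not real molecules).
toy_fps = [[1] * n_on + [0] * (2048 - n_on) for n_on in (200, 400, 700)]

# Sparser inputs are folded more often, so the final lengths differ.
print([len(fold_to_density(fp)) for fp in toy_fps])

# A target density of 0.0 disables folding, so the length stays at 2048.
print(len(fold_to_density(toy_fps[0], tgt_density=0.0)))
```

With a target density of 0.3, the three toy inputs come out at different lengths, which is the same kind of variation seen in the FingerprintsFromMols() output above.
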