Dear Takayuki,
I tested it with a fairly recent svn checkout as well as with 2012.03.1, which is
the version I also used for this example (Stats.py has not changed since 2009).
I ran it on a public set of serine protease inhibitors
(http://cheminformatics.org/datasets/bohm/bohm-test.3d.sdf, chosen because it is
rather small and publicly available) in a newly installed Python notebook. The
res[1] I got is:
[[ 0.00000000e+00  0.00000000e+00  7.67554438e-02 ...,  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00 -1.60075292e-02 ...,  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00 -8.20590478e-02 ...,  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 ...,
 [ 0.00000000e+00  0.00000000e+00 -2.85344502e-16 ...,  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00 -2.12793777e-15 ...,  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00 -2.12793777e-15 ...,  0.00000000e+00  0.00000000e+00  0.00000000e+00]]
The error you posted looks as if the script was stopped manually (Ctrl-C, for
instance). Or did the program terminate with an error without any user
interaction? The PCA can take a very long time for feature vectors as
high-dimensional as fingerprints (several minutes even for my very small example
dataset). How many compounds are you using? If it just seems to run forever,
could you try it with a tiny subset (e.g. 10 compounds) just to see whether it
terminates?
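Why it is slow: for d fingerprint bits the correlation matrix has d x d entries,
and filling each one takes a pass over all n compounds, so the work grows roughly
with n * d^2. GetMorganFingerprintAsBitVect produces 2048 bits by default, which
means about 2.1 million distinct matrix entries. The pure-Python sketch below only
illustrates that arithmetic; it is not the code in Stats.py, and the helper names
are hypothetical. It also sketches one possible way to shrink d before the PCA,
offered as a suggestion rather than something tested in this thread: bits that
have the same value in every compound carry no variance and can be dropped.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of rdkit.ML.Data.Stats.


def correlation_entries(d):
    """Distinct entries of a symmetric d x d correlation matrix."""
    return d * (d + 1) // 2


def estimated_passes(n_compounds, d):
    """Rough count of per-compound multiply-adds needed to fill that matrix."""
    return n_compounds * correlation_entries(d)


def drop_constant_columns(rows):
    """Remove bit positions that have the same value in every row.

    Returns the reduced rows and the indices of the columns that were kept.
    """
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, keep


if __name__ == "__main__":
    print(correlation_entries(2048))   # 2098176
    print(estimated_passes(10, 2048))  # 20981760
    toy = [[0, 1, 0, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
    print(drop_constant_columns(toy))  # ([[1, 0], [1, 1], [0, 1]], [1, 2])
```

With the script from the original post, mat[:10] gives the 10-compound subset
mentioned above, and numpy.array(drop_constant_columns(mat)[0]) would be a smaller
matrix to hand to Stats.PrincipalComponents.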
Best,
Niko
On Jan 18, 2013, at 11:48 PM, Taka Seri <serit...@gmail.com> wrote:
> Dear Greg and Niko.
>
> Thank you for your quick reply.
> To Greg: thanks for your recommendation.
> I tried PCA with matplotlib and it worked without any problem. Thanks.
> However, the result from matplotlib looked different from the one I got in R.
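Two common reasons why PCA results differ between tools, offered only as a guess
since the R call was not posted: (1) a principal axis is defined only up to its
sign, so one program can return v where another returns -v; (2) PCA on the
correlation matrix (standardized data, which the FormCorrelationMatrix call in
Stats.py suggests) differs from PCA on the covariance matrix (only centered data,
the default of R's prcomp). The pure-Python sketch below uses made-up two-feature
data and hypothetical helper names to show both effects.

```python
import math

# Made-up data (illustration only): two features on very different scales.
ROWS = [(1.0, 0.10), (2.0, 0.25), (3.0, 0.20), (4.0, 0.45), (5.0, 0.40)]


def covariance_matrix(rows):
    """Sample covariance matrix: data centered, not scaled."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def correlation_matrix(cov):
    """Correlation matrix, i.e. the covariance of standardized data."""
    d = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]


def leading_axis_2x2(m):
    """Largest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0:
        v = (b, lam - a)  # solves (a - lam) * x + b * y = 0
    elif a >= c:
        v = (1.0, 0.0)
    else:
        v = (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)


def is_eigenvector(m, v, lam, tol=1e-9):
    """Check that m times v equals lam times v for a 2x2 matrix."""
    mv = (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])
    return all(abs(mv[k] - lam * v[k]) < tol for k in range(2))


cov = covariance_matrix(ROWS)
corr = correlation_matrix(cov)
lam_cov, axis_cov = leading_axis_2x2(cov)     # close to the first feature's axis
lam_corr, axis_corr = leading_axis_2x2(corr)  # the diagonal, about (0.707, 0.707)
flipped = (-axis_cov[0], -axis_cov[1])
print(axis_cov, axis_corr)
print(is_eigenvector(cov, flipped, lam_cov))  # True: the sign is arbitrary
```

If the R run used prcomp(x) without scale. = TRUE, comparing it with
prcomp(x, scale. = TRUE) would show whether standardization explains the
difference.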
>
> To Niko:
> I am using RDKit version RDKit_2012_06_1.
> When I ran the PCA with that code, it never returned.
>
> After a KeyboardInterrupt, the following message was printed:
>
> Traceback (most recent call last):
>   File "mol_pca.py", line 33, in <module>
>     res=Stats.PrincipalComponents(matrix)
>   File "C:\RDKit_2012_06_1\rdkit\ML\Data\Stats.py", line 82, in PrincipalComponents
>     covMat = FormCorrelationMatrix(mat)
>   File "C:\RDKit_2012_06_1\rdkit\ML\Data\Stats.py", line 66, in FormCorrelationMatrix
>     sumY = sum(y)
>
> So, which version of RDKit are you using?
> And if you don't mind, could you show me some results?
>
> Thanks.
> Takayuki
>
> 2013/1/18 Nikolas Fechner <m...@fechner.cc>
> Hi Takayuki,
> I was able to run your code snippet without any errors (with different
> example molecules, of course). Could you explain in more detail what is
> not working for you? Which version of RDKit are you using (check with
> from rdkit import rdBase; print(rdBase.rdkitVersion))?
>
> Niko
>
> On Jan 17, 2013, at 11:08 AM, Taka Seri <serit...@gmail.com> wrote:
>
>> Dear All.
>>
>> I want to do PCA with molecular fingerprints,
>> so I wrote the following code,
>> but it does not work.
>> Does anyone have a suggestion?
>> Thanks.
>>
>> Takayuki
>>
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> from rdkit.ML.Data import Stats
>> import numpy
>> import sys
>>
>> # Read the molecules, skipping records RDKit could not parse (returned as None)
>> mols = [mol for mol in Chem.SDMolSupplier(sys.argv[1]) if mol is not None]
>> # Morgan fingerprints with radius 2 (2048 bits by default)
>> fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]
>>
>> # Turn each fingerprint into a row of 0/1 integers
>> mat = []
>> for fp in fps:
>>     bits = fp.ToBitString()
>>     bitsvec = [int(bit) for bit in bits]
>>     mat.append(bitsvec)
>>
>> mat = numpy.array(mat)
>> # PCA on the fingerprint matrix; this step can be slow for 2048-bit rows
>> res = Stats.PrincipalComponents(mat)
>> print(res[1])
>>
>> ------------------------------------------------------------------------------
>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> MVPs and experts. ON SALE this month only -- learn more at:
>> http://p.sf.net/sfu/learnmore_122712_______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss