Hi,

Here are the results from the shuffle (10x) test for the 5 million compounds in the eMolecules database. In general the results are good: only 33 canonicalization errors remain, which should be easy to fix.
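For readers unfamiliar with this kind of test, here is a minimal sketch of the idea, assuming the test renumbers the atoms of each molecule at random and checks that the canonical output stays the same. It is not the actual test script and does not use Open Babel; `shuffle_test`, `bond_multiset` and `input_order` are hypothetical names, and the two toy "codes" only stand in for a real canonicalization.

```python
import random


def shuffle_test(atoms, bonds, canonical_code, rounds=10, seed=0):
    """Renumber the atoms `rounds` times and collect the distinct codes.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    canonical_code: caller-supplied function (atoms, bonds) -> hashable code.
    A correct canonicalization yields exactly one distinct code.
    """
    rng = random.Random(seed)
    codes = {canonical_code(atoms, bonds)}
    n = len(atoms)
    for _ in range(rounds):
        perm = list(range(n))
        rng.shuffle(perm)  # perm[old_index] == new_index
        new_atoms = [None] * n
        for old, new in enumerate(perm):
            new_atoms[new] = atoms[old]
        new_bonds = [(perm[i], perm[j]) for i, j in bonds]
        rng.shuffle(new_bonds)  # bond order in the input should not matter either
        codes.add(canonical_code(new_atoms, new_bonds))
    return codes


def bond_multiset(atoms, bonds):
    """Toy order-independent code: the sorted list of bonded element pairs."""
    return tuple(sorted(tuple(sorted((atoms[i], atoms[j]))) for i, j in bonds))


def input_order(atoms, bonds):
    """Toy order-dependent 'code': the atoms in the order they were given."""
    return "".join(atoms)


# Heavy atoms of acetamide, CC(=O)N (bond orders are ignored in this sketch).
ATOMS = ["C", "C", "O", "N"]
BONDS = [(0, 1), (1, 2), (1, 3)]
```

A molecule passes when `shuffle_test` returns a single code, as it does for `bond_multiset`; a function such as `input_order` that depends on the atom numbering returns several and would be reported as a canonicalization error.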
Process stops: 3429680, 3429701, 3429702, 3429717, 3429742, 3429767, 3429887, ...
(these are line numbers in the eMolecules-2010-03-01.smi file, counting from 1)

3429680: [Li+251] 24639246
3429701: [ClH+276] 24639289
3429702: CCCC[n+251]1cccc(C)c1 24639291
...

I continued testing from 3500000. Any ideas on how to handle this?

Segfaults: 1278211, 1278212

S=C1NCCCCCCNC(=S)S[Fe]2SC(=S)NCCCCCCNC(=S)S[Ni]SC(=S)NCCCCCCNC(=S)S[Fe](SC(=S)NCCCCCCNC(=S)S[Ni]S1)SC(=S)NCCCCCCNC(=S)S[Ni]SC(=S)NCCCCCCNC(=S)S2 4315482
S=C1NCCCCCCNC(=S)S[Cr]2SC(=S)NCCCCCCNC(=S)S[Ni]SC(=S)NCCCCCCNC(=S)S[Cr](SC(=S)NCCCCCCNC(=S)S[Ni]S1)SC(=S)NCCCCCCNC(=S)S[Ni]SC(=S)NCCCCCCNC(=S)S2 4315484

These have large rings which, I think, are not found. We should still be able to detect ring membership correctly, though, since this is done using a spanning tree before the SSSR/LSSR analysis. I'll take a look at this.

Canonicalization errors: 33

All errors are the same problem AFAIK. The canonical code does not consider the H atoms that are added when writing out the SMILES. I can add this to the canonical code, but I'll probably copy some code for this from the SMILES format.

Cc1cccc(c1)C(=O)Nc1nnc[nH]1.Cc1cccc(c1)C(=O)Nc1n[nH]cn1 8622926
Cc1cccc(c1)C(=O)Nc1n[nH]cn1.Cc1cccc(c1)C(=O)Nc1nnc[nH]1 8622926

This is not an aromaticity error: the two fragments have identical canonical codes since there is no difference between n and [nH].
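To illustrate why a code that ignores hydrogens cannot order such fragments, here is a toy sketch. It is not Open Babel's canonical labeling algorithm but a simple Morgan-style refinement, and `fragment_code`, `fragment_order` and the simplified triazole fragments (`Nc1nnc[nH]1` and `Nc1n[nH]cn1`, with hand-assigned H counts) are assumptions made for the example.

```python
def fragment_code(atoms, bonds, use_h):
    """Order-independent code for one fragment (toy Morgan-style refinement).

    atoms: list of (element, implicit_h_count); bonds: list of (i, j) pairs.
    use_h: whether the hydrogen count is part of the atom invariant.
    """
    n = len(atoms)
    nbrs = {i: [] for i in range(n)}
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Initial invariant: element and degree, plus the H count if requested.
    inv = [(el, len(nbrs[i]), h if use_h else 0)
           for i, (el, h) in enumerate(atoms)]
    for _ in range(n):
        # Refine: own invariant plus the sorted invariants of the neighbours.
        inv = [(inv[i], tuple(sorted(inv[j] for j in nbrs[i])))
               for i in range(n)]
    return tuple(sorted(inv))


def fragment_order(fragments, use_h):
    """Fragment names sorted by code; the sort is stable, so ties keep input order."""
    ranked = sorted(fragments, key=lambda f: fragment_code(f[1], f[2], use_h))
    return [name for name, _, _ in ranked]


# Ring numbering: N0 (substituent), C1, N2, N3, C4, N5; the ring closes N5-C1.
RING_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
# Nc1nnc[nH]1: the ring H sits on N5.
TAUTOMER_A = [("N", 2), ("C", 0), ("N", 0), ("N", 0), ("C", 1), ("N", 1)]
# Nc1n[nH]cn1: the ring H sits on N3.
TAUTOMER_B = [("N", 2), ("C", 0), ("N", 0), ("N", 1), ("C", 1), ("N", 0)]
FRAGMENTS = [("A", TAUTOMER_A, RING_BONDS), ("B", TAUTOMER_B, RING_BONDS)]
```

With `use_h=False` both fragments get the same code, so `fragment_order` simply returns them in input order and the written output changes when the input is shuffled. With `use_h=True` the codes differ and the order is the same for `FRAGMENTS` and `FRAGMENTS[::-1]`.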
CC1=CC(=O)c2c(C1=O)c(O)ccc2O.CCC=C(C)C.OC.[CH].C 19231703
CC1=CC(=O)c2c(C1=O)c(O)ccc2O.CCC=C(C)C.OC.C.[CH] 19231703
C[CH] 23745856
[CH]C 23745856
C[CH2] 23745858
[CH2]C 23745858
O[O] 23903986
[O]O 23903986
C1CC[CH][CH]CCC1.C1CCCCC[CH][CH]1.C1[CH][CH]CCCCC1.C1CC[CH][CH]CCC1.[Ir]Cl.[Ir]Cl 23904497
[CH]1[CH]CCCCCC1.C1C[CH][CH]CCCC1.[CH]1CCCCCC[CH]1.[CH]1[CH]CCCCCC1.[Ir]Cl.[Ir]Cl 23904497
[CH]1[CH]CCCCCC1.C1C[CH][CH]CCCC1.C1CCC[CH][CH]CC1.[CH]1[CH]CCCCCC1.[Ir]Cl.[Ir]Cl 23904497
[CH]1CCCCCC[CH]1.C1CCC[CH][CH]CC1.[CH]1CCCCCC[CH]1.C1CCCC[CH][CH]C1.[Ir]Cl.[Ir]Cl 23904497
C[C]([CH2])[CH2].[CH2][C]([CH2])C.[Pd]Cl.[Pd]Cl 23906874
C[C]([CH2])[CH2].C[C]([CH2])[CH2].[Pd]Cl.[Pd]Cl 23906874
[CH2][C]([CH2])C.C[C]([CH2])[CH2].[Pd]Cl.[Pd]Cl 23906874
C[C]([CH2])[CH2].[CH2][C]([CH2])C.[Pd]Cl.[Pd]Cl 23906874
[CH]1CC[CH][CH]CC[CH]1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.ClCCl.[Rh] 24631596
[CH]1[CH]CC[CH][CH]CC1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.ClCCl.[Rh] 24631596
C1C[CH][CH]CC[CH][CH]1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.ClCCl.[Rh] 24631596
C1C[CH][CH]CC[CH][CH]1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.c1ccc(cc1)P(c1ccccc1)c1ccccc1.ClCCl.[Rh] 24631596
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)COc2ccccc2)cc(c1OC)OC 26965008
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)COc2ccccc2)cc(c1OC)OC 26965008
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)COc2ccccc2)cc(c1OC)OC 26965008
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)COc2ccccc2)cc(c1OC)OC 26965008
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)COc2ccccc2)cc(c1OC)OC 26965008
[CH]Oc1cc(/C=C/2\C(=O)N=c3n(C2=N)c(cs3)c2ccccc2)cc(c1OC)OC 26965122
[CH]Oc1cc(/C=C/2\C(=O)N=c3n(C2=N)c(cs3)c2ccccc2)cc(c1OC)OC 26965122
[CH]Oc1cc(/C=C/2\C(=O)N=c3n(C2=N)c(cs3)c2ccccc2)cc(c1OC)OC 26965122
[CH]Oc1cc(/C=C/2\C(=O)N=c3n(C2=N)c(cs3)c2ccccc2)cc(c1OC)OC 26965122
[CH]Oc1cc(/C=C/2\C(=O)N=c3n(C2=N)c(cs3)c2ccccc2)cc(c1OC)OC 26965122
[CH]Oc1cc(/C=C/2\C(=O)N=c3n(C2=N)c(cs3)c2ccccc2)cc(c1OC)OC 26965122
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)C(C)C)cc(c1OC)OC 26965176
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)C(C)C)cc(c1OC)OC 26965176
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)C(C)C)cc(c1OC)OC 26965176
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)C(C)C)cc(c1OC)OC 26965176
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)C(C)C)cc(c1OC)OC 26965176
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)c2ccccc2C)cc(c1OC)OC 26965734
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)c2ccccc2C)cc(c1OC)OC 26965734
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)c2ccccc2C)cc(c1OC)OC 26965734
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)c2ccccc2C)cc(c1OC)OC 26965734
[CH]Oc1cc(/C=C\2/C(=O)N=c3n(C2=N)nc(s3)c2ccccc2C)cc(c1OC)OC 26965734
*.FC1(F)Oc2c(O1)cc(c(c2)[N])N 27518948
*.FC1(F)Oc2c(O1)cc(c(c2)N)[N] 27518948
*.FC1(F)Oc2c(O1)cc(c(c2)N)[N] 27518948
CN1CCCC1c1cccnc1.OOOOOO.[CH2]C#CC 27522714
CN1CCCC1c1cccnc1.OOOOOO.CC#C[CH2] 27522714
CCCCC[CH] 29331055
[CH]CCCCC 29331055
CO[C]1[CH]C[C]([CH][CH]1)C.[C-]#[OH2+].[C-]#[OH2+].[C-]#[OH2+].[Fe] 29370482
CO[C]1[CH]C[C]([CH][CH]1)C.[C-]#[OH2+].[C-]#[OH2+].[C-]#[OH2+].[Fe] 29370482
CO[C]1[CH]C[C]([CH][CH]1)C.[C-]#[OH2+].[C-]#[OH2+].[C-]#[OH2+].[Fe] 29370482
CO[C]1[CH]C[C]([CH][CH]1)C.[C-]#[OH2+].[C-]#[OH2+].[C-]#[OH2+].[Fe] 29370482
[CH]1C[CH][CH][CH][CH][CH]1.[OH2+]#[C-].[OH2+]#[C-].[OH2+]#[C-].[Cr] 29371034
[CH]1[CH]C[CH][CH][CH][CH]1.[OH2+]#[C-].[OH2+]#[C-].[OH2+]#[C-].[Cr] 29371034
C1[CH][CH][CH][CH][CH][CH]1.[OH2+]#[C-].[OH2+]#[C-].[OH2+]#[C-].[Cr] 29371034
[CH]1[CH][CH][CH]C[CH][CH]1.[OH2+]#[C-].[OH2+]#[C-].[OH2+]#[C-].[Cr] 29371034
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
O=C(Nc1nc[nH]n1)COc1ccc(c(c1)C)Br.O=C(Nc1[nH]cnn1)COc1ccc(c(c1)C)Br 29450609
[CH]1[CH]CC[CH][CH]CC1.C[C@@h]1c...@h](p1c1ccccc1p...@h](C)c...@h]1c)C.[Rh] 29491188
C1C[CH][CH]CC[CH][CH]1.C[C@@h]1c...@h](p1c1ccccc1p...@h](C)c...@h]1c)C.[Rh] 29491188
C1[CH][CH]CC[CH][CH]C1.C[C@@h]1c...@h](p1c1ccccc1p...@h](C)c...@h]1c)C.[Rh] 29491188
C1[CH][CH]CC[CH][CH]C1.C[C@@h]1c...@h](p1c1ccccc1p...@h](C)c...@h]1c)C.[Rh] 29491188
C1[CH][CH]CC[CH][CH]C1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 29491195
C1C[CH][CH]CC[CH][CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 29491195
[CH]1CC[CH][CH]CC[CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 29491195
C1C[CH][CH]CC[CH][CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 29491195
C1[CH][CH]CC[CH][CH]C1.C[C@@h]1c...@h](P1C1=C(C(=O)OC1=O)p...@h](C)c...@h]1c)C.[Rh] 29491197
[CH]1CC[CH][CH]CC[CH]1.C[C@@h]1c...@h](P1C1=C(C(=O)OC1=O)p...@h](C)c...@h]1c)C.[Rh] 29491197
C1[CH][CH]CC[CH][CH]C1.C[C@@h]1c...@h](P1C1=C(C(=O)OC1=O)p...@h](C)c...@h]1c)C.[Rh] 29491197
[CH]1[CH]CC[CH][CH]CC1.C[C@@h]1c...@h](P1C1=C(C(=O)OC1=O)p...@h](C)c...@h]1c)C.[Rh] 29491197
[CH]CCCCCCCCCCCCCCC 29536355
CCCCCCCCCCCCCCC[CH] 29536355
C1CCC[CH]1 29538372
[CH]1CCCC1 29538372
[CH]=C 29538463
C=[CH] 29538463
C1[CH]CCCC1 29538482
C1CCC[CH]C1 29538482
[CH]1CCCCC1 29538482
C1C[CH]CCC1 29538482
C/C(=C(/[CH2])\C)/[CH2] 29550750
C/C(=C(/[CH2])\C)/[CH2] 29550750
C/C(=C(/[CH2])\C)/[CH2] 29550750
C/C(=C(/[CH2])\C)/[CH2] 29550750
*.CCN(CCOC(=O)C1(CCCCC1)C1CCCCC1)[C]C 29934806
*.CCN(CCOC(=O)C1(CCCCC1)C1CCCCC1)[C]C 29934806
*.CCN(CCOC(=O)C1(CCCCC1)C1CCCCC1)[C]C 29934806
*.CCN(CCOC(=O)C1(CCCCC1)C1CCCCC1)[C]C 29934806
*.CCN(N(N=O)O)CC.CCC.CC[CH2] 29934822
*.CCN(N(N=O)O)CC.CC[CH2].CCC 29934822
*.CCN(N(N=O)O)CC.CCC.[CH2]CC 29934822
*.CCN(N(N=O)O)CC.CCC.CC[CH2] 29934822
[CH2][C]([CH][C]([CH2])C)C.C[C]([CH][C]([CH2])C)[CH2].[Ru] 30155022
C[C]([CH][C]([CH2])C)[CH2].[CH2][C]([CH][C]([CH2])C)C.[Ru] 30155022
C[C]([CH]CC[CH][C](C)[CH2])[CH2].[CH2][C]([CH]CC[CH][C](C)[CH2])C.Cl[Ru]Cl.Cl[Ru]Cl 30155024
[CH2][C]([CH]CC[CH][C](C)[CH2])C.C[C]([CH]CC[CH][C](C)[CH2])[CH2].Cl[Ru]Cl.Cl[Ru]Cl 30155024
O=CNc1c(C)cccc1C.CCCCN1[CH]CCCC1.CC 30155687
O=CNc1c(C)cccc1C.CCCCN1CCCC[CH]1.CC 30155687
[CH]1CC[CH][CH]CC[CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 30177469
[CH]1CC[CH][CH]CC[CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 30177469
C1C[CH][CH]CC[CH][CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 30177469
[CH]1CC[CH][CH]CC[CH]1.c1ccc(cc1)cn1...@h]([C@@H](C1)P(c1ccccc1)c1ccccc1)P(c1ccccc1)c1ccccc1.[Rh] 30177469
C1[CH][CH]CC[CH][CH]C1.Cl[Ru]Cl 30424431
[CH]1CC[CH][CH]CC[CH]1.Cl[Ru]Cl 30424431
C1[CH][CH]CC[CH][CH]C1.Cl[Ru]Cl 30424431
[CH]1[CH]CC[CH][CH]CC1.Cl[Ru]Cl 30424431

Tim