Dear RDKit community,

I would appreciate your insight into the following simple problem. Consider these two SMARTS queries:


[H]C(=O)OC([C,H])([H])[H]  or
[H]C(=O)OC([#6,H])([H])[H]

[Note that this notation uses [C,H], which is meant to say that the given
position can hold either C or H. The same applies to [#6,H].]

Both of them should therefore match:
C(=O)OC
C(=O)OCC
C(=O)OCCC

whereas

[H]C(=O)OC([H])([H])[H]

should only match the first

C(=O)OC

while

[H]C(=O)OC([#6])([H])[H]

should only match the second and third

C(=O)OCC
C(=O)OCCC

In reality, however, the [C,H] query matches only the last two,
C(=O)OCC
C(=O)OCCC
and does not match the first one,
C(=O)OC.
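
The behaviour described above can be reproduced with a short script. This is a minimal sketch, assuming the RDKit Python API (Chem.MolFromSmarts, Chem.MolFromSmiles, Chem.AddHs, HasSubstructMatch); the dictionary keys and the helper name match_table are made up for illustration and explicit hydrogens are added to every target before matching.

```python
from rdkit import Chem

# SMARTS queries from this message; the keys are illustrative labels only.
patterns = {
    "C_or_H": "[H]C(=O)OC([C,H])([H])[H]",
    "6_or_H": "[H]C(=O)OC([#6,H])([H])[H]",
    "H_only": "[H]C(=O)OC([H])([H])[H]",
    "C_only": "[H]C(=O)OC([#6])([H])[H]",
}

# Target molecules (SMILES), in the order used throughout this message.
targets = ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]


def match_table(patterns, targets):
    """Return {label: [bool, ...]} with one flag per target molecule."""
    table = {}
    for label, smarts in patterns.items():
        query = Chem.MolFromSmarts(smarts)
        row = []
        for smiles in targets:
            # Hydrogens are made explicit so that [H] atoms in the query
            # have real atoms to map onto.
            mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
            row.append(mol.HasSubstructMatch(query))
        table[label] = row
    return table


table = match_table(patterns, targets)
for label, row in table.items():
    print(label, row)
```

Each printed row lists, for one query, whether it matches C(=O)OC, C(=O)OCC and C(=O)OCCC, in that order.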

I of course add explicit hydrogens to the target molecules, e.g. to C(=O)OC. It
looks like the [C,H] notation, which is meant to allow either C or H in a given
position, is not recognized as such (the H in [C,H] does not match a hydrogen
atom). If that is the case, how can I match cases where a given position can
hold either C or H with RDKit?
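
One variant worth trying, offered as a sketch rather than a confirmed answer: in Daylight-style SMARTS, an H that shares a bracket with other primitives is read as a hydrogen-count constraint (exactly one attached H), not as a hydrogen atom, whereas #1 always denotes an atom with atomic number 1. Assuming RDKit follows this convention, the C-or-H position could be written as [#6,#1]:

```python
from rdkit import Chem

# Assumption: "#1" selects a hydrogen atom by atomic number, avoiding the
# hydrogen-count reading that a bare "H" can get inside "[C,H]" or "[#6,H]".
query = Chem.MolFromSmarts("[H]C(=O)OC([#6,#1])([H])[H]")

targets = ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]

results = {}
for smiles in targets:
    # Explicit hydrogens are required for the [H] and #1 atoms to be matched.
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    results[smiles] = mol.HasSubstructMatch(query)

print(results)
```

If the assumption holds, all three targets should be reported as matches.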


Thank you very much for your help.


Best regards,


Dr Janusz Petkowski

Research Fellow at MIT EAPS <https://eapsweb.mit.edu/people/jjpetkow>

Tel: +1 (617) 258-6910
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
