> > Regarding the issues you mentioned
>    ...
> >
> > - non-canonical SMARTS
> > - duplicates are not filtered out
>
> I think I figured out a way around that via some post-processing.

Great!

>
>
> >> Or do you mean the number of molecules which contain that structure,
> >> in which case "CC" exists in 3 of the structures.
>
> Okay. You want a (say) 15% threshold, and the threshold MCS isn't
> yet ported over to RDKit. It is in that hacked module I sent the
> other day.
>
> > Following up on the example from the above section:
> > "
> > CCNCc1ccccc1
> > c1ccc(cc1)CN
> > CCNCc1ccccc1
> > c1cnc[nH]
> > "
>
> BTW, that last SMILES isn't complete.


Yep, you are absolutely right!
Here is a slightly updated version of the small data set:
CCNCc1ccccc1
NCc1ccccc1
CCNCc1ccccc1
c1c[nH]cn1
CCCn1ccnc1
Cn1cncc1CCc2ccccc2
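With six molecules, the quoted "(say) 15%" threshold is small in absolute terms. Here is a minimal sketch of turning a percentage into a minimum number of molecules. It assumes the threshold means "present in at least that fraction of the input molecules", rounded up. The helper name min_support is hypothetical and is not part of the attached script or of RDKit.

```python
# Updated small data set from this message.
SMILES = [
    "CCNCc1ccccc1",
    "NCc1ccccc1",
    "CCNCc1ccccc1",
    "c1c[nH]cn1",
    "CCCn1ccnc1",
    "Cn1cncc1CCc2ccccc2",
]


def min_support(num_molecules: int, percent: int) -> int:
    """Smallest molecule count that reaches `percent` % of `num_molecules`.

    Hypothetical helper (assumption: round up). Integer arithmetic is used
    so that float rounding cannot push the result up or down by one.
    """
    # Ceiling division: -(-a // b) == ceil(a / b) for non-negative integers.
    return -(-num_molecules * percent // 100)


if __name__ == "__main__":
    n = len(SMILES)
    print(n, min_support(n, 15))
```

So for this data set, a 15% threshold means a substructure only has to occur in a single molecule.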


>
> > => I would be more than happy to finally have this output:
> > "
> > newflavorOfMCS   frequency
> > ##########################
> > c1ccc(cc1)CN     3
> > c1cnc[nH]        1
> > "
>
> That script is attached. Here's an example of it in use:
>
> % python paul.py --min-num-bonds 7 paul.smi

Cool, the script does exactly what I need! I'm really grateful for your help!

For the dataset described above, the following "most common substructures"
(aka MCSS) are found:
3 5 5 c1cncn1 [#6]:1:[#6]:[#7]:[#6]:[#7]:1
3 8 8 NCc1ccccc1 [#7]-[#6]-[#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1
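To get from these lines to the "newflavorOfMCS / frequency" table I asked for, a little post-processing is enough. The following is a minimal sketch. It assumes the five columns are: number of molecules containing the substructure, atom count, bond count, SMILES and SMARTS. The counts match for c1cncn1 (5 atoms, 5 bonds) and NCc1ccccc1 (8 atoms, 8 bonds), but I have not checked this against the script itself. It also assumes the SMILES column is canonical, so duplicates can be dropped by comparing SMILES strings. Hit, parse_hits and frequency_table are hypothetical names.

```python
from typing import NamedTuple

# Output lines copied from the run above.
SCRIPT_OUTPUT = """\
3 5 5 c1cncn1 [#6]:1:[#6]:[#7]:[#6]:[#7]:1
3 8 8 NCc1ccccc1 [#7]-[#6]-[#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1
"""


class Hit(NamedTuple):
    # Assumed column meanings (see lead-in).
    count: int
    num_atoms: int
    num_bonds: int
    smiles: str
    smarts: str


def parse_hits(text: str) -> list[Hit]:
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        count, atoms, bonds, smiles, smarts = fields
        hits.append(Hit(int(count), int(atoms), int(bonds), smiles, smarts))
    return hits


def frequency_table(hits: list[Hit]) -> str:
    # One row per SMILES string (assumes canonical SMILES, see lead-in).
    best: dict[str, int] = {}
    for hit in hits:
        best[hit.smiles] = max(best.get(hit.smiles, 0), hit.count)
    # Most frequent first, ties broken alphabetically.
    rows = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    width = max([len("newflavorOfMCS")] + [len(s) for s in best]) + 3
    lines = ["newflavorOfMCS".ljust(width) + "frequency", "#" * (width + 9)]
    lines += [smiles.ljust(width) + str(count) for smiles, count in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(frequency_table(parse_hits(SCRIPT_OUTPUT)))
```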

=>
Now on to another question:
How would one code the "complete-ring-only" variation?
Can your code be adapted, or should I do some post-processing?
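If it has to be post-processing, one simple idea is to take a match (the set of atom indices one molecule contributes to the MCS) and drop every ring atom that does not belong to at least one fully matched ring. This is only a sketch under stated assumptions. It filters an existing match rather than running a ring-aware MCS search, so it works at the atom level only and does not reconnect or drop fragments that become disconnected. The ring list is assumed to be tuples of atom indices, such as RDKit's mol.GetRingInfo().AtomRings() returns. prune_partial_rings is a hypothetical name.

```python
from collections.abc import Iterable


def prune_partial_rings(matched: Iterable[int],
                        atom_rings: Iterable[Iterable[int]]) -> set[int]:
    """Remove ring atoms whose rings are only partially covered by a match.

    matched: atom indices of one molecule covered by the MCS match.
    atom_rings: rings of that molecule, each given as atom indices.
    """
    matched = set(matched)
    rings = [frozenset(r) for r in atom_rings]
    ring_atoms = set().union(*rings)
    # Rings whose every atom is in the match.
    complete = [r for r in rings if r <= matched]
    kept_ring_atoms = set().union(*complete)
    # Chain atoms always survive; ring atoms survive only inside a complete ring.
    return {a for a in matched if a not in ring_atoms or a in kept_ring_atoms}


if __name__ == "__main__":
    # Hypothetical numbering: ring 1-5 (imidazole), chain 6-7, ring 8-13 (phenyl).
    rings = [(1, 2, 3, 4, 5), (8, 9, 10, 11, 12, 13)]
    print(sorted(prune_partial_rings(range(1, 11), rings)))
```

Because an atom is only removed when none of its rings is complete, a single pass is enough: removing it never breaks a ring that was complete.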


Cheers & a big thanks!

Paul

> A non-hacked solution could hook into the information about
> "complete-ring-only". That would give you structures more like
> what a chemist would expect, though not the scaffolds that Christos
> and Peter suggested.



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
