Re: [Rdkit-discuss] delete a substructure

2017-03-09 Thread Chenyang Shi
Thank you Chris. I found that one too; it is quite convenient to visualize
both SMARTS and SMILES strings.

On Thu, Mar 9, 2017 at 11:28 AM, Chris Swain  wrote:

> I use SMARTSviewer at Univ of Hamburg
>
> http://www.zbh.uni-hamburg.de/en/bioinformatics-server.html
>
> Chris
>
> On 9 Mar 2017, at 17:21, rdkit-discuss-requ...@lists.sourceforge.net
> wrote:
>
> One last question: do you have any convenient online or local references
> for looking up SMARTS patterns?
> Greg mentioned $RDBASE/Data/Functional_Group_Hierarchy.txt, which comes
> with the RDKit installation.
> Brian suggested the Daylight website,
> http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html,
> which is a good place as well.
>
> Best,
> Chenyang
>
>
>
> 
> --
> Announcing the Oxford Dictionaries API! The API offers world-renowned
> dictionary content that is easy and intuitive to access. Sign up for an
> account today to start using our lexical data to power your apps and
> projects. Get started today and enter our developer competition.
> http://sdm.link/oxford
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


Re: [Rdkit-discuss] delete a substructure

2017-03-09 Thread Chris Swain
I use SMARTSviewer at Univ of Hamburg

http://www.zbh.uni-hamburg.de/en/bioinformatics-server.html 


Chris
> On 9 Mar 2017, at 17:21, rdkit-discuss-requ...@lists.sourceforge.net wrote:
> 
> One last question: do you have any convenient online or local references
> for looking up SMARTS patterns?
> Greg mentioned $RDBASE/Data/Functional_Group_Hierarchy.txt, which comes
> with the RDKit installation.
> Brian suggested the Daylight website,
> http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html,
> which is a good place as well.
> 
> Best,
> Chenyang



Re: [Rdkit-discuss] delete a substructure

2017-03-09 Thread Chenyang Shi
Thanks Hongbin and Pavel for the suggestions. I am now confident that the
approach Hongbin proposed to remove duplicate counts is a robust one. Now I
need to revisit/recheck all my SMARTS definitions.
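
A minimal sketch of the bookkeeping Hongbin and Greg describe, in plain Python: keep a set of atoms that have already been assigned to a group, and process the more complex patterns first. The match tuples are hard-coded from the acetic acid example quoted further down (in practice they would come from m.GetSubstructMatches); count_groups and the group labels are hypothetical names used only for illustration.

```python
def count_groups(matches_by_group):
    """Count functional groups, skipping matches that reuse claimed atoms.

    matches_by_group: list of (name, matches) pairs, ordered so that the
    more complex patterns come first. Each `matches` is a tuple of
    atom-index tuples, the shape returned by GetSubstructMatches.
    """
    claimed = set()  # atom indices already assigned to some group
    counts = {}
    for name, matches in matches_by_group:
        counts[name] = 0
        for match in matches:
            # Accept a match only if none of its atoms is already claimed.
            if claimed.isdisjoint(match):
                counts[name] += 1
                claimed.update(match)
    return counts


# Match tuples for acetic acid, CC(=O)O, copied from the thread.
acetic_acid = [
    ("COOH", ((1, 3, 2),)),  # listed first: the larger pattern
    ("OH", ((3,),)),         # atom 3 is already part of COOH
]
print(count_groups(acetic_acid))  # {'COOH': 1, 'OH': 0}
```

Reversing the order would count the OH and drop the COOH, which is why the ordering matters. This all-or-nothing overlap test still undercounts when two distinct groups legitimately share an atom, as in the quoted phenol case, so tightening the SMARTS definitions remains necessary.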

One last question: do you have any convenient online or local references
for looking up SMARTS patterns?
Greg mentioned $RDBASE/Data/Functional_Group_Hierarchy.txt, which comes
with the RDKit installation.
Brian suggested the Daylight website,
http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html,
which is a good place as well.

Best,
Chenyang

On Thu, Mar 9, 2017 at 1:09 AM, 杨弘宾  wrote:

> Hi Chenyang,
>
> I think your issue is caused by the definition of "-OH (phenol)". If
> you define this pattern as "cO", atom *3* is matched as well, since it is
> the aromatic carbon bonded to the oxygen. I guess you want to match only
> the oxygen, with the restriction that it is bonded to an aromatic carbon.
> The SMARTS should therefore be "[$(Oc)]", which denotes an oxygen in the
> environment "bonded to an aromatic carbon".
>
> from rdkit import Chem
> m = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')
> m.GetSubstructMatches(Chem.MolFromSmarts('[$(Oc)]'))
> >>> ((10,),)
>
> Then only atom *10* will be matched and it won't interfere with other
> counts.
>
> Reference: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html,
> section 4.4
>
> --
> Hongbin Yang
>
>
> *From:* Chenyang Shi 
> *Date:* 2017-03-09 01:32
> *To:* Greg Landrum 
> *CC:* rdkit-discuss ; 杨弘宾
> 
> *Subject:* Re: [Rdkit-discuss] delete a substructure
> Dear Hongbin,
>
> I tried your method on a molecule, 4-Methylsalicylic acid
> (CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in Joback
> method (using SMARTS), and used m.GetSubstructMatches to print out all
> atom positions. The result is summarized in the table.
>
> We can see there are duplicated counts--coming from COOH group. As
> suggested by Hongbin, we can remove duplicated atoms by looking at their
> positions--in this case, ((9),), ((7,8,),), ((7,),), and ((8,),) are
> subsets of ((7,8,9)) from -COOH. Indeed we can get rid of these duplicates.
> However, I also noticed that Atom (3,) from =C< (ring) group is also a part
> of -OH (phenol) ((10,3),). If we apply the same algorithm to remove
> duplicates, the =C<(ring) group will be only counted twice instead of three
> times.
>
> Greg, you mentioned as an alternative I can delete substructure using
> chemical reaction method. It would be greatly appreciated if you could show
> me (point me to) a simple example code, perhaps on a simple molecule? I
> find myself at a loss when browsing the manual. I would like to try also in
> that direction.
>
> Thanks,
> Chenyang
>
>
> [image: Inline image 1]
>
>
> On Mon, Mar 6, 2017 at 1:52 AM, Greg Landrum 
> wrote:
>
>> The solution that Hongbin proposes to the double-counting problem is a
>> good one. Just be sure to sort your substructure queries in the right order
>> so that the more complex ones come first.
>>
>> Another thing you might think about is making your queries more specific.
>> For example, as you pointed out "[OH]" is very general and matches parts of
>> carboxylic acids and a number of other functional groups. The RDKit has a
>> set of fairly well tested (though certainly not perfect) functional group
>> definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
>> definition from there looks like this:
>> [O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
>>
>>
>> -greg
>>
>>
>> On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾  wrote:
>>
>>> Hi, Chenyang,
>>> You don't need to delete the substructure from the molecule. Just
>>> check whether the mapped atoms have already been matched. For example:
>>>
>>> m = Chem.MolFromSmiles('CC(=O)O')
>>> OH = Chem.MolFromSmarts('[OH]')
>>> COOH = Chem.MolFromSmarts('C(O)=O')
>>>
>>> m.GetSubstructMatches(OH)
>>> >> ((3,),)
>>> m.GetSubstructMatches(COOH)
>>> >> ((1, 3, 2),)
>>>
>>> Since atom "3" has already been matched, it should be ignored.
>>> So you can create a "set" to record the matched atoms and avoid
>>> counting them twice.
>>>
>>> --
>>> Hongbin Yang 杨弘宾
>>>
>>>
>>> *From:* Chenyang Shi 
>>> *Date:* 2017-03-06 14:04
>>> *To:* Greg Landrum 
>>> *CC:* RDKit Discuss 
>>> *Subject:* Re: [Rdkit-discuss] delete a substructure
>>> Hi Greg,
>>>
>>> Thanks for a prompt reply. I did try "GetSubstructMatches()" and it
>>> returns correct numbers of substructures for CH3COOH. The potential problem
>>> with this approach is that if the molecule is getting complicated, it will
>>> possibly generate duplicate numbers for certain functional groups. For
>>> example, --OH (alcohol) group will be likely also counted in --COOH. A
>>> 

[Rdkit-discuss] show ROMol column in table

2017-03-09 Thread Volkamer, Andrea
Dear all,

I have trouble showing a table including a ROMol column.

# Add column to dataframe
PandasTools.AddMoleculeColumnToFrame(df, smilesCol='smiles')
# Remove compounds that couldn't be built
df = df[~df.ROMol.isnull()]

This works fine, but when I want to display it (df.head()), I get the following 
error:


AttributeError: 'module' object has no attribute 'format'


The first few columns are printed and have some image encoding in the 
respective column like:

https://physiologie-cbf.charite.de/en/institute/workgroups/team_volkamer/

Campus Mitte: Virchowweg 6, 10117 Berlin/ Philippstraße 13, Haus 18, 10115 
Berlin

Phone: +49 30 - 450 528 554 / 209 347 938
E-Mail: andrea.volka...@charite.de