Abe Heifets has been talking with me about an Open Babel-based retrosynthesis 
and synthetic difficulty engine. (Abe, I hope that's a fair summary.) He has 
been working some on the code itself, but has also been looking for an open 
database of synthetic reactions for training.

It seems like if we can enhance the ChemDraw CDX support to handle reactions 
better, he can data-mine Google's chemical patents.

Chris, how should the ChemDraw reader handle such files? For some files, you'd
call the reader without knowing whether you'll get back an OBMol or an
OBReaction. How do you handle this in CML?

Otherwise, we'd mostly just need to support some new tags in CDX, which isn't a
huge deal once I wrap my brain around the calling convention. I'm not aware of
other formats where reading a file might give you back a kind of object you
didn't expect: a file is either an SD file or an RXN file, never both.

Thanks,
-Geoff

Begin forwarded message:

> Hi Geoff,
> 
> When we skyped, I mentioned that I was looking for good reaction
> databases.  I may have found one and I'd like your help to get at it.
> 
> Google recently released 10 years (and 10 terabytes) of US patents for
> free download [1].  The chemical patents come with CDX files and I'd
> like to be able to pull out the reactions in a format that's more
> amenable to further analysis, as per your and Wolf's discussion [2].
> For my purposes, atom-mapping would be useful but capturing all of the
> reaction conditions would probably not be necessary.
> 
> How hard do you think it would be to extend OpenBabel to pull out
> reactions from CDX files?  Here are some examples:
> http://www.google.com/patents?id=SEAxAAAAEBAJ&zoom=4&pg=PA4#v=onepage&q&f=false
> http://www.google.com/patents?id=XUQVAAAAEBAJ&zoom=4&pg=PA3#v=onepage&q&f=false
> http://www.google.com/patents?id=oeIDAAAAEBAJ&zoom=4&pg=PA4#v=onepage&q&f=false
> http://www.google.com/patents?id=t-wHAAAAEBAJ&zoom=4&pg=PA3#v=onepage&q&f=false
> http://www.google.com/patents?id=60wAAAAAEBAJ&zoom=4&pg=PA9#v=onepage&q&f=false
> 
> These aren't the most straightforward set of reactions but, even if we
> can extract 80% of reactions, that'd be valuable.
> 
> Also, right now it's easy to pull out a pile of molecules from the
> patents.  Please let me know if you've got a use for this kind of data
> and I can get it to you.
> 
> Cheers,
> Abe
> 
> [1] 
> http://googlepublicpolicy.blogspot.com/2010/06/free-download-10-terabytes-of-patents.html
> [2] 
> http://depth-first.com/articles/2010/09/17/reading-and-translating-chemdraw-cdx-files-with-openbabel/
> 
> 
> 
> On Mon, Nov 22, 2010 at 10:24 PM, A. Heifets <abe-...@cs.toronto.edu> wrote:
>> Ok, see you at 11:30.
>> 
>> On Mon, Nov 22, 2010 at 5:48 PM, Geoffrey Hutchison <geo...@pitt.edu> wrote:
>>> Abe,
>>> 
>>> I completely forgot another meeting tomorrow until 11:30, and I have a 
>>> follow-up at 12:30. If that's OK, I'll see you then. I'm "ghutchis."
>>> 
>>> Thanks,
>>> -Geoff
> 
> -- 
> A. Heifets
> http://www.cs.toronto.edu/~aheifets/

---
Prof. Geoffrey Hutchison
Assistant Professor, Department of Chemistry
University of Pittsburgh
http://hutchison.chem.pitt.edu/
Office: (412) 648-0492


_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
