Re: [Rdkit-discuss] using the PubMed functions

Greg Landrum Fri, 27 Mar 2009 05:18:26 +0000

Dear Markus,

I'm impressed that you managed to get as far as you did with this. The
code for accessing pubmed wasn't really designed for what you're
doing, but you've somehow made it mostly work. My compliments. :-)


On Thu, Mar 26, 2009 at 8:33 AM, markus kossner <[email protected]> wrote:
> Hi all,
> yesterday I tried to test out the functionality of the Pubmed modules.
> First let me explain the scene:
>
> I want to use the RDKit Pubmed modules for PubChem
> Compound, Substance, Bioassay, etc. data queries
> in order to answer the following question:
> In which assays was 'aspirin' tested, and what are the names of the
> protein targets in these assays?
> Here's what I did:
>
> import Dbase.Pubmed.Searches ,Dbase.Pubmed.QueryParams
> query=Dbase.Pubmed.QueryParams.details()
> query['db']='pccompound'
> query['term']='aspirin'
> print '...searching for term %s' %(query['term'])
> res1=Dbase.Pubmed.Searches.GetSearchIds(query,url='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi')
>
> # This query returns the compound IDs of aspirin as an array. For the
> # sake of brevity, I'll just take the first ID and query elink for the
> # assays that tested this compound:
>
> query=Dbase.Pubmed.QueryParams.details()
> query['db']='pcassay'
> query['dbfrom' ]='pccompound'
> query['linkname']='pccompound_pcassay'
> query['Id']=res1[0]
>
> res2=Dbase.Pubmed.Searches.GetSearchIds(query,url='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi')
> print res2
>
> # Now res2 contains the IDs of every assay that my beloved aspirin was
> # tested in (I'll use just assay ID 1490).
> # What I tried next is to use the 'GetRecords' method to retrieve
> # assay info from the pcassay database.
> # (By default the 'GetRecords' method looks up PubMed, but how do I change
> # that to pcassay, and what is the bioassay analogue of a PubMed
> # SummaryRecord? And by the way: what is the structure/common thread in
> # the eutils documentation? :-) )
> # Anyway, here's an unsuccessful attempt to get the name of the protein
> # (it should be 'phosphopantetheinyl transferase' or so):
>
> query=Dbase.Pubmed.QueryParams.details()
> query['db']='pcassay'
> res3=Dbase.Pubmed.Searches.GetRecords(['1490'],query,url='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi')
> print res3

Yes, this doesn't seem to work. In fact, if you go directly to the
page that you're trying to retrieve from and enter the URL parameters
by hand:
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pcassay&id=1490&retmode=xml"
you'll see that pcassay doesn't support retrieving data as XML.

This URL:
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pcassay&id=1490&report=docsum"
on the other hand, does seem to produce something reasonable:
-------------
1: AID: 1490
qHTS Assay for Inhibitors of Bacillus subtilis Sfp phosphopantetheinyl
transferase (PPTase)
Source: NCGC
Total substances tested:1117; Active:0
-------------
It's just not XML, and it probably doesn't use a standardized grammar/vocabulary.

If this is useful, you can retrieve the same info from Python as follows:
#-----------
import urllib
from rdkit.Dbase.Pubmed import Searches  # <- you can probably drop the "rdkit." prefix
url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
ps = {'id':'1490','db':'pcassay','report':'docsum','mode':'text'}
conn = Searches.openURL(url,urllib.urlencode(ps))
text = conn.read()
#-----------
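
Once you have the docsum text, pulling the fields out is plain string
processing. Here is a minimal sketch, assuming the text always follows the
layout of the sample above (an "AID" line, a title that may wrap across
lines, a "Source:" line, and a counts line). The function name parse_docsum
and the dictionary keys are made up for illustration, and since the format
is probably not standardized, this is likely to be fragile:

```python
import re

def parse_docsum(text):
    """Parse one plain-text pcassay docsum record into a dict.

    Assumes the layout seen in the AID 1490 sample; this is a guess at
    the format, not a documented grammar.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    record = {}

    # First line is expected to look like "1: AID: 1490".
    m = re.match(r'^\d+:\s*AID:\s*(\d+)$', lines[0])
    if m is None:
        raise ValueError('unexpected first line: %r' % lines[0])
    record['aid'] = int(m.group(1))

    title_parts = []
    for ln in lines[1:]:
        if ln.startswith('Source:'):
            record['source'] = ln[len('Source:'):].strip()
        elif ln.startswith('Total substances tested:'):
            m = re.match(
                r'^Total substances tested:\s*(\d+);\s*Active:\s*(\d+)$', ln)
            if m is not None:
                record['tested'] = int(m.group(1))
                record['active'] = int(m.group(2))
        else:
            # Anything unlabeled is treated as part of the (wrapped) title.
            title_parts.append(ln)
    record['title'] = ' '.join(title_parts)
    return record

# The sample text from the efetch docsum report for AID 1490.
sample = """1: AID: 1490
qHTS Assay for Inhibitors of Bacillus subtilis Sfp phosphopantetheinyl
transferase (PPTase)
Source: NCGC
Total substances tested:1117; Active:0
"""
rec = parse_docsum(sample)
```

With the snippet above you would call parse_docsum(text) on whatever
conn.read() returned; rec['title'] then holds the assay name, which is
where the protein target shows up for this assay.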

If you want more structured (and complete) data, I think you have to
use the PubChem Power User Gateway (PUG) or the SOAP services. Those
are more complex, but I know you've used them before.

-greg
