Dear Markus, I'm impressed that you managed to get as far as you did with this. The code for accessing pubmed wasn't really designed for what you're doing, but you've somehow made it mostly work. My compliments. :-)
On Thu, Mar 26, 2009 at 8:33 AM, markus kossner <[email protected]> wrote:
> Hi all,
> yesterday I tried to test out the functionality of the Pubmed modules.
> First let me explain the scene:
>
> I want to use the RDKit Pubmed modules for Pubchem
> Compound,Substance,Bioassay etc-data queries
> In order to get the answer to the following question:
> In which assays was 'aspirin' testet and what are the names of the
> protein targets in these assays?
> Here's what I did:
>
> import Dbase.Pubmed.Searches ,Dbase.Pubmed.QueryParams
> query=Dbase.Pubmed.QueryParams.details()
> query['db']='pccompound'
> query['term']='aspirin'
> print '...searching for term %s' %(query['term'])
> res1=Dbase.Pubmed.Searches.GetSearchIds(query,url='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi')
>
> #This query results in the compound IDs of aspirin as an array --> for
> the sake of brevity
> #I'll just take the first ID and query elink for the assays that had
> tested this compound:
>
> query=Dbase.Pubmed.QueryParams.details()
> query['db']='pcassay'
> query['dbfrom' ]='pccompound'
> query['linkname']='pccompound_pcassay'
> query['Id']=res1[0]
>
> res2=Dbase.Pubmed.Searches.GetSearchIds(query,url='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi')
> print res2
>
> #Now res2 contains the assay IDs of every assay, that my beloved aspirin
> was testet in (I'll use just assay ID 1490 ).
> # what I now tried is to use the 'GetRecords' method for retrieving
> assay-info from the pcassay database
> #(by default the 'GetRecords' method looks up PubMed, but how to change
> that to pcassay and what is the bioassay analogon for
> #a PubMed SummaryRecord ??? .... and by the way: What is the
> structure/red line in the eutils documentation???
> :-) )
> #Anyway, here's an unsucessful trial to get the name of the protein (It
> should be 'phosphopantetheinyl transferase' or so ):
>
> query=Dbase.Pubmed.QueryParams.details()
> query['db']='pcassay'
> res3=Dbase.Pubmed.Searches.GetRecords(['1490'],query,url='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi')
> print res3

Yes, this doesn't seem to work. In fact, if you go directly to the page
that you're trying to retrieve from and enter the URL parameters by hand:
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pcassay&id=1490&retmode=xml"
you'll see that pcassay doesn't support retrieving data as XML.

This URL:
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pcassay&id=1490&report=docsum"
on the other hand, does seem to produce something reasonable:
-------------
1: AID: 1490
qHTS Assay for Inhibitors of Bacillus subtilis Sfp phosphopantetheinyl transferase (PPTase)
Source: NCGC
Total substances tested:1117; Active:0
-------------
It's just not XML, and it probably doesn't use a standardized
grammar/vocabulary.

If this is useful, you can retrieve the same info from Python as follows:
#-----------
import urllib
from rdkit.Dbase.Pubmed import Searches #<- you can probably drop the leading "rdkit."
url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
# request the plain-text document summary for assay 1490
ps = {'id':'1490','db':'pcassay','report':'docsum','mode':'text'}
conn = Searches.openURL(url,urllib.urlencode(ps))
text = conn.read()
#-----------

If you want more structured (and complete) data, I think you have to use
the PubChem Power User Gateway (PUG) or the SOAP services. Those are more
complex, but I know you've used them before.

-greg
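
Since the docsum text is not XML, pulling the assay name out of it means matching the text directly. The following is only a minimal sketch: the helper `parse_docsum` and its field names are hypothetical (not part of the RDKit), and the layout it expects is inferred solely from the single example for AID 1490 shown above, so other records may well need a different pattern.

```python
import re

# Assumed layout, inferred only from the one docsum example for AID 1490;
# whitespace between the fields is treated as interchangeable.
_DOCSUM_RE = re.compile(
    r'AID:\s*(?P<aid>\d+)\s+'
    r'(?P<name>.*?)\s+'
    r'Source:\s*(?P<source>.*?)\s+'
    r'Total substances tested:\s*(?P<tested>\d+);\s*'
    r'Active:\s*(?P<active>\d+)',
    re.DOTALL)


def parse_docsum(text):
    """Hypothetical helper: return a dict of docsum fields, or None if
    the text does not match the assumed layout."""
    m = _DOCSUM_RE.search(text)
    if m is None:
        return None
    rec = m.groupdict()
    # collapse any line breaks inside the assay title
    rec['name'] = ' '.join(rec['name'].split())
    rec['tested'] = int(rec['tested'])
    rec['active'] = int(rec['active'])
    return rec


# The text below is copied from the docsum output quoted above.
sample = """1: AID: 1490
qHTS Assay for Inhibitors of Bacillus subtilis Sfp phosphopantetheinyl transferase (PPTase)
Source: NCGC
Total substances tested:1117; Active:0
"""
record = parse_docsum(sample)
```

With the `text` read from the efetch connection, `parse_docsum(text)['name']` would give the assay title, which in this example contains the protein name. Returning None on a mismatch is a deliberate choice here, so that a changed layout is noticed rather than silently mis-parsed.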

