Re: [Bioc-devel] IPI numbers in annotation packages

2015-10-05 Thread James W. MacDonald
Hi Marc,

That script has this in it:

## For now just get data for the ones that we have traditionally supported
## I don't even know if the other species are available...
speciesList = c("chipsrc_human.sqlite",
  "chipsrc_rat.sqlite",
  "chipsrc_chicken.sqlite",
  "chipsrc_zebrafish.sqlite",
  #  "chipsrc_worm.sqlite",
  #  "chipsrc_fly.sqlite",
  "chipsrc_mouse.sqlite",
  "chipsrc_bovine.sqlite"
  #  "chipsrc_arabidopsis.sqlite"  ## this is available and could be "activated"
  ## But to activate arabidopsis, remember you have to pre-add the tables...
  #  "chipsrc_canine.sqlite",
  #  "chipsrc_rhesus.sqlite",
  #  "chipsrc_chimp.sqlite",
  #  "chipsrc_anopheles.sqlite"
  )

And there is no mention of yeast anywhere. If I search all the scripts for,
say, 'INSERT INTO pfam', I get

custom_anno/script/bindb.sql
328:INSERT INTO pfam

pfam/script/srcdb_pfam.sql
202:-- INSERT INTO pfamb

organism_annotation/script/bindb_yeast.sql
441:-- INSERT INTO pfam

yeast/script/bindb.sql
241:-- INSERT INTO pfam

The first one is just doing all the metadata tables, and the other three
are in code blocks that are commented out. Is it possible that you used a
script that didn't make it into svn?

Jim



On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson wrote:

> Hi Jim,
>
> You asked me on Friday where the PFAM IDs for yeast came from, and I
> couldn't recall because at the time I was at Seattle Children's (and thus
> nowhere near my copy of the source code).  But I also said I would look into
> it for you later (and I have).  Here is what my code tells me: ever
> since IPI shut down, we have been getting the PFAM and IPI data from
> UniProt.  There is a script in the UniProt.ws package
> called processDataForBuild.R that is supposed to be called by the script
> "src_build.sh" (it's the last thing that script does).  That code should
> get the PFAM data for yeast for you.  Please note that yeast required a
> lot of special code to get it processed.  Nothing with yeast annotations is
> ever easy.  It's like karmic accounting to compensate for all the bread and
> beer.  ;)
>
> Let me know if you need any more explanations about what is in there.
> Because of the crazy timing, before I left I built and pushed into devel a
> fresh set of .DB0s and core packages (in late August), just in case it was
> too crazy to do a refresh right now.  But it sounds like you won't need
> that.
>
>
>   Marc
>
>
>
> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald wrote:
>
>> I am building the annotation db0 packages for the upcoming Bioconductor
>> release, which are used to generate all the orgDb and chip annotation
>> packages that we distribute. Up to the previous release we have always
>> included IPI identifiers (as part of the table containing the PROSITE and
>> PFAM IDs). Unfortunately, IPI is no longer maintained (since 2011), and
>> UniProt, which is where we got data for the last few releases, has now
>> dropped support as well.
>>
>> Given that this annotation source is no longer maintained, I decided to
>> exclude these IDs from the current build of the following db0 packages:
>>
>> - rat.db0
>> - chicken.db0
>> - zebrafish.db0
>> - mouse.db0
>> - bovine.db0
>> - human.db0
>>
>> In addition, it is not clear to me (nor can Marc recall) where the data
>> for PFAM in the yeast.db0 package comes from. Given that we are pretty far
>> behind schedule for these packages, I have excluded that table as well.
>>
>> If this will break anybody's package, or if there are people who rely on
>> these IDs, I can just parse them out of the last release and deprecate
>> them, so you will have the IDs for one more release. However, if nobody
>> cares about such things, I will just go with what we have. Please speak up
>> if this will affect you.
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>




Re: [Bioc-devel] IPI numbers in annotation packages

2015-10-05 Thread Marc Carlson
You need to scroll down that script a ways...  Look for 'yeast'.



Re: [Bioc-devel] IPI numbers in annotation packages

2015-10-05 Thread James W. MacDonald
Ah. That's the problem. The script in getdb.sh has

R --slave < /home/ubuntu/cpb_anno/AnnotationBuildPipeline/annosrc/uniprot/script/uniprot.ws/inst/script/processDataForBuild.R

which is a modification of what is in svn (changed to match the directory
structure of the AMI) and which calls a script in a local copy of the
UniProt.ws package. The local copy doesn't have any code for yeast, but the
'real' version (the installed UniProt.ws) does. I assumed the local copy was
special, and that I should be using it because you were specifically calling
that one rather than an installed package.

annosrc$ grep -i yeast uniprot/script/uniprot.ws/inst/script/processDataForBuild.R
annosrc$
annosrc$ grep -i yeast ~/R/x86_64-pc-linux-gnu-library/3.2/UniProt.ws/script/processDataForBuild.R
## Now for special treatment for missing stuff from yeast.
getYeastData <- function(dbFile, db){
doYeastInserts <- function(db, table, data){
## just one more run through to just do what is needed to get pfam into yeast.
species <- 'chipsrc_yeast.sqlite'
res <- getYeastData(species, db)
doYeastInserts(db, "pfam", res[["pfam"]])
doYeastInserts(db, "smart", res[["smart"]])


Thanks!

Jim



On Mon, Oct 5, 2015 at 10:16 AM, Marc Carlson wrote:

> You need to scroll down that script a ways...  Look for 'yeast'.
>
> On Mon, Oct 5, 2015 at 6:11 AM, James W. MacDonald wrote:
>
>> Hi Marc,
>>
>> That script has this in it:
>>
>> ## For now just get data for the ones that we have traditionally supported
>> ## I don't even know if the other species are available...
>> speciesList = c("chipsrc_human.sqlite",
>>   "chipsrc_rat.sqlite",
>>   "chipsrc_chicken.sqlite",
>>   "chipsrc_zebrafish.sqlite",
>>   #  "chipsrc_worm.sqlite",
>>   #  "chipsrc_fly.sqlite",
>>   "chipsrc_mouse.sqlite",
>>   "chipsrc_bovine.sqlite"
>>   #  "chipsrc_arabidopsis.sqlite"  ## this is available and could be "activated"
>>   ## But to activate arabidopsis, remember you have to pre-add the tables...
>>   #  "chipsrc_canine.sqlite",
>>   #  "chipsrc_rhesus.sqlite",
>>   #  "chipsrc_chimp.sqlite",
>>   #  "chipsrc_anopheles.sqlite"
>>   )
>>
>> And there is no mention of yeast anywhere. If I search all the scripts
>> for, say, 'INSERT INTO pfam', I get
>>
>> custom_anno/script/bindb.sql
>> 328:INSERT INTO pfam
>>
>> pfam/script/srcdb_pfam.sql
>> 202:-- INSERT INTO pfamb
>>
>> organism_annotation/script/bindb_yeast.sql
>> 441:-- INSERT INTO pfam
>>
>> yeast/script/bindb.sql
>> 241:-- INSERT INTO pfam
>>
>> The first one is just doing all the metadata tables, and the other three
>> are in code blocks that are commented out. Is it possible that you used a
>> script that didn't make it into svn?
>>
>> Jim
>>
>>
>>
>> On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson wrote:
>>
>>> Hi Jim,
>>>
>>> You asked me on Friday where the PFAM IDs for yeast came from, and I
>>> couldn't recall because at the time I was at Seattle Children's (and thus
>>> nowhere near my copy of the source code).  But I also said I would look into
>>> it for you later (and I have).  Here is what my code tells me: ever
>>> since IPI shut down, we have been getting the PFAM and IPI data from
>>> UniProt.  There is a script in the UniProt.ws package
>>> called processDataForBuild.R that is supposed to be called by the script
>>> "src_build.sh" (it's the last thing that script does).  That code should
>>> get the PFAM data for yeast for you.  Please note that yeast required a
>>> lot of special code to get it processed.  Nothing with yeast annotations is
>>> ever easy.  It's like karmic accounting to compensate for all the bread and
>>> beer.  ;)
>>>
>>> Let me know if you need any more explanations about what is in there.
>>> Because of the crazy timing, before I left I built and pushed into devel a
>>> fresh set of .DB0s and core packages (in late August), just in case it was
>>> too crazy to do a refresh right now.  But it sounds like you won't need
>>> that.
>>>
>>>
>>>   Marc
>>>
>>>
>>>
>>> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald wrote:
>>>
>>>> I am building the annotation db0 packages for the upcoming Bioconductor
>>>> release, which are used to generate all the orgDb and chip annotation
>>>> packages that we distribute. Up to the previous release we have always
>>>> included IPI identifiers (as part of the table containing the PROSITE
>>>> and PFAM IDs). Unfortunately, IPI is no longer maintained (since 2011),
>>>> and UniProt, which is where we got data for the last few releases, has
>>>> now dropped support as well.
>>>>
>>>> Given that this annotation source is no longer maintained, I decided to
>>>> exclude these IDs from the current build of the following db0 packages:
>>>>
>>>> - rat.db0
>>>> - chicken.db0
>>>> - zebrafish.db0
>>>> - mouse.db0
>>>> - bovine.db0
>>>> - human.db0
>>>>
>>>> In addition, it is not clear to me (nor can Marc recall) where the data