Thank you both for the advice. That helps a lot! subsetdb.exe is just what I'm after, as I'm trying to extract human-only sequences from Swiss-Prot/UniProt. I've been trying to use sed in bash to extract the identifiers and then create a new database, but have had no luck so far. I'll run subsetdb and then append a decoy library for Tandem searching.
Thanks again!

On Fri, Oct 22, 2010 at 10:31 AM, Jimmy Eng <[email protected]> wrote:
> If there's some unique identifier in the original database that
> denotes all the proteins you want in the subset database, you can use
> the subsetdb program. It's distributed as part of the TPP (the binary
> typically exists at c:\inetpub\tpp-bin\subsetdb.exe) but there's no web
> interface to it that I'm aware of.
>
> As an example, to create a drosophila subset of the uniprot database,
> you do something like:
>
>   subsetdb.exe -MOS=Drosophila^melanogaster -ofly.fasta uniprot_sprot.fasta
>
> This creates an output file "fly.fasta" that contains all entries with
> the text "OS=Drosophila melanogaster" in the protein description line.
> The caret (^) character replaces a space. You can have multiple -M
> match text string options, no match -N strings, etc. Typing the
> executable w/o input arguments will show the usage statement.
>
> On Thu, Oct 21, 2010 at 4:41 PM, Kristian <[email protected]> wrote:
> > Okay. To do that, all you can really do is make a smaller database.
> > There's no function in the TPP that will allow you to select a subset
> > of your database. However, it's really easy to edit your database.
> > Open your database in a text editor (e.g. WordPad) and you'll see the
> > format the entries have. Use this format to create a new database
> > that only contains the entries you are interested in. Note that
> > searching against a small database will compromise your statistics
> > (partly because if you're only searching against a small number
> > of possible matches, X!Tandem will probably find something that
> > matches it, even if poorly; and partly because Peptide Prophet's
> > error model works best if there is a large number of incorrect hits as
> > well as correct hits). For the best results, add decoys to your
> > database. You can add decoys using the tool in the TPP, or you can
> > simply embed your proteins of interest in a database for another
> > organism whose proteins should not give you any positive hits.
> >
> > On Oct 21, 3:24 pm, James Broadbent <[email protected]> wrote:
> >> Thanks Kristian. I think my concept of databases and specifying
> >> taxonomy is a little underdeveloped. I think what I really want is a
> >> smaller, specific database.
> >>
> >> On Oct 22, 2:59 am, Kristian <[email protected]> wrote:
> >>
> >> > Do you mean search a specific database? The taxonomy file specifies
> >> > the location of a database.
> >> > The GUI automatically generates a taxonomy file based on the database
> >> > and location you specify.
> >> > If you're going to run things in command line, there are other things
> >> > you can do.
> >>
> >> > What are you trying to do?
> >>
> >> > To specify the taxonomy, modify the line
> >> > <note type="input" label="list path, taxonomy information">C:\Inetpub\wwwroot\ISB\data\parameters\taxonomy.xml</note>
> >> > in your tandem.params file.
> >>
> >> > The line I have above is, I believe, the default location.
> >>
> >> > On Oct 20, 8:20 pm, James Broadbent <[email protected]> wrote:
> >>
> >> > > Hi Everyone!
> >>
> >> > > Can anyone tell me how to search a specific taxonomy by specifying it
> >> > > in the tandem.params file when running searches in the TPP GUI?
> >>
> >> > > Thanks,
> >>
> >> > > James

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en.
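
For anyone who would rather script this than run subsetdb or fight with sed, the header-matching idea from the thread (keep every entry whose description line contains some text, e.g. "OS=Homo sapiens") can be sketched in a few lines of Python. This is only an illustration, not how subsetdb or the TPP decoy tool work internally. The function names, the "DECOY_" prefix, the choice of reversed sequences as decoys, and the two demo entries (made-up accessions and sequences) are all assumptions; check what decoy naming your own search and validation settings expect.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def subset_with_decoys(handle, match, decoy_prefix="DECOY_"):
    """Keep entries whose header contains `match`, then append decoys.

    Decoys here are simply the reversed target sequences, and the
    prefix is a hypothetical naming choice, not a TPP requirement.
    """
    targets = [(h, s) for h, s in read_fasta(handle) if match in h]
    decoys = [(decoy_prefix + h, s[::-1]) for h, s in targets]
    return targets + decoys


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs, wrapping sequences at `width` residues."""
    for header, seq in records:
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Two made-up entries standing in for uniprot_sprot.fasta.
    demo = io.StringIO(
        ">sp|P00001|AAA_HUMAN Example protein OS=Homo sapiens GN=AAA\n"
        "MKTAYIAK\nQR\n"
        ">sp|P00002|BBB_DROME Example protein OS=Drosophila melanogaster GN=BBB\n"
        "MSLLTEVE\n"
    )
    out = io.StringIO()
    write_fasta(subset_with_decoys(demo, "OS=Homo sapiens"), out)
    print(out.getvalue(), end="")
```

To use it on a real file, open uniprot_sprot.fasta for reading and a new file for writing in place of the two StringIO objects. Matching on the plain substring "OS=Homo sapiens" mirrors the -M option described above, without the caret-for-space substitution that the command line needs.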
