Hi Sarah,

Thanks for sending me your protein list files. The problem is indeed
an identifier mismatch issue, due to a slight syntax difference in
your IPI protein database. In a typical FASTA file, the "identifier"
is considered the (usually short) piece of text after the ">" and
before the first space. For example, in the IPI databases that we have
on hand, an entry would appear as follows:

>IPI00000001 IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99|ENSEMBL:ENSP00000360922;ENSP00000379466|REFSEQ:NP_059347|H-INV:HIT000329496|VEGA:OTTHUMP00000031233 Tax_Id=9606 Gene_Symbol=STAU1 Isoform Long of Double-stranded RNA-binding protein Staufen homolog 1
MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSAS
ITSTSAAAESITPTVELNALCMKLGKKPMYKPVDPYSRMQSTYNYNMRGGAYPPRYFYPF
PVPPLLYQVELSVGGQQFNGKGKTRQAAKHDAAAKALRILQNEPLPERLEVNGRESEEEN
LNKSEISQVFEIALKRNLPVNFEVARESGPPHMKNFVTKVSVGEFVGEGEGKSKKISKKN
AAIAVLEELKKLPPLPAVERVKPRIKKKTKPIVKPQTSPEYGQGINPISRLAQIQQAKKE
KEPEYTLLTERGLPRRREFVMQVKVGNHTAEGTGTNKKVAKRNAAENMLEILGFKVPQAQ
PTKPALKSEEKTPIKKPGDGRKVTFFEPGSGDENGTSNKEDEFRMPYLSHQQLPAGILPM
VPEVAQAVGVSQGHHTKDFTRAAPNPAKATVTAMIARELLYGGTSPTAETILKNNISSGH
VPHGPLTRPSEQLDYLSRVQGFQVEYKDFPKNNKNEFVSLINCSSQPPLISHGIGKDVES
CHDMAALNILKLLSELDQQSTEMPRTGNGPMSVCGRC

In this case, the identifier is "IPI00000001" (after the ">" and
before the first space), which is why I instructed you to have your
list of identifiers in this format. In your case, however, your
protein database has the following format:

>IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99|ENSEMBL:ENSP00000360922;ENSP00000379466|REFSEQ:NP_059347|H-INV:HIT000329496|VEGA:OTTHUMP00000031233 Tax_Id=9606 Gene_Symbol=STAU1 Isoform Long of Double-stranded RNA-binding protein Staufen homolog 1
MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSAS
ITSTSAAAESITPTVELNALCMKLGKKPMYKPVDPYSRMQSTYNYNMRGGAYPPRYFYPF
PVPPLLYQVELSVGGQQFNGKGKTRQAAKHDAAAKALRILQNEPLPERLEVNGRESEEEN
LNKSEISQVFEIALKRNLPVNFEVARESGPPHMKNFVTKVSVGEFVGEGEGKSKKISKKN
AAIAVLEELKKLPPLPAVERVKPRIKKKTKPIVKPQTSPEYGQGINPISRLAQIQQAKKE
KEPEYTLLTERGLPRRREFVMQVKVGNHTAEGTGTNKKVAKRNAAENMLEILGFKVPQAQ
PTKPALKSEEKTPIKKPGDGRKVTFFEPGSGDENGTSNKEDEFRMPYLSHQQLPAGILPM
VPEVAQAVGVSQGHHTKDFTRAAPNPAKATVTAMIARELLYGGTSPTAETILKNNISSGH
VPHGPLTRPSEQLDYLSRVQGFQVEYKDFPKNNKNEFVSLINCSSQPPLISHGIGKDVES
CHDMAALNILKLLSELDQQSTEMPRTGNGPMSVCGRC

In this case, the first space is not until after the list of various
database accession numbers, so the "identifier" would actually be this
very long string, i.e. "IPI:IPI00000001.2|SWISS-PROT:O95793-1|
TREMBL:A8K622;Q59F99|ENSEMBL:ENSP00000360922;ENSP00000379466|
REFSEQ:NP_059347|H-INV:HIT000329496|VEGA:OTTHUMP00000031233"
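
If you want to check what a given header yields, the rule above can be
sketched in a few lines of Python. This is only an illustration of the
"after the '>' and before the first space" convention, not the code the
tools actually use; the function name is made up, the two example
headers are abbreviated versions of the entries shown above, and I am
assuming that any whitespace (not just a space) ends the identifier.

```python
def fasta_identifier(header_line):
    """Return the text after '>' and before the first whitespace."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Split once on whitespace; the first field is the identifier.
    fields = header_line[1:].split(None, 1)
    return fields[0] if fields else ""


# Abbreviated headers in the two styles discussed above (illustrative only).
short_style = ">IPI00000001 IPI:IPI00000001.2|SWISS-PROT:O95793-1 Tax_Id=9606 Gene_Symbol=STAU1"
long_style = ">IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99 Tax_Id=9606 Gene_Symbol=STAU1"

print(fasta_identifier(short_style))
print(fasta_identifier(long_style))
```

The first call gives the short accession, while the second gives the
whole "|"-separated string, which is why a list of short accessions
finds nothing in your database.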

To solve this problem, you have two options. One, you can make your
list of protein identifiers look like the current "identifiers" in your
FASTA file (i.e. list this long string for every protein). This is
usually not ideal, but if you happen to have this string for all of
your proteins of interest, it would be the easiest way to go.
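
If you don't have the long strings on hand, something like the
following Python sketch could look them up from the FASTA file. It is
only a rough illustration: the function name is hypothetical, and it
assumes every header in your database starts with "IPI:" followed by
the accession, as in the entry above.

```python
import re


def long_identifiers(fasta_lines, wanted):
    """Map each wanted short accession (e.g. 'IPI00000001') to the
    full header identifier that starts with 'IPI:<accession>'."""
    wanted = set(wanted)
    found = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        fields = line[1:].split(None, 1)
        if not fields:
            continue  # empty header
        identifier = fields[0]
        # Assumed layout: 'IPI:' + accession + '.' + version + '|...'
        match = re.match(r"IPI:(IPI\d+)", identifier)
        if match and match.group(1) in wanted:
            found[match.group(1)] = identifier
    return found


# Abbreviated demo entry; in practice you would pass an open file.
demo = [
    ">IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99 Tax_Id=9606 Gene_Symbol=STAU1",
    "MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSAS",
]
mapping = long_identifiers(demo, ["IPI00000001", "IPI00123456"])
print(mapping)
```

Any accession missing from the returned dictionary was not found in the
database, which is worth checking before you rebuild the list.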

Two, you can modify your FASTA file so that it will contain
identifiers of a more manageable format. To do so, you'll need to run
some sort of string replace script to replace every "|" character with
a space. This will change your identifiers to the format
"IPI:IPI00000001.2". You could also remove the leading "IPI:" to make
your identifiers simply "IPI00000001.2". Either way, your protein list
would then need to use exactly the same form, including the version
suffix (".2").
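
As a rough idea of what such a string replace script might look like,
here is a minimal Python sketch. The function names are my own
invention, and it assumes headers of the form shown above (starting
with ">IPI:"); only header lines are touched, so the sequences are
copied through unchanged. Please write to a new file rather than
overwriting your original database.

```python
def simplify_header(line, strip_prefix=True):
    """Replace every '|' in a FASTA header with a space and, optionally,
    drop the leading 'IPI:' so the identifier becomes e.g. IPI00000001.2."""
    if not line.startswith(">"):
        return line  # sequence line: leave as is
    new = line.replace("|", " ")
    if strip_prefix and new.startswith(">IPI:"):
        new = ">" + new[len(">IPI:"):]
    return new


def rewrite_fasta(in_path, out_path, strip_prefix=True):
    """Copy a FASTA file, simplifying each header line along the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(simplify_header(line, strip_prefix))


# Abbreviated demo header (illustrative only).
demo_header = ">IPI:IPI00000001.2|SWISS-PROT:O95793-1 Tax_Id=9606\n"
print(simplify_header(demo_header), end="")
print(simplify_header(demo_header, strip_prefix=False), end="")
```

To convert a whole database you would call rewrite_fasta() with your
input file and a new output file name of your choosing, then point the
tool at the new file.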

If you have trouble with these two options, I can also send you a
human IPI database that already has identifiers in the "IPI00000001"
format. The latest version I have is v3.54.

Hope this helps! Let me know if I can be of further assistance!

Carly




On Oct 7, 5:18 pm, Carly <[email protected]> wrote:
> Hi Sarah,
>
> Can you send me your protein list and an excerpt from your protein
> database? I still suspect it's some sort of identifier mismatch issue,
> especially given that your peptides/transitions are lost at the step
> of creating the protein-restricted library.
>
> Thanks!
>
> Carly
>
> On Oct 7, 2:09 pm, Sarah <[email protected]> wrote:
>
> > Hi Carly,
>
> > thanks for your ideas. Unfortunately, still no transitions on my
> > list....I did make sure the "protein" box was checked and also created
> > a new list with IPI identifiers (the IPI identifier  in the format
> > IPI00123456 is sufficient, right?) to double check the format. This is
> > the log for this search:
>
> > # Commands for session MV240KSZQ on Wed Oct  7 16:44:07 2009
> > # BEGIN COMMAND BLOCK
> > ###### BEGIN Command Execution ######
> > [Wed Oct  7 16:44:08 2009] EXECUTING: run_marimba.pl -t10 -s3 -k -m300
> > -M1200 -R c:/Inetpub/wwwroot/ISB/data/IPIwhole.txt -X C,M -T Q -i y -z
> > 2,3 -Z 1,2 c:/Inetpub/wwwroot/ISB/data/MaRiMba/
> > NIST_human_IT_v3.0_2009_02_04_7AA.splib c:/Inetpub/wwwroot/ISB/data/
> > ipi.HUMAN.v3.64.fasta c:/Inetpub/wwwroot/ISB/data/MRMlist.txt
> > OUTPUT:
> > Run MaRiMba started at: Wed Oct  7 16:44:08 2009
> > ...Refreshing library against database
> > ...Filtering out non-proteotypic and unmapped peptides in a tryptic
> > context only
> > SpectraST started at Wed Oct 07 16:44:08 2009.
> > Creating library from "c:/Inetpub/wwwroot/ISB/data/MaRiMba/
> > NIST_human_IT_v3.0_2009_02_04_7AA.splib"
> > REFRESHING protein mappings...DONE!
> > Importing peptide ions...
> > 500...1000...1500...2000...2500...3000...3500...4000...4500...5000...5500...6000...6500...7000...7500...8000...8500...9000...9500...10000...10500...11000...11500...12000...12500...13000...13500...14000...14500...15000...15500...16000...16500...17000...17500...18000...18500...19000...19500...20000...20500...21000...21500...22000...22500...23000...23500...24000...24500...25000...25500...26000...26500...27000...27500...28000...28500...29000...29500...30000...30500...31000...31500...32000...32500...33000...33500...34000...34500...35000...35500...36000...36500...37000...37500...38000...38500...39000...39500...40000...40500...41000...41500...42000...42500...43000...43500...44000...44500...45000...45500...46000...46500...47000...47500...48000...48500...49000...49500...50000...50500...51000...51500...52000...52500...53000...53500...54000...54500...55000...55500...56000...56500...57000...57500...58000...58500...59000...59500...60000...60500...61000...61500...62000...62500...63000...63500...64000...64500...65000...65500...66000...66500...67000...67500...68000...68500...69000...69500...70000...70500...71000...71500...72000...72500...73000...73500...74000...74500...75000...75500...76000...76500...77000...77500...78000...78500...79000...79500...80000...80500...81000...81500...82000...82500...83000...83500...84000...84500...85000...85500...86000...86500...87000...87500...88000...88500...89000...89500...90000...90500...91000...91500...92000...92500...93000...93500...94000...94500...95000...95500...96000...96500...97000...97500...98000...98500...99000...99500...100000...100500...101000...101500...102000...102500...103000...103500...104000...104500...105000...105500...106000...106500...107000...107500...108000...108500...109000...109500...110000...110500...111000...111500...112000...112500...113000...113500...114000...114500...115000...115500...116000...116500...117000...117500...118000...118500...119000...119500...120000...120500...121000...121500...122000...122500...123000...1
23500...124000...124500...125000...125500...126000...126500...127000...127500...128000...128500...129000...129500...130000...130500...131000...131500...132000...132500...133000...133500...134000...134500...135000...135500...136000...136500...137000...137500...138000...138500...139000...139500...140000...140500...141000...141500...142000...142500...143000...143500...144000...144500...145000...145500...146000...146500...147000...147500...148000...148500...149000...149500...150000...150500...151000...151500...152000...152500...153000...153500...154000...154500...155000...155500...156000...156500...157000...157500...158000...158500...159000...159500...160000...160500...161000...161500...162000...162500...163000...163500...164000...164500...165000...165500...166000...166500...167000...167500...168000...168500...169000...169500...170000...170500...171000...171500...172000...172500...173000...173500...174000...174500...175000...175500...176000...176500...177000...177500...178000...178500...179000...179500...180000...180500...181000...181500...182000...182500...183000...183500...184000...184500...185000...185500...186000...186500...187000...187500...188000...188500...189000...189500...190000...190500...191000...191500...192000...192500...193000...193500...194000...194500...195000...195500...196000...196500...197000...197500...198000...198500...199000...199500...200000...200500...201000...201500...202000...202500...203000...203500...204000...204500...205000...205500...206000...206500...207000...207500...208000...208500...209000...209500...210000...210500...211000...211500...212000...212500...213000...213500...214000...214500...215000...215500...216000...216500...217000...217500...218000...218500...219000...219500...220000...220500...221000...221500...222000...222500...223000...223500...224000...224500...225000...225500...226000...226500...227000...227500...228000...228500...229000...229500...230000...230500...231000...231500...232000...232500...233000...233500...234000...234
500...235000...235500...236000...236500...237000...237500...238000...238500...239000...239500...240000...240500...241000...241500...242000...242500...243000...243500...244000...244500...245000...245500...246000...246500...247000...247500...248000...248500...249000...249500...250000...250500...251000...251500...252000...252500...253000...253500...254000...254500...255000...255500...256000...256500...257000...257500...258000...258500...259000...259500...260000...260500...DONE!
>
> > Library file (BINARY) "tmp_refreshed.splib" created.
> > Library file (TEXT) "tmp_refreshed.sptxt" created.
> > M/Z Index file "tmp_refreshed.spidx" created.
> > Peptide Index file "tmp_refreshed.pepidx" created.
>
> > Total number of spectra in library: 63094
> > Total number of distinct peptide ions in library: 63094
> > Total number of distinct stripped peptides in library: 37177
> > CHARGE            +1: 4595 ; +2: 34631 ; +3: 20407
> > TERMINI           Tryptic: 63094 ; Semi-tryptic: 0 ; Non-tryptic: 0
> > METHIONINE MOD    Oxidized: 8160
> > CYSTEINE MOD      CAM: 9231 ; Cleavable-ICAT: 1181 ; Uncleavable-ICAT:
> > 1453
> > PROBABILITY       >0.9999: 47423 ; 0.999-0.9999: 9010 ; 0.99-0.999:
> > 4724 ; 0.9-0.99: 1937 <0.9: 0
> > NREPS             20+: 11473 ; 10-19: 8857 ; 4-9: 18596 ; 2-3: 24168 ;
> > 1: 0
>
> > Total Run Time = 938 seconds.
> > SpectraST finished at Wed Oct 07 16:59:46 2009 without error.
> > ...Restricting input library to proteins on list
> > SpectraST started at Wed Oct 07 16:59:46 2009.
> > Creating library from "c://tmp_refreshed.splib"
> > Importing peptide ions...
> > 500...1000...1500...2000...2500...3000...3500...4000...4500...5000...5500...6000...6500...7000...7500...8000...8500...9000...9500...10000...10500...11000...11500...12000...12500...13000...13500...14000...14500...15000...15500...16000...16500...17000...17500...18000...18500...19000...19500...20000...20500...21000...21500...22000...22500...23000...23500...24000...24500...25000...25500...26000...26500...27000...27500...28000...28500...29000...29500...30000...30500...31000...31500...32000...32500...33000...33500...34000...34500...35000...35500...36000...36500...37000...37500...38000...38500...39000...39500...40000...40500...41000...41500...42000...42500...43000...43500...44000...44500...45000...45500...46000...46500...47000...47500...48000...48500...49000...49500...50000...50500...51000...51500...52000...52500...53000...53500...54000...54500...55000...55500...56000...56500...57000...57500...58000...58500...59000...59500...60000...60500...61000...61500...62000...62500...63000...DONE!
>
> > Library file (BINARY) "tmp_restricted.splib" created.
> > Library file (TEXT) "tmp_restricted.sptxt" created.
> > M/Z Index file "tmp_restricted.spidx" created.
> > Peptide Index file "tmp_restricted.pepidx" created.
>
> > Total number of spectra in library: 0
> > Total number of distinct peptide ions in library: 0
> > Total number of distinct stripped peptides in library: 0
> > CHARGE            +1: 0 ; +2: 0 ; +3: 0
> > TERMINI           Tryptic: 0 ; Semi-tryptic: 0 ; Non-tryptic: 0
> > METHIONINE MOD    Oxidized: 0
> > CYSTEINE MOD      CAM: 0 ; Cleavable-ICAT: 0 ; Uncleavable-ICAT: 0
> > PROBABILITY       >0.9999: 0 ; 0.999-0.9999: 0 ; 0.99-0.999: 0 ;
> > 0.9-0.99: 0 <0.9: 0
> > NREPS             20+: 0 ; 10-19: 0 ; 4-9: 0 ; 2-3: 0 ; 1: 0
>
> > Total Run Time = 110 seconds.
> > SpectraST finished at Wed Oct 07 17:01:36 2009 without error.
> > ...Creating consensus library
> > SpectraST started at Wed Oct 07 17:01:36 2009.
> > Creating CONSENSUS library from "c://tmp_restricted.splib"
> > Importing peptide ions...DONE!
>
> > Library file (BINARY) "tmp_consensus.splib" created.
> > Library file (TEXT) "tmp_consensus.sptxt" created.
> > M/Z Index file "tmp_consensus.spidx" created.
> > Peptide Index file "tmp_consensus.pepidx" created.
>
> > Total number of spectra in library: 0
> > Total number of distinct peptide ions in library: 0
> > Total number of distinct stripped peptides in library: 0
> > CHARGE            +1: 0 ; +2: 0 ; +3: 0
> > TERMINI           Tryptic: 0 ; Semi-tryptic: 0 ; Non-tryptic: 0
> > METHIONINE MOD    Oxidized: 0
> > CYSTEINE MOD      CAM: 0 ; Cleavable-ICAT: 0 ; Uncleavable-ICAT: 0
> > PROBABILITY       >0.9999: 0 ; 0.999-0.9999: 0 ; 0.99-0.999: 0 ;
> > 0.9-0.99: 0 <0.9: 0
> > NREPS             20+: 0 ; 10-19: 0 ; 4-9: 0 ; 2-3: 0 ; 1: 0
>
> > Total Run
>