Hi Sarah, Thanks for sending me your protein list files. The problem is indeed an identifier mismatch issue, due to a slight syntax difference in your IPI protein database. In a typical FASTA file, the "identifier" is considered the (usually short) piece of text after the ">" and before the first space. For example, in the IPI databases that we have on hand, an entry would appear as follows:

>IPI00000001 IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99|ENSEMBL:ENSP00000360922;ENSP00000379466|REFSEQ:NP_059347|H-INV:HIT000329496|VEGA:OTTHUMP00000031233 Tax_Id=9606 Gene_Symbol=STAU1 Isoform Long of Double-stranded RNA-binding protein Staufen homolog 1
MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSAS
ITSTSAAAESITPTVELNALCMKLGKKPMYKPVDPYSRMQSTYNYNMRGGAYPPRYFYPF
PVPPLLYQVELSVGGQQFNGKGKTRQAAKHDAAAKALRILQNEPLPERLEVNGRESEEEN
LNKSEISQVFEIALKRNLPVNFEVARESGPPHMKNFVTKVSVGEFVGEGEGKSKKISKKN
AAIAVLEELKKLPPLPAVERVKPRIKKKTKPIVKPQTSPEYGQGINPISRLAQIQQAKKE
KEPEYTLLTERGLPRRREFVMQVKVGNHTAEGTGTNKKVAKRNAAENMLEILGFKVPQAQ
PTKPALKSEEKTPIKKPGDGRKVTFFEPGSGDENGTSNKEDEFRMPYLSHQQLPAGILPM
VPEVAQAVGVSQGHHTKDFTRAAPNPAKATVTAMIARELLYGGTSPTAETILKNNISSGH
VPHGPLTRPSEQLDYLSRVQGFQVEYKDFPKNNKNEFVSLINCSSQPPLISHGIGKDVES
CHDMAALNILKLLSELDQQSTEMPRTGNGPMSVCGRC

In this case, the identifier is "IPI00000001" (after the ">" and before the first space), which is why I asked you to put your list of identifiers in this format.
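
If it helps to see the rule spelled out, here is a minimal Python sketch of "after the '>' and before the first space". The function name fasta_identifier is made up for illustration and is not part of any TPP tool, and the header in the example is shortened.

```python
def fasta_identifier(header_line):
    """Return the text after '>' and before the first whitespace.

    Hypothetical helper for illustration only; it mirrors the
    convention described above, not any particular tool's parser.
    """
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # split(None, 1) splits on the first run of whitespace.
    fields = header_line[1:].split(None, 1)
    return fields[0] if fields else ""


# Shortened version of the header shown above.
header = ">IPI00000001 IPI:IPI00000001.2|SWISS-PROT:O95793-1 Tax_Id=9606"
print(fasta_identifier(header))
```

Running the same function over the header lines of any FASTA file is a quick way to see what a tool following this convention would treat as the identifiers.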

In your case, however, your protein database has the following format:

>IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99|ENSEMBL:ENSP00000360922;ENSP00000379466|REFSEQ:NP_059347|H-INV:HIT000329496|VEGA:OTTHUMP00000031233 Tax_Id=9606 Gene_Symbol=STAU1 Isoform Long of Double-stranded RNA-binding protein Staufen homolog 1
MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSAS
ITSTSAAAESITPTVELNALCMKLGKKPMYKPVDPYSRMQSTYNYNMRGGAYPPRYFYPF
PVPPLLYQVELSVGGQQFNGKGKTRQAAKHDAAAKALRILQNEPLPERLEVNGRESEEEN
LNKSEISQVFEIALKRNLPVNFEVARESGPPHMKNFVTKVSVGEFVGEGEGKSKKISKKN
AAIAVLEELKKLPPLPAVERVKPRIKKKTKPIVKPQTSPEYGQGINPISRLAQIQQAKKE
KEPEYTLLTERGLPRRREFVMQVKVGNHTAEGTGTNKKVAKRNAAENMLEILGFKVPQAQ
PTKPALKSEEKTPIKKPGDGRKVTFFEPGSGDENGTSNKEDEFRMPYLSHQQLPAGILPM
VPEVAQAVGVSQGHHTKDFTRAAPNPAKATVTAMIARELLYGGTSPTAETILKNNISSGH
VPHGPLTRPSEQLDYLSRVQGFQVEYKDFPKNNKNEFVSLINCSSQPPLISHGIGKDVES
CHDMAALNILKLLSELDQQSTEMPRTGNGPMSVCGRC

Here, the first space does not occur until after the list of database accession numbers, so the "identifier" is actually this very long string, i.e.

"IPI:IPI00000001.2|SWISS-PROT:O95793-1|TREMBL:A8K622;Q59F99|ENSEMBL:ENSP00000360922;ENSP00000379466|REFSEQ:NP_059347|H-INV:HIT000329496|VEGA:OTTHUMP00000031233"

To solve this problem, you have two options.

One, you can make your list of protein identifiers look like the current "identifiers" in your FASTA file (i.e. list this long string for every protein). This is usually not ideal, but if you happen to have this string for all of your proteins of interest, it would be the easiest way to go.

Two, you can modify your FASTA file so that it contains identifiers in a more manageable format. To do so, you'll need to run some sort of string-replace script that replaces every "|" character in the header lines with a space. This will change your identifiers to the format "IPI:IPI00000001.2". You could also remove the leading "IPI:" to make your identifiers simply "IPI00000001.2".
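
For option two, a string-replace script could look roughly like the Python sketch below. It is only a sketch under a few assumptions: the function names and the file names in the usage comment are made up, headers are assumed to start with ">" at the beginning of the line, and sequence lines are passed through untouched.

```python
def rewrite_header(line, strip_prefix=True):
    """Rewrite one FASTA line so the identifier becomes short.

    Header lines (starting with '>') get every '|' replaced by a
    space; optionally the leading 'IPI:' is dropped as well.
    Sequence lines are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    body = line[1:].replace("|", " ")
    if strip_prefix and body.startswith("IPI:"):
        body = body[len("IPI:"):]
    return ">" + body


def rewrite_fasta(in_path, out_path, strip_prefix=True):
    """Copy a FASTA file to a new file, rewriting only the headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rewrite_header(line, strip_prefix))


# Hypothetical usage; the output file name is an assumption:
# rewrite_fasta("ipi.HUMAN.v3.64.fasta", "ipi.HUMAN.v3.64.short_ids.fasta")
```

Writing to a new file keeps the original database intact in case anything goes wrong. Note that the resulting identifiers still carry the version suffix (e.g. ".2"), so the entries in your protein list would presumably need it too.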

If you have trouble with these two options, I can also send you a human IPI database that already has identifiers in the "IPI00000001" format. The latest version I have is v3.54.

Hope this helps! Let me know if I can be of further assistance!

Carly

On Oct 7, 5:18 pm, Carly <[email protected]> wrote:
> Hi Sarah,
>
> Can you send me your protein list and an excerpt from your protein
> database? I still suspect it's some sort of identifier mismatch issue,
> especially given that your peptides/transitions are lost at the step
> of creating the protein-restricted library.
>
> Thanks!
>
> Carly
>
> On Oct 7, 2:09 pm, Sarah <[email protected]> wrote:
> >
> > Hi Carly,
> >
> > thanks for your ideas. Unfortunately, still no transitions on my
> > list....I did make sure the "protein" box was checked and also created
> > a new list with IPI identifiers (the IPI identifier in the format
> > IPI00123456 is sufficient, right?) to double check the format. This is
> > the log for this search:
> >
> > # Commands for session MV240KSZQ on Wed Oct 7 16:44:07 2009
> > # BEGIN COMMAND BLOCK
> > ###### BEGIN Command Execution ######
> > [Wed Oct 7 16:44:08 2009] EXECUTING: run_marimba.pl -t10 -s3 -k -m300
> > -M1200 -R c:/Inetpub/wwwroot/ISB/data/IPIwhole.txt -X C,M -T Q -i y -z
> > 2,3 -Z 1,2 c:/Inetpub/wwwroot/ISB/data/MaRiMba/
> > NIST_human_IT_v3.0_2009_02_04_7AA.splib c:/Inetpub/wwwroot/ISB/data/
> > ipi.HUMAN.v3.64.fasta c:/Inetpub/wwwroot/ISB/data/MRMlist.txt
> > OUTPUT:
> > Run MaRiMba started at: Wed Oct 7 16:44:08 2009
> > ...Refreshing library against database
> > ...Filtering out non-proteotypic and unmapped peptides in a tryptic
> > context only
> > SpectraST started at Wed Oct 07 16:44:08 2009.
> > Creating library from "c:/Inetpub/wwwroot/ISB/data/MaRiMba/
> > NIST_human_IT_v3.0_2009_02_04_7AA.splib"
> > REFRESHING protein mappings...DONE!
> > Importing peptide ions...
> > 500...1000...1500...2000... [...] ...260000...260500...DONE!
> >
> > Library file (BINARY) "tmp_refreshed.splib" created.
> > Library file (TEXT) "tmp_refreshed.sptxt" created.
> > M/Z Index file "tmp_refreshed.spidx" created.
> > Peptide Index file "tmp_refreshed.pepidx" created.
> >
> > Total number of spectra in library: 63094
> > Total number of distinct peptide ions in library: 63094
> > Total number of distinct stripped peptides in library: 37177
> > CHARGE +1: 4595 ; +2: 34631 ; +3: 20407
> > TERMINI Tryptic: 63094 ; Semi-tryptic: 0 ; Non-tryptic: 0
> > METHIONINE MOD Oxidized: 8160
> > CYSTEINE MOD CAM: 9231 ; Cleavable-ICAT: 1181 ; Uncleavable-ICAT:
> > 1453
> > PROBABILITY >0.9999: 47423 ; 0.999-0.9999: 9010 ; 0.99-0.999:
> > 4724 ; 0.9-0.99: 1937 <0.9: 0
> > NREPS 20+: 11473 ; 10-19: 8857 ; 4-9: 18596 ; 2-3: 24168 ;
> > 1: 0
> >
> > Total Run Time = 938 seconds.
> > SpectraST finished at Wed Oct 07 16:59:46 2009 without error.
> > ...Restricting input library to proteins on list
> > SpectraST started at Wed Oct 07 16:59:46 2009.
> > Creating library from "c://tmp_refreshed.splib"
> > Importing peptide ions...
> > 500...1000...1500...2000... [...] ...62500...63000...DONE!
> >
> > Library file (BINARY) "tmp_restricted.splib" created.
> > Library file (TEXT) "tmp_restricted.sptxt" created.
> > M/Z Index file "tmp_restricted.spidx" created.
> > Peptide Index file "tmp_restricted.pepidx" created.
> >
> > Total number of spectra in library: 0
> > Total number of distinct peptide ions in library: 0
> > Total number of distinct stripped peptides in library: 0
> > CHARGE +1: 0 ; +2: 0 ; +3: 0
> > TERMINI Tryptic: 0 ; Semi-tryptic: 0 ; Non-tryptic: 0
> > METHIONINE MOD Oxidized: 0
> > CYSTEINE MOD CAM: 0 ; Cleavable-ICAT: 0 ; Uncleavable-ICAT: 0
> > PROBABILITY >0.9999: 0 ; 0.999-0.9999: 0 ; 0.99-0.999: 0 ;
> > 0.9-0.99: 0 <0.9: 0
> > NREPS 20+: 0 ; 10-19: 0 ; 4-9: 0 ; 2-3: 0 ; 1: 0
> >
> > Total Run Time = 110 seconds.
> > SpectraST finished at Wed Oct 07 17:01:36 2009 without error.
> > ...Creating consensus library
> > SpectraST started at Wed Oct 07 17:01:36 2009.
> > Creating CONSENSUS library from "c://tmp_restricted.splib"
> > Importing peptide ions...DONE!
> >
> > Library file (BINARY) "tmp_consensus.splib" created.
> > Library file (TEXT) "tmp_consensus.sptxt" created.
> > M/Z Index file "tmp_consensus.spidx" created.
> > Peptide Index file "tmp_consensus.pepidx" created.
> >
> > Total number of spectra in library: 0
> > Total number of distinct peptide ions in library: 0
> > Total number of distinct stripped peptides in library: 0
> > CHARGE +1: 0 ; +2: 0 ; +3: 0
> > TERMINI Tryptic: 0 ; Semi-tryptic: 0 ; Non-tryptic: 0
> > METHIONINE MOD Oxidized: 0
> > CYSTEINE MOD CAM: 0 ; Cleavable-ICAT: 0 ; Uncleavable-ICAT: 0
> > PROBABILITY >0.9999: 0 ; 0.999-0.9999: 0 ; 0.99-0.999: 0 ;
> > 0.9-0.99: 0 <0.9: 0
> > NREPS 20+: 0 ; 10-19: 0 ; 4-9: 0 ; 2-3: 0 ; 1: 0
> >
> > Total Run
> > ...

You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en
