Okay, when I ran the working set of spectra with the database that failed, it seems to have failed too; when I ran the set of spectra that failed with a database that worked, it ran to completion. I think we can probably narrow the problem down to something in the database.
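Since the swap test points at the database, a scripted audit of the FASTA file may help narrow things down further. The Python below is a minimal sketch, not part of Philosopher or the TPP: it flags residues outside the standard 20 (including the selenocysteine "U" that David noted the TPP assigns a mass of 0), headers containing control or non-ASCII characters, empty sequences, and duplicate accessions. `char_sets` collects the characters used in headers and in sequences so two FASTA files can be compared, as was done by hand later in this thread. The function names are hypothetical, and treating the first whitespace-separated header token as the accession is an assumption.

```python
"""Sketch: audit a FASTA protein database for entries downstream tools may mishandle."""
from collections import Counter

# Standard 20 amino-acid codes. Anything else (U for selenocysteine, X, B, Z,
# '*', stray bytes) is reported for manual review; this set is an assumption,
# not a rule taken from the TPP or Philosopher.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(path):
    """Yield (header, sequence) pairs; undecodable bytes become U+FFFD."""
    header, chunks = None, []
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def audit_fasta(path):
    """Collect potential problems; the accession is assumed to be the first header token."""
    issues = {
        "nonstandard_residues": Counter(),  # residue -> number of entries containing it
        "unprintable_headers": [],          # headers with control or non-ASCII characters
        "empty_sequences": [],
        "duplicate_ids": [],
    }
    seen = set()
    for header, seq in read_fasta(path):
        tokens = header.split()
        accession = tokens[0] if tokens else ""
        if accession in seen:
            issues["duplicate_ids"].append(accession)
        seen.add(accession)
        if not seq:
            issues["empty_sequences"].append(accession)
        for residue in set(seq.upper()) - STANDARD_RESIDUES:
            issues["nonstandard_residues"][residue] += 1
        if any(not 32 <= ord(ch) < 127 for ch in header):
            issues["unprintable_headers"].append(accession)
    return issues


def char_sets(path):
    """Return (characters used in headers, characters used in sequences)."""
    header_chars, seq_chars = set(), set()
    for header, seq in read_fasta(path):
        header_chars.update(header)
        seq_chars.update(seq)
    return header_chars, seq_chars
```

For example, `char_sets("10OV_uniq.fasta")[1] - char_sets("rest.fasta")[1]` would list sequence characters that occur only in the 10OV-specific proteins; `rest.fasta` is a hypothetical file holding the remaining database entries.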
On Friday, October 23, 2020 at 1:56:18 AM UTC-4 Emily Kawaler wrote:
> While those tests are still running, I pulled out all 185 of the proteins that are in the 10OV pepXMLs but not in 01-09OV, figuring that maybe one of those is causing the error. I've uploaded that to the same folder everything else is in (it's called 10OV_uniq.fasta). I don't see anything that jumps out immediately. (There are no individual characters unique to either the headers or the sequences in 10OV, so I don't think there's an individual character messing things up.)
>
> On Thursday, October 22, 2020 at 3:49:18 PM UTC-4 David Shteynberg wrote:
>
>> I just re-extracted that file and I don't see the issue anymore. Perhaps this was a decompression issue.
>>
>> Thanks for checking.
>>
>> -David
>>
>> On Thu, Oct 22, 2020 at 12:19 PM Emily Kawaler wrote:
>>
>>> Hello,
>>> Thanks so much for taking a look! I think the selenocysteines ("U") are likely not the problem, since I've got those in all of my databases, including the ones that run correctly. I'm looking at 03CPTAC_OVprospective_W_PNNL_20161212_B1S3_f13.pepXML and I don't see anything odd in line 171821 ("</modification_info>"), so I think our line numbering might not match up. What does your problematic line contain?
>>>
>>> When I try to run it on my end, it always sticks somewhere in the 10CPTAC_OV files. Right now I'm running a working set of spectra with a database that didn't work and vice versa, so hopefully that'll help me pin down whether it's a problem with my spectra or my database. I'll let you know how that turns out!
>>>
>>> Emily
>>>
>>> On Thursday, October 22, 2020 at 3:09:29 PM UTC-4 David Shteynberg wrote:
>>>
>>>> Hi Emily,
>>>>
>>>> I analyzed the search results that you sent and I am seeing some strange things in at least one of the files you gave me. This may be causing some of the problems you saw.
>>>>
>>>> In file 03CPTAC_OVprospective_W_PNNL_20161212_B1S3_f13.pepXML on line 171821 there are some strange characters (possibly binary) that are tripping up the TPP. I think these might be caused by a bug in an analysis tool upstream of the TPP. I'm not sure if there are other mistakes of this sort. Also, I found some 'U' amino acids in the database, which the TPP complains have a mass of 0.
>>>>
>>>> I hope this helps you somewhat. Let me know what you find on your end.
>>>>
>>>> Cheers,
>>>> -David
>>>>
>>>> On Tue, Oct 20, 2020 at 1:42 PM Emily Kawaler wrote:
>>>>
>>>>> Sure! The spectra are from the CPTAC2 ovarian prospective dataset, though I removed all scans that matched to a standard reference database. (I don't think the scan removal is the issue, since I'm also having this problem on a different dataset without removing any scans; I also checked with xmllint and it looks like the mzML and pepXML files are valid.) I've been running it with the Philosopher pipeline, so the pepXML files were generated with MSFragger as part of that pipeline. The database is a customized variant database with contaminants and decoys added by Philosopher's database tool. Are there any other specifics you'd like? I can upload my full philosopher.yml file if that would be helpful.
>>>>>
>>>>> On Tuesday, October 20, 2020 at 1:30:44 AM UTC-4 David Shteynberg wrote:
>>>>>
>>>>>> Hi Emily,
>>>>>>
>>>>>> I got the data and now I am trying to understand how you are running the analysis. Can you please describe those steps?
>>>>>>
>>>>>> Thank you,
>>>>>> -David
>>>>>>
>>>>>> On Sat, Oct 17, 2020 at 12:54 PM Emily Kawaler wrote:
>>>>>>
>>>>>>> I've uploaded the pepXML files, the parameters I used, and the database here.
>>>>>>> <https://drive.google.com/drive/folders/1gJoi9fqsmIYg_0tl_2Ur-n04MJyuotyc?usp=sharing>
>>>>>>> Please let me know if I should be uploading anything else! Thank you!
>>>>>>>
>>>>>>> On Saturday, October 17, 2020 at 12:04:21 AM UTC-4 Emily Kawaler wrote:
>>>>>>>
>>>>>>>> Thank you! I'm working on getting it transferred to Drive, so it might take a little while, but I'll be in touch!
>>>>>>>>
>>>>>>>> On Tuesday, October 13, 2020 at 3:08:44 PM UTC-4 David Shteynberg wrote:
>>>>>>>>
>>>>>>>>> Hello Emily,
>>>>>>>>>
>>>>>>>>> If you are able to share the dataset, including the pepXML file and the database, I can try to replicate the issue here and troubleshoot the sticking point.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -David
>>>>>>>>>
>>>>>>>>> On Tue, Oct 13, 2020 at 11:15 AM Emily Kawaler wrote:
>>>>>>>>>
>>>>>>>>>> Hello, and thank you for your response! It doesn't look like the process is using too much memory (I've allocated 300 GB and it's maxing out around 10 GB), and I've kicked up the minprob parameter. It's still getting stuck, unfortunately.
>>>>>>>>>> Emily
>>>>>>>>>>
>>>>>>>>>> On Friday, October 9, 2020 at 2:24:37 PM UTC-4 Luis wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello Emily,
>>>>>>>>>>>
>>>>>>>>>>> This is not a problem that we have seen much of. Do you know which version of ProteinProphet / TPP you are using?
>>>>>>>>>>>
>>>>>>>>>>> One potential issue is the large number of proteins (and peptides) that it is trying to process. Can you monitor the memory usage of the machine when you run this dataset, and/or try running it on a machine with more memory?
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps,
>>>>>>>>>>> --Luis
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2020 at 6:32 PM Emily Kawaler wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello! I've been running ProteinProphet as part of the Philosopher pipeline for a while now with no problems. However, one of my datasets seems to be getting stuck in the middle of this function. It doesn't throw an error or anything; it just stops advancing (the last line of the output is "Computing degenerate peptides for 69919 proteins: 0%...10%...20%...30%...40%...50%"). Has anyone run into this problem before?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/be33a8fb-a6ec-41b6-a988-981161f194fcn%40googlegroups.com
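Emily's step of pulling out the proteins that appear in the 10OV pepXMLs but not in 01-09OV can be scripted. The following is a minimal Python sketch under stated assumptions: it reads the `protein` attribute of `search_hit` and `alternative_protein` elements (both part of the pepXML format), ignores the XML namespace, and assumes that attribute equals the first whitespace-separated token of the FASTA header, which may not hold for every database layout. The function names are hypothetical.

```python
"""Sketch: list proteins matched in one group of pepXML files but not another,
and copy their FASTA entries out for inspection."""
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}search_hit'."""
    return tag.rsplit("}", 1)[-1]


def proteins_in_pepxml(path):
    """Collect protein attributes from <search_hit> and <alternative_protein> elements."""
    proteins = set()
    for _, elem in ET.iterparse(path, events=("end",)):
        name = local_name(elem.tag)
        if name in ("search_hit", "alternative_protein"):
            protein = elem.get("protein")
            if protein:
                proteins.add(protein)
        elif name == "spectrum_query":
            # Children were already handled at their own end events, so the
            # finished query can be cleared to reduce memory use on large files.
            elem.clear()
    return proteins


def proteins_across(paths):
    """Union of proteins over several pepXML files."""
    found = set()
    for path in paths:
        found |= proteins_in_pepxml(path)
    return found


def write_subset_fasta(fasta_in, fasta_out, accessions):
    """Copy entries whose first header token is in `accessions` (an assumed match rule)."""
    keep = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                tokens = line[1:].split()
                keep = bool(tokens) and tokens[0] in accessions
            if keep:
                dst.write(line)
```

For example, `proteins_across(glob.glob("10CPTAC_OV*.pepXML")) - proteins_across(glob.glob("0[1-9]CPTAC_OV*.pepXML"))` would give the set to pass to `write_subset_fasta`; the glob patterns and the input FASTA name are guesses based on the file names in this thread, not the actual directory layout.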

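David's report of strange (possibly binary) characters on line 171821, which went away after re-extracting the file, is the kind of corruption a byte-level scan can localize without relying on an XML parser. Below is a minimal Python sketch: it flags lines containing control bytes that XML 1.0 does not allow (anything below 0x20 other than tab, LF and CR) and lines that are not valid UTF-8. It assumes the files are UTF-8 encoded; a pepXML that declares a different encoding would produce false positives on the UTF-8 check. The function name is hypothetical.

```python
"""Sketch: report lines in an XML file (e.g. pepXML) that contain bytes XML 1.0
forbids or that are not valid UTF-8."""

# XML 1.0 allows only tab, line feed and carriage return below 0x20.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_suspicious_lines(path, max_reports=20):
    """Return up to max_reports (line_number, reason, preview) tuples."""
    reports = []
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            bad = sorted({b for b in raw if b < 0x20 and b not in ALLOWED_CONTROL})
            reason = None
            if bad:
                reason = f"forbidden control byte(s) {bad}"
            else:
                try:
                    raw.decode("utf-8")  # assumes the file is UTF-8 encoded
                except UnicodeDecodeError as exc:
                    reason = f"invalid UTF-8 at byte offset {exc.start} of the line"
            if reason:
                # Short, printable preview of the offending line for the report.
                preview = raw[:80].decode("utf-8", errors="replace").rstrip()
                reports.append((lineno, reason, preview))
                if len(reports) >= max_reports:
                    break
    return reports
```

Running it on a freshly extracted copy and on the copy that misbehaved would show whether the odd characters are in the file itself or were introduced during extraction.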