Re: [spctools-discuss] Re: ASAPRatioProteinRatioParser Performance

2010-02-05 Thread Dave Trudgian
For info, I've modified the .pep.xml parsing into a single step, and on a very small test dataset ASAPRatioProteinRatioParser run-time is down from 19.8s to 0.8s. I'd expect the speed-up on a large dataset to be higher due to the compound effects of more protein groups + bigger .pep.xml. Will

Re: [spctools-discuss] Re: ASAPRatioProteinRatioParser Performance

2010-02-04 Thread Brian Pratt
Looking at the code I can see where this would easily become disk-bound for large data sets - it reads and rereads the same pepXML files repeatedly, but the effect is probably masked by disk caching up to a certain point. Somebody would need to write some logic for caching the file contents to
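
A minimal sketch of the kind of caching discussed here, assuming the goal is to read the peptide data once and answer every later per-protein lookup from memory. This is not the TPP code: the `PeptideRatio` struct, the `buildIndex` and `peptideCount` functions, and the whitespace-separated "protein peptide ratio" input format are all hypothetical stand-ins, and real pepXML would need a proper XML parser in place of the stream extraction.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one peptide ratio attributed to a protein.
struct PeptideRatio {
    std::string peptide;
    double ratio;
};

// In-memory cache: protein name -> all peptide records seen for it.
using RatioIndex = std::map<std::string, std::vector<PeptideRatio>>;

// Read the whole input exactly once and group records by protein name.
// The "protein peptide ratio" line format is invented for this sketch.
RatioIndex buildIndex(std::istream& in) {
    RatioIndex index;
    std::string protein, peptide;
    double ratio;
    while (in >> protein >> peptide >> ratio) {
        index[protein].push_back({peptide, ratio});
    }
    return index;
}

// Look up a protein in the cached index; no file access happens here.
std::size_t peptideCount(const RatioIndex& index, const std::string& protein) {
    auto it = index.find(protein);
    return it == index.end() ? 0 : it->second.size();
}
```

With an index like this, each protein group costs one map lookup instead of another pass over the file; the trade-off is that the parsed records have to fit in memory.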

Re: [spctools-discuss] Re: ASAPRatioProteinRatioParser Performance

2010-02-04 Thread Dave Trudgian
Brian, Yup. I just discovered this too, as per other post. On our servers it's not disk-bound, as the 1.6 GB .pep.xml is fully cached, but the repeated re-reading still slows things down. Re. solutions for this, is Boost code welcomed in the main TPP tools? I think I can re-write using a single

Re: [spctools-discuss] Re: ASAPRatioProteinRatioParser Performance

2010-02-04 Thread Natalie Tasman
Hi Dave, Brian, Just jumping in to comment on Boost. Yes, it is welcomed and in fact already used in the TPP (as well as the TPP-included ProteoWizard project); however, because the Boost API and process of building Boost libraries have not been particularly stable, we've found it necessary

Re: [spctools-discuss] Re: ASAPRatioProteinRatioParser Performance

2010-02-04 Thread Brian Pratt
Nothing to add! Dave has the right idea for a performance fix, and Natalie is correct about TPP being addicted to Boost already. Brian On Thu, Feb 4, 2010 at 2:37 PM, Natalie Tasman natalie.tas...@insilicos.com wrote: Hi Dave, Brian, Just jumping in to comment on Boost. Yes, it is welcomed