FYI, I've reworked the .pep.xml parsing into a single pass, and on
a very small test dataset the ASAPRatioProteinRatioParser run time is down
from 19.8s to 0.8s. I'd expect the speed-up on a large dataset to be even
greater, due to the compounding effects of more protein groups and a bigger
.pep.xml.
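
For anyone following along, here is a minimal sketch of the parse-once idea,
not the actual change described above. It assumes the per-file results fit in
memory. ParsedFileCache, ParsedFile and the parser callback are hypothetical
names for illustration only; none of them are TPP classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for whatever gets extracted from one pepXML file.
using ParsedFile = std::vector<std::string>;

// Hypothetical cache: each file path is parsed at most once, and later
// requests for the same path reuse the stored result.
class ParsedFileCache {
public:
    using Parser = std::function<ParsedFile(const std::string&)>;

    explicit ParsedFileCache(Parser parser) : parser_(std::move(parser)) {
        assert(parser_ && "a callable parser is required");
    }

    // Returns the parsed contents for `path`; the parser callback runs
    // only the first time a given path is requested.
    const ParsedFile& get(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            it = cache_.emplace(path, parser_(path)).first;
            ++parseCount_;
        }
        return it->second;
    }

    // Number of times the parser callback has actually been invoked.
    std::size_t parseCount() const { return parseCount_; }

private:
    Parser parser_;
    std::unordered_map<std::string, ParsedFile> cache_;
    std::size_t parseCount_ = 0;
};
```

The per-protein loop would then call get() instead of reopening the file, so
the cost of parsing is paid once per file rather than once per protein group.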
Will
Looking at the code, I can see where this would easily become disk-bound for
large datasets: it reads and rereads the same pepXML files repeatedly, but
the effect is probably masked by disk caching up to a certain point.
Somebody would need to write some logic for caching the file contents to
Brian,
Yup. I just discovered this too, as per other post. On our servers it's
not disk-bound, as the 1.6GB .pep.xml is fully cached, but the repeated
re-parsing still slows things down.
Re: solutions for this, is Boost code welcome in the main TPP tools? I
think I can re-write using a single
Hi Dave, Brian,
Just jumping in to comment on Boost. Yes, it is welcomed and in fact
already used in the TPP (as well as the TPP-included ProteoWizard
project); however, because the Boost API and process of building Boost
libraries have not been particularly stable, we've found it necessary
Nothing to add! Dave has the right idea for a performance fix, and Natalie
is correct about TPP being addicted to Boost already.
Brian
On Thu, Feb 4, 2010 at 2:37 PM, Natalie Tasman natalie.tas...@insilicos.com
wrote:
Hi Dave, Brian,
Just jumping in to comment on Boost. Yes, it is welcomed