[spctools-discuss] Re: Database SEQUEST search SLOW!

bwv549 Fri, 24 Jul 2009 10:09:28 -0700

To speak for mspire, the scan numbers between srf, mzXML, and pepXML
will (should) all match up.  In my experience, sequest is much faster
when creating the srf files instead of .dta/.out files.  The
conversion from srf to pepXML is reasonably fast, too.  I don't have
any experience in comparing it with the speed of the database search
in tpp (does tpp come with comet now?).
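The whole workflow depends on the scan numbers lining up, so it is worth checking them. The sketch below is a minimal Python example and is not part of mspire or the TPP. It lists scans that a pepXML file refers to but that are missing from the corresponding mzXML file. It assumes the usual attributes: `num` on mzXML `<scan>` elements and `start_scan` on pepXML `<spectrum_query>` elements. The function names are hypothetical, and the check only confirms that the scan numbers exist, not that the spectra are identical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Strip an XML namespace, e.g. "{http://...}scan" -> "scan".
    return tag.rsplit("}", 1)[-1]


def mzxml_scan_numbers(source):
    """Set of scan numbers (the `num` attribute of <scan>) in an mzXML file.

    MS/MS scans nested inside MS1 scans are visited too, because iterparse
    reports every element when it closes.
    """
    nums = set()
    for _, elem in ET.iterparse(source):
        if _local(elem.tag) == "scan" and "num" in elem.attrib:
            nums.add(int(elem.attrib["num"]))
            elem.clear()  # drop peak data we no longer need
    return nums


def pepxml_scan_numbers(source):
    """Set of `start_scan` values of <spectrum_query> elements in a pepXML file."""
    nums = set()
    for _, elem in ET.iterparse(source):
        if _local(elem.tag) == "spectrum_query":
            nums.add(int(elem.attrib["start_scan"]))
            elem.clear()  # drop search hits we no longer need
    return nums


def unmatched_scans(pepxml_source, mzxml_source):
    """Sorted scan numbers referenced by the pepXML but absent from the mzXML."""
    return sorted(pepxml_scan_numbers(pepxml_source) - mzxml_scan_numbers(mzxml_source))
```

Usage: `unmatched_scans("run.pep.xml", "run.mzXML")` returns an empty list when every search result points at a scan that exists in the mzXML file.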


Some info on where we are at as it relates to this discussion:
Currently, the conversion from srf to pepXML must be done with version
0.4.7:

    % sudo gem install mspire -v 0.4.7
    % bioworks_to_pepxml.rb <file>.srf ...

We're trying to better modularize mspire, so I have a slightly more
robust srf reader in the ms-sequest package [the old reader gracefully
declined to read if print_duplicate_references=0 in the embedded
params file; the new reader gives a warning message but is fine with
it].  Right now I have utilities to output .dta files, .mgf files,
and .sqt files; conversion to pepXML isn't there yet, but hopefully soon:

    % sudo gem install ms-sequest
    % srf_to_search.rb <file>.srf ...
    % srf_to_sqt.rb <file>.srf ...
    % srf_to_pepxml.rb <file>.srf ...  # soon

--John


On Jul 20, 2:58 am, Ali <[email protected]> wrote:
> Natalie
>
> Yes, I am doing an in-house sequest search.  I generate the .srf files
> from .raw files and then convert the .srf files to pep.xml using
> mspire. This takes much less time than using the pipeline.  Good
> point about the scan numbers in the mzXML and pep.xml files; I will
> make sure they are the same.
>
> I will be giving all the search algorithms a try soon...
>
> Ali
>
> On Jul 17, 6:18 pm, Natalie Tasman <[email protected]>
> wrote:
>
> > Ah, ok, I see that mspire takes Thermo .srf files, which I believe are
> > the newer Sequest search result files
> > (http://mspire.rubyforge.org/tutorial/search_precision/prophet.html).
> > Since this method hasn't been validated by us, you'll want to be sure
> > that the scan numbers in the mzXML files still refer to the scan
> > numbers in the mspire-converted pep.xml.  So in your method you *are*
> > doing a search step, and are still using Sequest.
>
> > You might want to give X!Tandem a try!
>
> > Natalie
>
> > On Fri, Jul 17, 2009 at 10:10 AM, Natalie Tasman <[email protected]> wrote:
> > > Ok. Just to let you know, you will need to do *some* search step in
> > > order to make use of the TPP.  mspire is a completely independent
> > > project and is not supported by the TPP (although looking at their
> > > website, it seems that they try to be compatible at least in formats).
> > > So I think you're missing the peptide ID phase, which is the info
> > > that .pep.xml files should contain.  Feel free to educate me about
> > > mspire if I'm wrong.
>
> > > On Fri, Jul 17, 2009 at 6:54 AM, Ali<[email protected]> wrote:
>
> > >> Hi Natalie
>
> > >> I used sequest.  I've sort of given up on that.  I now convert my .raw
> > >> files directly to .pep.xml files using mspire.  It's not much of a
> > >> pipeline anymore but it does the job.
>
> > >> Regards
>
> > >> Ali
>
> > >> On Jul 15, 9:19 pm, Natalie Tasman <[email protected]>
> > >> wrote:
> > >>> Hi Ali,
>
> > >>> Greg made some good comments on your post regarding the Sequest search
> > >>> engine.  I'm curious as to what search engine you're using.  The TPP
> > >>> includes X!Tandem, which is generally significantly faster than
> > >>> Sequest, and is multi-threaded (and free, so you can even consider
> > >>> running it across a cluster, something we're hoping to make easier in
> > >>> the future).  Note, though, that all search engines, including
> > >>> X!Tandem, are affected by the search parameters that you use.  So a
> > >>> semi-tryptic search will generally take longer than a full-tryptic
> > >>> search, but it also increases the search space (which is good), so
> > >>> it's always going to be a trade-off.  We include some default
> > >>> X!Tandem parameter files; as you get more experience, you might want
> > >>> to play with modifying some of the parameters to optimize your search.
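The semi-tryptic versus full-tryptic trade-off can be made concrete by counting how many candidate peptides a single protein contributes under different cleavage rules. The Python sketch below is a minimal example built on several assumptions. It uses a simplified trypsin rule (cut after K or R, but not before P). Its length limits of 6 to 30 residues and its limit of one missed cleavage are hypothetical defaults. Real search engines also filter candidates by precursor mass for each spectrum, so these are database-side counts only.

```python
def tryptic_sites(seq):
    """Cleavage positions (index of the first residue after the cut) for a
    simplified trypsin rule: cut after K or R unless the next residue is P."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def count_candidates(seq, mode, min_len=6, max_len=30, max_missed=1):
    """Count candidate peptides in one protein sequence.

    mode: "full" (both termini at cleavage sites), "semi" (at least one
    terminus at a cleavage site) or "none" (no-enzyme: every substring).
    Missed cleavages are limited only in the enzymatic modes.
    """
    if mode not in ("full", "semi", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    sites = tryptic_sites(seq)
    # The protein termini also count as valid peptide boundaries.
    bounds = set(sites) | {0, len(seq)}
    count = 0
    for start in range(len(seq)):
        for end in range(start + min_len, min(start + max_len, len(seq)) + 1):
            if mode == "none":
                count += 1
                continue
            missed = sum(1 for s in sites if start < s < end)
            if missed > max_missed:
                continue
            n_ok, c_ok = start in bounds, end in bounds
            if mode == "full":
                ok = n_ok and c_ok
            else:  # "semi"
                ok = n_ok or c_ok
            if ok:
                count += 1
    return count
```

If you sum these counts over every protein in a FASTA file, you get a rough idea of how the digestion rule and the database size together increase the number of candidates each spectrum may be compared against.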
>
> > >>> Natalie
>
> > >>> On Fri, Jul 10, 2009 at 6:59 AM, Greg Bowersock <[email protected]> 
> > >>> wrote:
> > >>> > Sequest is not very fast. The only way to really speed up sequest is
> > >>> > to give it more processors, provided you are using sequest in a way
> > >>> > that will allow you to use the extra processors. There are many
> > >>> > factors that affect the speed of processing, though, with the two
> > >>> > largest being the type of digestion and the size of the database.
> > >>> > The number of modifications also plays a role in the amount of time,
> > >>> > so as you can see there isn't any one way to really speed up
> > >>> > sequest. Also, 10 hours isn't all that bad; try doing a no-enzyme
> > >>> > search on a decent-sized database, which can take days on 1-2
> > >>> > processors.
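A small combinatorial count shows why the number of variable modifications matters. The Python sketch below is a simplification with stated assumptions. It allows one variable modification type on a hypothetical residue set (S, T and Y, as for phosphorylation) and caps the number of modifications per peptide. Under those assumptions, a peptide with k modifiable residues expands into the sum of C(k, i), for i from 0 to the cap, scored variants. Engines differ in how they enumerate modifications, and combining several modification types multiplies the count further.

```python
from math import comb


def modified_forms(peptide, modifiable="STY", max_mods=2):
    """Number of variants of one peptide to score when each residue in
    `modifiable` may carry one variable modification, with at most
    `max_mods` modifications per peptide (unmodified form included)."""
    k = sum(1 for aa in peptide if aa in modifiable)
    return sum(comb(k, i) for i in range(min(k, max_mods) + 1))
```

For example, `modified_forms("SSSTTTYYY", max_mods=3)` gives 130 variants for a single nine-residue peptide, compared with 1 when no variable modifications are allowed.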
>
> > >>> > On Fri, Jul 10, 2009 at 4:42 AM, Ali <[email protected]> 
> > >>> > wrote:
>
> > >>> >> Hi everyone
>
> > >>> >> I am doing some database searches with mzXML files generated from
> > >>> >> Thermo .raw files against my database.  I have a dual-core, 2GB RAM
> > >>> >> machine.  I understand that TPP does not use multi-threading, but one
> > >>> >> mzXML file seems to take 10 hours to process!!  Is there any way I can
> > >>> >> speed up this process?
>
> > >>> >> Regards
>
> > >>> >> Ali
