It depends on what you mean by "quickly". It's quite difficult to insert bytes efficiently into a file anywhere except at the end, because everything after the insertion point has to be shifted and rewritten, and of course that kind of efficiency is nice to have when dealing with multi-gigabyte files. I doubt any (non-commercial?) proteomics tool goes to the length of manipulating protein accession ids optimally:

http://www.codeproject.com/KB/files/enhancedfs.aspx
http://stackoverflow.com/questions/724998/efficient-in-line-search-and-replace-for-large-file
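
If you'd rather script it yourself, the usual way around the insertion problem is to stream the database once and write a modified copy. Here is a minimal Python sketch of that idea. The function name, the file names, and the "DECOY_" prefix are my own placeholders, and this is not how any TPP tool is actually implemented; it only assumes that header lines start with ">" and that the id follows immediately.

```python
def prefix_fasta_ids(src_path, dst_path, prefix="DECOY_"):
    """Copy a FASTA file, inserting `prefix` right after '>' on each header line.

    Returns the number of header lines rewritten.
    `prefix` is a hypothetical decoy tag; use whatever your search engine expects.
    """
    tag = prefix.encode("ascii")
    count = 0
    # Binary mode avoids decoding 2 GB of text and preserves line endings as-is.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # reads one line at a time, so memory use stays small
            if line.startswith(b">"):
                dst.write(b">" + tag + line[1:])
                count += 1
            else:
                dst.write(line)  # sequence lines are copied unchanged
    return count
```

Calling something like prefix_fasta_ids("big.fasta", "big_decoy.fasta") makes a single sequential pass, so it is limited mostly by disk speed, and it needs free space for the second copy.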

DecoyFASTA's -no_reverse mode writes a new file with the original sequences and the ids adjusted; that's probably the best you're going to get. Make sure you've got 2 gigs free. :)

-Matt

Brian Pratt wrote:
> Have a look at the decoyfasta tool that ships with TPP.
>
> On Fri, Sep 18, 2009 at 10:21 AM, rhodea <[email protected]> wrote:
>
> Dear friends,
>
> I have a large protein database (2G) in fasta format. I want to use it
> as decoy database during protein identification and append it to a
> target database. So I need to modify the ID name in this fasta file.
> How should I do it quickly in batch? Is there any software that can
> satisfy this aim?
>
> Sincerely,
