[spctools-discuss] X!Tandem FASTA long header lines crash

[email protected] Mon, 17 Oct 2011 10:00:04 -0700

Hi all,

I've run into segfaults when running X!Tandem searches against FASTA
files derived from NCBI nr, because some entries have extremely long
header lines.


Current code in msequenceserver.cpp (lines 166-167) is:

    // 2006.11.21 - increased the size from 32*4096 to 512*1024 because of very long lines in nr FASTA files
    m_lSize = 512*1024-1;

The longest line in a human extract from nr is now above 512KB:

[manager@proteomics-srv-01 ncbi_nr]$ cat decoy_nr_homo_sapiens.fasta | awk '{print length}' | sort -nr | head -1
684322

... so that's 684,322 characters, well over the 524,287-character limit
(512*1024-1) set in msequenceserver.cpp. This causes a segfault when
searching against nr.

I've increased this to m_lSize = 1024*1024-1 locally and am wondering
whether the same change should be made in the TPP distribution, since
presumably many people could hit this segfault if they search against
nr. Of course, 1MB for a FASTA sequence header is a bit ridiculous!

Cheers,

DT

