The problem could be caused by a bad character like " appearing in one of your protein descriptions in the database and breaking the XML parsing. Can you search your fasta database for occurrences of "?
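One way to run that search is a short script like the one below. It is only a sketch: it assumes the database is a plain-text FASTA file whose description lines start with ">", and the function name `find_quoted_headers` is made up for this example.

```python
def find_quoted_headers(fasta_lines, bad_chars='"'):
    """Return (line_number, header) for each FASTA header containing a bad character.

    fasta_lines can be any iterable of text lines, e.g. an open file handle.
    Only lines starting with '>' (the description lines) are checked.
    """
    hits = []
    for lineno, line in enumerate(fasta_lines, start=1):
        if line.startswith(">") and any(ch in line for ch in bad_chars):
            hits.append((lineno, line.rstrip("\n")))
    return hits


# Small in-memory example; the entries are invented for illustration.
sample = [
    '>P00001 example protein "alpha" subunit\n',
    "MKVLAAGIVG\n",
    ">P00002 plain description\n",
    "MSTNPKPQRK\n",
]
for lineno, header in find_quoted_headers(sample):
    print(f"{lineno}: {header}")
```

For a real database, pass an open file handle instead of the sample list, e.g. `find_quoted_headers(open("db.fasta"))`, where the file name is a placeholder. Passing `bad_chars='"<>&'` would also catch the other characters that need escaping in XML.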
-David

On Wed, Mar 31, 2010 at 1:33 PM, Brian Pratt <[email protected]> wrote:
> Huh, that's all supposed to just work, assuming you have
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
> in your gcc compilation command.
>
> Although you'd think that would be moot in a 64 bit world.
>
> On Wed, Mar 31, 2010 at 1:20 PM, Dave Trudgian <[email protected]> wrote:
>>
>> Brian,
>>
>> It was built from 4.3.1 source on Ubuntu Server 9.04 64-bit, using gcc
>> 4.3.3-5ubuntu4.
>>
>> As mentioned, I think the problem is due to the index file... in the
>> interact.pep.xml.index I have lines like:
>>
>> -2069230164 -2069226665 48 EIAIIPSKKLR
>> EIAIIPSK134.11K134.11LRI PI00622165 0 9.19630.19630.3 0.0000
>>
>> ... where the first two values are offsets in the .pep.xml for that
>> peptide, which have overflowed into signed 32-bit int 2s complement negative
>> values.
>>
>> I believe from looking at the code that when the index file exists then
>> the PepXMLViewer will use it rather than doing a full expat parse of the
>> file. Hence it's ending up with nonsense offsets for peptide information
>> which are likely causing the errors. I'll have a look tomorrow at where the
>> index is created, and what integer types are being used. I suppose an
>> unsigned int can be used then that'd be good for up to 4GB, and long int
>> would give 4GB on 32-bit or much more on 64-bit systems.
>>
>> DT
>>
>>
>> On 31/03/2010 18:36, Brian Pratt wrote:
>>
>> Do you know how pepXMLViewer.cgi was built? It's meant to support large
>> files...
>>
>> On Wed, Mar 31, 2010 at 9:54 AM, dctrud <[email protected]> wrote:
>>>
>>> Hi Brian,
>>>
>>> I thought about out of memory conditions, but am running on 64-bit
>>> linux, and have 32GB of RAM, plus whilst running the cgi is using only
>>> a very small fraction of that.
>>>
>>> Looked again and the file is *just* over the 2GB boundary, looks like
>>> you're right, which has pointed me to the index file, which shows the
>>> integer offset values have overflowed.
>>>
>>> Many Thanks,
>>>
>>> DT
>>>
>>>
>>> On 31 Mar, 17:08, Brian Pratt <[email protected]> wrote:
>>> > My guess would be that the parser is trying to fail gracefully on an
>>> > out of memory condition - it "forgets" part of the stream then is
>>> > confused when it hits an unmatched closing tag.
>>> >
>>> > But that's just a guess. Could also be about crossing the dread 2GB
>>> > file size threshold.
>>> >
>>> > It's almost certainly about largeness, though.
>>> > Brian
>>> >
>>> > On Wed, Mar 31, 2010 at 6:38 AM, dctrud <[email protected]> wrote:
>>> > > All,
>>> > >
>>> > > I'm having trouble with PepXMLViewer.cgi (4.3.1) on some very
>>> > > large .pep.xml files. The cgi will exit with the error:
>>> > >
>>> > > error with spreadsheet printing: XML parsing error: not well-formed
>>> > > (invalid token), at xml file line 6298020, column 17
>>> > >
>>> > > This is for an export to Excel, but similar errors will also occur
>>> > > when filtering the dataset in the web interface.
>>> > >
>>> > > I've checked that the interact.pep.xml file is well formed with a
>>> > > python script that uses expat to parse it (as per the cgi), and there
>>> > > are no problems. Line 6298020 is the following end tag, which isn't
>>> > > an invalid token:
>>> > >
>>> > > </modification_info>
>>> > >
>>> > > I've also checked that none of the protein descriptions in the file
>>> > > contain < > " characters which could mess up the parsing earlier. Am
>>> > > now out of ideas of what could be the cause, and wondering if anyone
>>> > > has seen this problem, or has any ideas?
>>> > >
>>> > > Many Thanks,
>>> > >
>>> > > DT
>>> > >
>>> > > --
>>> > > You received this message because you are subscribed to the Google Groups
>>> > > "spctools-discuss" group.
>>> > > To post to this group, send email to [email protected].
>>> > > To unsubscribe from this group, send email to
>>> > > [email protected]<spctools-discuss%[email protected]>.
>>> > > For more options, visit this group at
>>> > > http://groups.google.com/group/spctools-discuss?hl=en.
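
A note on the index offsets quoted in this thread: the negative values are what you would expect if a file position just past 2 GiB were stored in a signed 32-bit integer. The sketch below only checks that arithmetic. It assumes the true offsets were below 4 GiB, so that the low 32 bits still hold the full value; the helper names are made up for illustration and are not taken from the TPP source.

```python
def to_int32(offset):
    """Wrap a non-negative file offset as a signed 32-bit int would store it."""
    low = offset & 0xFFFFFFFF
    return low - 0x100000000 if low >= 0x80000000 else low


def from_int32(stored):
    """Recover the unsigned 32-bit value behind a wrapped signed 32-bit int."""
    return stored & 0xFFFFFFFF


TWO_GIB = 2**31

# The two offsets quoted from interact.pep.xml.index in the thread.
for stored in (-2069230164, -2069226665):
    recovered = from_int32(stored)
    print(stored, "->", recovered, "(past 2 GiB)" if recovered >= TWO_GIB else "")
```

Both recovered values come out a little above 2**31 bytes, which fits the report that the file is just over the 2GB boundary. A 64-bit offset type would hold them without wrapping.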
