Brian,
It was built from 4.3.1 source on Ubuntu Server 9.04 64-bit, using gcc
4.3.3-5ubuntu4.
As mentioned, I think the problem is due to the index file... in the
interact.pep.xml.index I have lines like:
-2069230164 -2069226665 48 EIAIIPSKKLR
EIAIIPSK134.11K134.11LR IPI00622165 0 9.19630.19630.3 0.0000
... where the first two values are offsets into the .pep.xml for that
peptide, which have overflowed a signed 32-bit int and wrapped around to
negative two's-complement values.
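As a quick sanity check on that reading, here is a minimal Python sketch that undoes the wrap-around by adding 2^32 to negative values. The function name unwrap_int32 is my own, not anything from the TPP source, and it assumes the true offsets are below 4GB, so that they wrapped only once.

```python
def unwrap_int32(value: int) -> int:
    """Recover a non-negative offset that was stored in a signed 32-bit int.

    Assumes the true offset is below 2**32, i.e. it wrapped at most once.
    """
    return value + 2**32 if value < 0 else value


# The two offsets from the index line quoted above.
start = unwrap_int32(-2069230164)
end = unwrap_int32(-2069226665)

print(start, end)      # 2225737132 2225740631
print(start > 2**31)   # True: the entry starts past the 2GB mark
```

Both recovered values sit a little above 2^31 (2147483648), which is what you'd expect if the offsets were correct until they passed 2GB.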
I believe from looking at the code that when the index file exists,
PepXMLViewer will use it rather than doing a full expat parse of the
file. Hence it's ending up with nonsense offsets for the peptide
information, which are likely causing the errors. I'll have a look
tomorrow at where the index is created, and what integer types are being
used. I suppose if an unsigned int were used, that'd be good for up to
4GB; a long int is still 32 bits on 32-bit systems (2GB signed, 4GB
unsigned) but would give much more on 64-bit systems.
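To make the ranges concrete, here is a small Python sketch using the standard struct module to test whether an offset fits in a given fixed-width integer. The helper name fits is hypothetical, and the format codes only stand in for the C types under the usual Linux ABIs (32-bit int; long is 32 bits on 32-bit builds and 64 bits on 64-bit builds). That mapping is an assumption about the build, not something I've confirmed in the source.

```python
import struct


def fits(fmt: str, value: int) -> bool:
    """Return True if value can be packed into the fixed-width struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


offset = 2225737132  # an offset just past 2GB (2**31 = 2147483648)

print(fits("<i", offset))  # signed 32-bit
print(fits("<I", offset))  # unsigned 32-bit
print(fits("<q", offset))  # signed 64-bit
```

An unsigned 32-bit type only moves the limit from 2GB to 4GB, so a 64-bit offset type looks like the safer choice for files that may keep growing.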
DT
On 31/03/2010 18:36, Brian Pratt wrote:
Do you know how pepXMLViewer.cgi was built? It's meant to support
large files...
On Wed, Mar 31, 2010 at 9:54 AM, dctrud <[email protected]> wrote:
Hi Brian,
I thought about out of memory conditions, but I'm running on 64-bit
linux with 32GB of RAM, and whilst running, the cgi is using only
a very small fraction of that.
Looked again and the file is *just* over the 2GB boundary, so it looks
like you're right. That pointed me to the index file, which shows the
integer offset values have overflowed.
Many Thanks,
DT
On 31 Mar, 17:08, Brian Pratt <[email protected]> wrote:
> My guess would be that the parser is trying to fail gracefully on an out of
> memory condition - it "forgets" part of the stream then is confused when it
> hits an unmatched closing tag.
>
> But that's just a guess. Could also be about crossing the dread 2GB file
> size threshold.
>
> It's almost certainly about largeness, though.
> Brian
>
> On Wed, Mar 31, 2010 at 6:38 AM, dctrud <[email protected]> wrote:
> > All,
>
> > I'm having trouble with PepXMLViewer.cgi (4.3.1) on some very
> > large .pep.xml files. The cgi will exit with the error:
>
> > error with spreadsheet printing: XML parsing error: not well-formed
> > (invalid token), at xml file line 6298020, column 17
>
> > This is for an export to Excel, but similar errors will also occur
> > when filtering the dataset in the web interface.
>
> > I've checked that the interact.pep.xml file is well formed with a
> > python script that uses expat to parse it (as per the cgi), and there
> > are no problems. Line 6298020 is the following end tag, which isn't an
> > invalid token:
>
> > </modification_info>
>
> > I've also checked that none of the protein descriptions in the file
> > contain < > " characters which could mess up the parsing earlier. Am
> > now out of ideas of what could be the cause, and wondering if anyone
> > has seen this problem, or has any ideas?
>
> > Many Thanks,
>
> > DT
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "spctools-discuss" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/spctools-discuss?hl=en.