Yeah, those ought to be off_t instead of int (or even long int).  Actually,
use ramp_fileoffset_t if you can; it's set up for cross-platform compiles.
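
For illustration, a minimal sketch of what widening those fields could look like. The names file_offset_t and NodeOffsets are hypothetical stand-ins; the real ramp_fileoffset_t comes from the RAMP headers and is not reproduced here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a portable 64-bit offset type. In the real code
// base, ramp_fileoffset_t (or off_t) would play this role.
typedef std::int64_t file_offset_t;

// Hypothetical node record with its offsets widened from int.
struct NodeOffsets {
    file_offset_t startOffset;
    file_offset_t endOffset;

    // Number of bytes spanned by the node.
    constexpr file_offset_t length() const { return endOffset - startOffset; }
};

// Fail the build if the offset type is ever narrower than 64 bits.
static_assert(sizeof(file_offset_t) == 8, "file offsets must be 64-bit");
```

With a type like this, an offset such as 3000000000 (past the 2GB mark) is stored without wrapping.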

On Wed, Mar 31, 2010 at 2:46 PM, Dave Trudgian <[email protected]> wrote:

> David,
>
> Have checked the .fasta for offending < > " characters, and there aren't
> any present. As per previous email, a separate expat based parsing script to
> check that the file is well-formed runs over it fine, so am fairly confident
> there aren't any problems in the xml file. The nonsense offsets in the
> .pep.xml.index file look the most obvious things to cause a hiccup.
>
> Brian,
>
> I just noticed that in PepXMLViewer/XMLNode.h offsets are defined as
> follows:
>
> int startOffset_;
> int endOffset_;
>
> .. and in PepXNode.h as:
>
> int startOffset;
> int endOffset;
>
> I guess that this could be where the overflow is coming from. Most other
> code uses 'long' types for offsets, which should be 64-bit when compiled using
> g++ on 64-bit linux, but I believe that 'int' types will still be 32-bit.
>
> Also, the jumpParse method in PepXSAXHandler.cxx uses an int offset:
>
> void SAXHandler::jumpParse(int offset) {
>
> Getting late here, but tomorrow / this weekend I'll try changing int to
> long on these remaining int offsets in the code and see if it does anything.
>
> DT
>
> On 31/03/2010 22:02, David Shteynberg wrote:
>
>> The problem could be caused by a bad character like " appearing in one
>> of your protein descriptions in the database and breaking the XML
>> parsing.  Can you search your fasta database for occurrences of " ?
>>
>> -David
>>
>> On Wed, Mar 31, 2010 at 1:33 PM, Brian Pratt <[email protected]> wrote:
>>
>>
>>> Huh, that's all supposed to just work, assuming you have
>>> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
>>> in your gcc compilation command.
>>>
>>> Although you'd think that would be moot in a 64 bit world.
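
As a hedged illustration of what those flags do and do not cover: they widen off_t, but a variable declared as plain int stays 32-bit on common platforms. The helper names below are hypothetical, and the sketch assumes a POSIX system where off_t comes from <sys/types.h>.

```cpp
// Normally passed on the command line as -D_FILE_OFFSET_BITS=64. If defined in
// source instead, it must come before any system header is included.
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <sys/types.h>  // off_t (POSIX)
#include <limits>

// True when off_t can represent a byte offset of `size`.
constexpr bool off_t_can_address(unsigned long long size) {
    return size <= static_cast<unsigned long long>(std::numeric_limits<off_t>::max());
}

// True when a plain int can represent the same offset. The large-file flags
// have no effect on this.
constexpr bool int_can_address(unsigned long long size) {
    return size <= static_cast<unsigned long long>(std::numeric_limits<int>::max());
}
```

On a typical 64-bit Linux build, off_t_can_address accepts an offset a little past 2GB while int_can_address rejects it, regardless of the flags.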
>>>
>>> On Wed, Mar 31, 2010 at 1:20 PM, Dave Trudgian <[email protected]> wrote:
>>>
>>>
>>>> Brian,
>>>>
>>>> It was built from 4.3.1 source on Ubuntu Server 9.04 64-bit, using gcc
>>>> 4.3.3-5ubuntu4.
>>>>
>>>> As mentioned, I think the problem is due to the index file... in the
>>>> interact.pep.xml.index I have lines like:
>>>>
>>>> -2069230164     -2069226665     48      EIAIIPSKKLR
>>>> EIAIIPSK134.11K134.11LRI    PI00622165      0       9.19630.19630.3
>>>> 0.0000
>>>>
>>>> ... where the first two values are offsets in the .pep.xml for that
>>>> peptide, which have overflowed into negative signed 32-bit two's
>>>> complement values.
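
A worked check of that reading, as a sketch: if the true offsets were below 4GB (the file is reported as just over 2GB), each wrapped exactly once, so adding 2^32 recovers them. The function names are hypothetical.

```cpp
#include <cstdint>

// Store a 64-bit byte offset in a signed 32-bit int, as an `int` field would
// on typical platforms. The out-of-range conversion is implementation-defined
// before C++20, but wraps on two's complement targets such as gcc on x86-64.
constexpr std::int32_t store_in_int32(std::int64_t offset) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(offset));
}

// Undo a single wraparound. Assumes the true offset was below 4 GiB.
constexpr std::int64_t recover_wrapped_once(std::int32_t stored) {
    return stored < 0 ? stored + (std::int64_t{1} << 32) : stored;
}
```

Under that assumption, -2069230164 corresponds to byte 2225737132 and -2069226665 to byte 2225740631. Both lie a little past 2^31 = 2147483648, which is consistent with a file just over the 2GB boundary.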
>>>>
>>>> I believe from looking at the code that when the index file exists then
>>>> the PepXMLViewer will use it rather than doing a full expat parse of the
>>>> file. Hence it's ending up with nonsense offsets for peptide information
>>>> which are likely causing the errors. I'll have a look tomorrow at where
>>>> the index is created, and what integer types are being used. I suppose if
>>>> an unsigned int can be used then that'd be good for up to 4GB, and long
>>>> int would give 2GB on 32-bit (it is signed) or much more on 64-bit
>>>> systems.
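
For reference, the ranges being weighed here can be checked at compile time. This is a small sketch with hypothetical helper names; note that a signed 32-bit type tops out just below 2GB, and only the unsigned 32-bit type reaches just below 4GB.

```cpp
#include <cstdint>
#include <limits>

// Largest byte offset representable by integer type T.
template <typename T>
constexpr unsigned long long max_offset() {
    return static_cast<unsigned long long>(std::numeric_limits<T>::max());
}

// True when `offset` can be stored in T without wrapping.
template <typename T>
constexpr bool fits(unsigned long long offset) {
    return offset <= max_offset<T>();
}

static_assert(max_offset<std::int32_t>() == 2147483647ULL, "signed 32-bit: 2 GiB - 1");
static_assert(max_offset<std::uint32_t>() == 4294967295ULL, "unsigned 32-bit: 4 GiB - 1");
static_assert(max_offset<std::int64_t>() == 9223372036854775807ULL, "signed 64-bit: 8 EiB - 1");
```

An offset of 2200000000, for example, fits in the unsigned 32-bit and signed 64-bit types but not in the signed 32-bit one.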
>>>>
>>>> DT
>>>>
>>>>
>>>>
>>>> On 31/03/2010 18:36, Brian Pratt wrote:
>>>>
>>>> Do you know how pepXMLViewer.cgi was built?  It's meant to support large
>>>> files...
>>>>
>>>> On Wed, Mar 31, 2010 at 9:54 AM, dctrud <[email protected]> wrote:
>>>>
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> I thought about out of memory conditions, but am running on 64-bit
>>>>> linux, and have 32GB of RAM, plus whilst running the cgi is using only
>>>>> a very small fraction of that.
>>>>>
>>>>> Looked again and the file is *just* over the 2GB boundary, so it looks
>>>>> like you're right. That pointed me to the index file, which shows the
>>>>> integer offset values have overflowed.
>>>>>
>>>>> Many Thanks,
>>>>>
>>>>> DT
>>>>>
>>>>>
>>>>> On 31 Mar, 17:08, Brian Pratt <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> My guess would be that the parser is trying to fail gracefully on an
>>>>>> out of memory condition - it "forgets" part of the stream then is
>>>>>> confused when it hits an unmatched closing tag.
>>>>>>
>>>>>> But that's just a guess.  Could also be about crossing the dread 2GB
>>>>>> file size threshold.
>>>>>>
>>>>>> It's almost certainly about largeness, though.
>>>>>> Brian
>>>>>>
>>>>>> On Wed, Mar 31, 2010 at 6:38 AM, dctrud <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> I'm having trouble with PepXMLViewer.cgi (4.3.1) on some very
>>>>>>> large .pep.xml files. The cgi will exit with the error:
>>>>>>>
>>>>>>> error with spreadsheet printing: XML parsing error: not well-formed
>>>>>>> (invalid token), at xml file line 6298020, column 17
>>>>>>>
>>>>>>> This is for an export to Excel, but similar errors will also occur
>>>>>>> when filtering the dataset in the web interface.
>>>>>>>
>>>>>>> I've checked that the interact.pep.xml file is well formed with a
>>>>>>> python script that uses expat to parse it (as per the cgi), and there
>>>>>>> are no problems. Line 6298020 is the following end tag, which isn't
>>>>>>> an invalid token:
>>>>>>>
>>>>>>> </modification_info>
>>>>>>>
>>>>>>> I've also checked that none of the protein descriptions in the file
>>>>>>> contain < > " characters which could mess up the parsing earlier. Am
>>>>>>> now out of ideas of what could be the cause, and wondering if anyone
>>>>>>> has seen this problem, or has any ideas?
>>>>>>>
>>>>>>> Many Thanks,
>>>>>>>
>>>>>>> DT
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups
>>>>>>> "spctools-discuss" group.
>>>>>>> To post to this group, send email to
>>>>>>> [email protected].
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> [email protected].
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/spctools-discuss?hl=en.
>>>>>>>
>>>>>>>
