Yeah, those ought to be off_t rather than int (or even long int). Better still, use ramp_fileoffset_t if you can; it's set up for cross-platform compiles.
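For what it's worth, here is a rough sketch of why the index ends up with negative offsets. It is not the TPP code: file_offset_t is a hypothetical stand-in for a 64-bit offset type (off_t built with -D_FILE_OFFSET_BITS=64, or ramp_fileoffset_t, whose real definition I am not reproducing here), and as_int32, recover_offset and self_check are names made up for illustration. The numbers are the two offsets from the index line quoted below.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 64-bit file offset type; an assumption for
// this sketch, not the actual ramp_fileoffset_t definition.
typedef std::int64_t file_offset_t;

// Keep only the low 32 bits of an offset and read them back as a signed
// 32-bit value. On the usual two's complement platforms this is what
// storing a large offset in a 32-bit int amounts to.
std::int32_t as_int32(file_offset_t offset) {
    const std::uint32_t low = static_cast<std::uint32_t>(offset);  // modulo 2^32
    std::int32_t narrowed;
    std::memcpy(&narrowed, &low, sizeof narrowed);  // reuse the bit pattern
    return narrowed;
}

// Undo the wrap for an offset known to lie in [2^31, 2^32): add 2^32 back.
file_offset_t recover_offset(std::int32_t wrapped) {
    const file_offset_t two_to_32 = static_cast<file_offset_t>(1) << 32;
    return wrapped < 0 ? wrapped + two_to_32 : wrapped;
}

// Checks based on the two offsets seen in the .pep.xml.index line.
void self_check() {
    // An offset just past 2 GiB wraps to the first negative value in the index.
    assert(as_int32(2225737132LL) == -2069230164);
    // The second value wraps the same way.
    assert(as_int32(2225740631LL) == -2069226665);
    // Offsets below 2^31 are unaffected, which is why smaller files work.
    assert(as_int32(1000) == 1000);
    // A 64-bit type holds the real offset without loss.
    assert(recover_offset(-2069230164) == 2225737132LL);
}
```

Calling self_check() from a small main is enough to try it. The point is only that any offset field or parameter still declared as int silently drops the upper bits once the file passes 2 GiB, whatever the rest of the code uses.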
On Wed, Mar 31, 2010 at 2:46 PM, Dave Trudgian <[email protected]> wrote:

> David,
>
> Have checked the .fasta for offending < > " characters, and there aren't
> any present. As per previous email, a separate expat based parsing script to
> check that the file is well-formed runs over it fine, so am fairly confident
> there aren't any problems in the xml file. The nonsense offsets in the
> .pep.xml.index file look the most obvious things to cause a hiccup.
>
> Brian,
>
> I just noticed that in PepXMLViewer/XMLNode.h offsets are defined as
> follows:
>
>     int startOffset_;
>     int endOffset_;
>
> .. and in PepXNode.h as:
>
>     int startOffset;
>     int endOffset;
>
> I guess that this could be where the overflow is coming from. Most else
> uses 'long' types for offsets, which should be 64-bit when compiled using
> g++ on 64-bit linux, but I believe that 'int' types will still be 32-bit.
>
> Also, the jumpParse method in PepXSAXHandler.cxx uses an int offset:
>
>     void SAXHandler::jumpParse(int offset) {
>
> Getting late here, but tomorrow / this weekend I'll try changing int to
> long on these remaining int offsets in the code and see if it does anything.
>
> DT
>
> On 31/03/2010 22:02, David Shteynberg wrote:
>
>> The problem could be caused by a bad character like " appearing in one
>> of your protein descriptions in the database and breaking the XML
>> parsing. Can you search your fasta database for occurrences of " ?
>>
>> -David
>>
>> On Wed, Mar 31, 2010 at 1:33 PM, Brian Pratt <[email protected]> wrote:
>>
>>> Huh, that's all supposed to just work, assuming you have
>>> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
>>> in your gcc compilation command.
>>>
>>> Although you'd think that would be moot in a 64 bit world.
>>>
>>> On Wed, Mar 31, 2010 at 1:20 PM, Dave Trudgian <[email protected]> wrote:
>>>
>>>> Brian,
>>>>
>>>> It was built from 4.3.1 source on Ubuntu Server 9.04 64-bit, using gcc
>>>> 4.3.3-5ubuntu4.
>>>>
>>>> As mentioned, I think the problem is due to the index file... in the
>>>> interact.pep.xml.index I have lines like:
>>>>
>>>> -2069230164 -2069226665 48 EIAIIPSKKLR
>>>> EIAIIPSK134.11K134.11LRI PI00622165 0 9.19630.19630.3
>>>> 0.0000
>>>>
>>>> ... where the first two values are offsets in the .pep.xml for that
>>>> peptide, which have overflowed into signed 32-bit int 2s complement
>>>> negative values.
>>>>
>>>> I believe from looking at the code that when the index file exists then
>>>> the PepXMLViewer will use it rather than doing a full expat parse of the
>>>> file. Hence it's ending up with nonsense offsets for peptide information
>>>> which are likely causing the errors. I'll have a look tomorrow at where
>>>> the index is created, and what integer types are being used. I suppose an
>>>> unsigned int can be used then that'd be good for up to 4GB, and long int
>>>> would give 4GB on 32-bit or much more on 64-bit systems.
>>>>
>>>> DT
>>>>
>>>> On 31/03/2010 18:36, Brian Pratt wrote:
>>>>
>>>> Do you know how pepXMLViewer.cgi was built? It's meant to support large
>>>> files...
>>>>
>>>> On Wed, Mar 31, 2010 at 9:54 AM, dctrud <[email protected]> wrote:
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> I thought about out of memory conditions, but am running on 64-bit
>>>>> linux, and have 32GB of RAM, plus whilst running the cgi is using only
>>>>> a very small fraction of that.
>>>>>
>>>>> Looked again and the file is *just* over the 2GB boundary, looks like
>>>>> you're right, which has pointed me to the index file, which shows the
>>>>> integer offset values have overflowed.
>>>>>
>>>>> Many Thanks,
>>>>>
>>>>> DT
>>>>>
>>>>> On 31 Mar, 17:08, Brian Pratt <[email protected]> wrote:
>>>>>
>>>>>> My guess would be that the parser is trying to fail gracefully on an
>>>>>> out of memory condition - it "forgets" part of the stream then is
>>>>>> confused when it hits an unmatched closing tag.
>>>>>>
>>>>>> But that's just a guess. Could also be about crossing the dread 2GB
>>>>>> file size threshold.
>>>>>>
>>>>>> It's almost certainly about largeness, though.
>>>>>> Brian
>>>>>>
>>>>>> On Wed, Mar 31, 2010 at 6:38 AM, dctrud <[email protected]> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> I'm having trouble with PepXMLViewer.cgi (4.3.1) on some very
>>>>>>> large .pep.xml files. The cgi will exit with the error:
>>>>>>>
>>>>>>> error with spreadsheet printing: XML parsing error: not well-formed
>>>>>>> (invalid token), at xml file line 6298020, column 17
>>>>>>>
>>>>>>> This is for an export to Excel, but similar errors will also occur
>>>>>>> when filtering the dataset in the web interface.
>>>>>>>
>>>>>>> I've checked that the interact.pep.xml file is well formed with a
>>>>>>> python script that uses expat to parse it (as per the cgi), and there
>>>>>>> are no problems. Line 6298020 is the following end tag, which isn't
>>>>>>> an invalid token:
>>>>>>>
>>>>>>> </modification_info>
>>>>>>>
>>>>>>> I've also checked that none of the protein descriptions in the file
>>>>>>> contain < > " characters which could mess up the parsing earlier. Am
>>>>>>> now out of ideas of what could be the cause, and wondering if anyone
>>>>>>> has seen this problem, or has any ideas?
>>>>>>>
>>>>>>> Many Thanks,
>>>>>>>
>>>>>>> DT
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "spctools-discuss" group.
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> To unsubscribe from this group, send email to [email protected].
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/spctools-discuss?hl=en.
