The constraints are in the XSD, not in the XML. Do your users also supply the schema?

On Mon, 2004-09-13 at 15:50, Thomas Cox wrote:
We don't have control over the incoming XML - we're at the mercy of the end-users. We tried turning off as many features as possible, but it just postponed the issue.
 
- Thomas

From: Phil Weighill-Smith [mailto:[EMAIL PROTECTED]
Sent: Monday, September 13, 2004 10:43 AM
To: [EMAIL PROTECTED]
Subject: RE: Processing speed slow down for large files



This could relate to having things like keyref/key/unique constraints (which xerces handles by holding lists of element info in memory)... If you have this sort of validation, try removing it and see how things go...
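
One quick way to test this without editing the schema is to time a run with validation switched off entirely and compare. A minimal sketch, assuming the parser is obtained through JAXP; the class and method names are made up for illustration, and the feature id is the one the Xerces documentation lists for schema validation (parsers that do not recognize it are simply left alone):

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses with all validation off and counts elements.
class NonValidatingProbe {

    // Xerces feature id for XML Schema validation.
    static final String SCHEMA_VALIDATION =
            "http://apache.org/xml/features/validation/schema";

    static long countElements(InputStream in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // no DTD validation
            try {
                // With schema validation off, no key/keyref/unique tables are built.
                factory.setFeature(SCHEMA_VALIDATION, false);
            } catch (SAXException | ParserConfigurationException e) {
                // The underlying parser does not know this feature; nothing to turn off.
            }
            SAXParser parser = factory.newSAXParser();
            final long[] count = {0};
            parser.parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
            return count[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

If the 1.2 Gig file goes through in roughly the same time as the 1.15 Gig one with this setup, the identity constraints are a likely suspect; if it still falls off a cliff, they probably are not.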

Phil :n.

On Mon, 2004-09-13 at 15:34, Thomas Cox wrote:
We saw the same thing. In our case, the culprit is the JVM, GC, and memory management. At some threshold (which varies by JVM vendor, JVM settings, etc.) it starts thrashing, and performance degrades almost immediately to uselessness. We have not found a workaround other than to write a key part of our system (the one that absolutely MUST scale) in plain old low-level C, using primitive non-validating parsing. 

We used JProbe and dumps from the JVM (-Xloggc, -Xprof) to help pin it down.
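
If attaching a profiler is awkward, the same signal can be sampled from inside the application. This is only a sketch, and it assumes a JVM that ships the java.lang.management API (Java 5 or later), which is not something we used at the time; the class name and fields are invented for the example:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: a point-in-time reading of GC activity and heap use.
class GcSnapshot {
    final long collections;   // total collections across all collectors
    final long gcMillis;      // approximate accumulated collection time
    final long usedHeapBytes; // heap in use when the snapshot was taken

    GcSnapshot(long collections, long gcMillis, long usedHeapBytes) {
        this.collections = collections;
        this.gcMillis = gcMillis;
        this.usedHeapBytes = usedHeapBytes;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report a value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        Runtime rt = Runtime.getRuntime();
        return new GcSnapshot(count, millis, rt.totalMemory() - rt.freeMemory());
    }

    // Share of the elapsed wall-clock time that was spent collecting.
    static double gcFraction(GcSnapshot before, GcSnapshot after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) (after.gcMillis - before.gcMillis) / elapsedMillis;
    }
}
```

Taking a snapshot every few thousand records from the SAX handler and logging gcFraction should show whether the collector starts eating most of the wall-clock time near the size where things fall over.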

YMMV.

Thomas

-----Original Message-----
From: Conley, Daryl [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 13, 2004 9:55 AM
To: Xerces-J-User (E-mail)
Subject: Processing speed slow down for large files

Hello, 

We have an application that processes XML files ranging from very small to very large. Up to this point we have not had too many issues with the SAX parser in Xerces. Unfortunately we are going to need to process very large files, so I have been running some tests to see how the performance holds up. I have found that a file of about 1.15 GB in which a quarter of the records contain invalid data (to simulate the bad data we will be getting) processes in about 1.5 hours, while a 1.2 GB file with the same error rate takes 27.5 hours. These tests run under WebLogic on a Sun dual-processor server. I also ran the same file with our process producing no output, to check whether I/O was slowing things down, but it still took 27.5 hours. What could cause such an abrupt change in processing time? Are there any tools I could use to see what is going on under the hood? By the way, a clean file would take about an hour to process.

Any help would be appreciated.

Thanks

Daryl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
Phil Weighill-Smith <[EMAIL PROTECTED]>
Volantis Systems
