Thank you very much for your advice.

Claus Ibsen-2 wrote:
> 
> On Tue, Mar 23, 2010 at 8:24 PM, Justinson <justin...@googlemail.com> wrote:
>>
>> Unfortunately I'm getting an OutOfMemoryError using XPath splitting the way
>> you shown. I'm parsing a file with about 500000 xml messages.
>
> You could pre process the big file and split it into X files.
> Maybe by using the java.util.Scanner to identify "good places" to
> split the big file.
> 

I'm just trying to handle the "format stack" properly: it's a byte stream in
the base layer but an XML stream in the second layer. In my case the byte
stream has no structure of its own, so I cannot split it at that level.
Therefore I'd like to follow your second suggestion and use XML-aware parsing.

Claus Ibsen-2 wrote:
> 
> Or you could try using SAX based XML parsing when splitting to reduce
> the memory overhead.
> Just use a Bean for that. Something like this:
>
> public Iterator splitBigFile(java.io.File file) {
>   // SAX parsing the big file and return an iterator or something that
>   // can walk the XML messages you like
> }
>
> And use the bean with the Camel Split EIP
> 

Is there a way to integrate a "push" parser paradigm into Camel more smoothly
than by hiding it behind an iterator? (For iterator-based XML splitting, StAX
"pull" parsing is probably a more appropriate choice.)

Claus Ibsen-2 wrote:
> 
>> How can we use Apache Digester instead?
> 

The Commons Digester supports an XPath-like pattern-matching syntax and uses
SAX behind the scenes. It also exhibits the "push" paradigm of SAX but
introduces a stack concept for match results. That is why stream-like handling
is supported. Unfortunately, Camel does not have support for Digester at the
moment.

Another idea: would you recommend using XStream for this task?
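
To make the StAX idea concrete, here is a rough sketch of an iterator that pulls one message at a time from the stream, so only the current message is held in memory. It is only an illustration under assumptions: the class name StaxMessageSplitter and the element name passed to it (for example "msg") are made up for this example, and I have not run it against a 500000-message file.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: yields each matching element of a large XML stream as a String. */
class StaxMessageSplitter implements Iterator<String> {
    private final XMLEventReader reader;
    private final XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
    private final String messageElement; // local name of the repeating element (assumed)
    private String next;                 // look-ahead: the message returned by the next call

    StaxMessageSplitter(InputStream in, String messageElement) throws XMLStreamException {
        this.reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        this.messageElement = messageElement;
        this.next = readNext();
    }

    /** Advances to the next matching start element, or returns null at end of input. */
    private String readNext() throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(messageElement)) {
                return copyElement(event);
            }
        }
        return null;
    }

    /** Serializes the element opened by 'start', including nested children, to a String. */
    private String copyElement(XMLEvent start) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventWriter writer = outputFactory.createXMLEventWriter(buffer);
        writer.add(start);
        int depth = 1; // number of currently open elements inside this message
        while (depth > 0) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return buffer.toString();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        try {
            next = readNext();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML in stream", e);
        }
        return current;
    }
}
```

A bean method like the splitBigFile(...) quoted above could open the file and return such an iterator; I would expect the Split EIP to need its streaming mode enabled so that it does not collect all parts up front, but I have not verified that part.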

Claus Ibsen-2 wrote:
> 
>> Claus Ibsen-2 wrote:
>>>
>>> Hi
>>>
>>> This is as far I got with the xpath expression for splitting
>>> http://svn.apache.org/viewvc?rev=825156&view=rev
> 

-- 
View this message in context: http://old.nabble.com/handling-large-files-tp25826380p28038839.html
Sent from the Camel - Users mailing list archive at Nabble.com.