On Fri, Mar 26, 2010 at 9:15 AM, Justinson <justin...@googlemail.com> wrote:
>
> Thank you very much for your advice.
>
>
> Claus Ibsen-2 wrote:
>>
>> On Tue, Mar 23, 2010 at 8:24 PM, Justinson <justin...@googlemail.com>
>> wrote:
>>>
>>> Unfortunately I'm getting an OutOfMemoryError using XPath splitting the
>>> way you showed. I'm parsing a file with about 500000 XML messages.
>>
>> You could preprocess the big file and split it into X files.
>> Maybe by using the java.util.Scanner to identify "good places" to
>> split the big file.
>>
>
> I'm just trying to handle the "format stack" properly: It's a byte stream in
> the base layer but an XML stream in the second layer. In my case the byte
> stream has no structure of its own, so I cannot split it. Therefore I'd try
> to apply your second suggestion, using XML-aware parsing.
>
>
> Claus Ibsen-2 wrote:
>>
>>
>> Or you could try using SAX based XML parsing when splitting to reduce
>> the memory overhead.
>> Just use a Bean for that. Something like this:
>>
>> public Iterator splitBigFile(java.io.File file) {
>>   // SAX parsing the big file and return an iterator or something that
>>   // can walk the XML messages you like
>> }
>>
>> And use the bean with the Camel Split EIP
>>
>>
>
> How is it possible to integrate a "push" parser paradigm into Camel more
> smoothly than by hiding it behind an iterator?
>
> (For iterator-based XML splitting, StAX "pull" XML parsing is probably
> a more suitable choice.)
>
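For reference, the iterator-based StAX splitting mentioned in the quoted
message could look roughly like the sketch below. It is only an
illustration, not code from Camel: the class name StaxSplitter and the
element name "msg" are made up for the example, and it assumes every
message is an element with a known local name. It uses only the standard
javax.xml.stream API and keeps one message in memory at a time.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: walks a large XML stream and yields each element
// named elementName as a String, one at a time (StAX "pull" parsing).
final class StaxSplitter implements Iterator<String> {
    private final XMLEventReader reader;
    private final XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
    private final String elementName;
    private String next; // message read ahead, or null when the stream is done

    StaxSplitter(Reader in, String elementName) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            // Disable DTD processing; the messages here do not need it.
            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            this.reader = inputFactory.createXMLEventReader(in);
            this.elementName = elementName;
            this.next = readNext();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read XML stream", e);
        }
    }

    // Skips ahead to the next matching start element and serializes it.
    private String readNext() throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && elementName.equals(event.asStartElement().getName().getLocalPart())) {
                return copyElement(event);
            }
        }
        return null;
    }

    // Copies events until the element opened by 'start' is closed again.
    private String copyElement(XMLEvent start) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = outputFactory.createXMLEventWriter(out);
        writer.add(start);
        int depth = 1;
        while (depth > 0) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        try {
            next = readNext();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read XML stream", e);
        }
        return current;
    }

    public static void main(String[] args) {
        String xml = "<root><msg id=\"1\">a</msg><msg id=\"2\">b</msg></root>";
        Iterator<String> it = new StaxSplitter(new StringReader(xml), "msg");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

A bean method like the splitBigFile(java.io.File) stub quoted above could
return such an iterator (wrapping a FileReader or similar) and be used
with the Split EIP, ideally in streaming mode so the messages are not
collected up front.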
Try googling for a solution using XPath in Java, as that's what is used under
the covers. It has an XPathFactory where you can set features and whatnot.
It may offer ways to tweak whether it runs in pull or push mode, and whether
it can stream the result, etc.

>
> Claus Ibsen-2 wrote:
>>
>>
>>> How can we use Apache Digester instead?
>>
>>
>
> The Commons Digester supports an XPath-like pattern-matching syntax and uses
> SAX behind the scenes. It also exhibits the "push" paradigm of SAX but
> introduces a stack concept for match results. That's why stream-like
> handling is supported. Unfortunately Camel does not have support for
> Digester at the moment.
>
> Another idea: Would you recommend using XStream for this task?
>
>
> Claus Ibsen-2 wrote:
>>
>>
>>> Claus Ibsen-2 wrote:
>>>>
>>>> Hi
>>>>
>>>> This is as far as I got with the xpath expression for splitting
>>>> http://svn.apache.org/viewvc?rev=825156&view=rev
>>
>>
> --
> View this message in context:
> http://old.nabble.com/handling-large-files-tp25826380p28038839.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>

--
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus