Thank you very much for your advice.

Claus Ibsen-2 wrote:
> 
> On Tue, Mar 23, 2010 at 8:24 PM, Justinson <justin...@googlemail.com>
> wrote:
>>
>> Unfortunately I'm getting an OutOfMemoryError using XPath splitting the
>> way you showed. I'm parsing a file with about 500000 XML messages.
> 
> You could pre process the big file and split it into X files.
> Maybe by using the java.util.Scanner to identify "good places" to
> split the big file.
> 

I'm just trying to handle the "format stack" properly: it's a byte stream in
the base layer but an XML stream in the second layer. In my case the byte
stream has no structure of its own, so I cannot split it at that level.
Therefore I'd try to apply your second suggestion and use XML-aware parsing.


Claus Ibsen-2 wrote:
> 
> 
> Or you could try using SAX based XML parsing when splitting to reduce
> the memory overhead.
> Just use a Bean for that. Something like this:
> 
> public Iterator splitBigFile(java.io.File file) {
>   // SAX parsing the big file and return an iterator or something that
>   // can walk the XML messages you like
> }
> 
> And use the bean with the Camel Split EIP
> 
> 

Is it possible to integrate a "push" parser paradigm into Camel more
smoothly than by hiding it behind an iterator?

(For iterator-based XML splitting, StAX "pull" XML parsing is probably
a more appropriate choice.)
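
A minimal sketch of the pull-based variant, using only the StAX API that
ships with the JDK (javax.xml.stream). The class name StaxSplitter and the
element name passed to it are hypothetical, and it assumes each message is a
subtree rooted at an element with that local name:

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: lazily yields each matching subtree as a String. */
class StaxSplitter implements Iterator<String> {
    private final XMLEventReader reader;
    private final String elementName;
    private String next; // the message handed out by the next call to next()

    StaxSplitter(InputStream in, String elementName) {
        try {
            this.reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            this.elementName = elementName;
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Skips events until the next matching start element, then buffers that subtree. */
    private void advance() throws XMLStreamException {
        next = null;
        while (reader.hasNext()) {
            XMLEvent event = reader.peek();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                next = readSubtree();
                return;
            }
            reader.nextEvent(); // not a message start: discard and keep pulling
        }
    }

    /**
     * Copies one subtree into a String. Only this single message is held in memory.
     * Limitation: namespace declarations made on ancestor elements are not copied.
     */
    private String readSubtree() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        int depth = 0;
        do {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) depth++;
            if (event.isEndElement()) depth--;
            writer.add(event);
        } while (depth > 0);
        writer.flush();
        writer.close();
        return out.toString();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) throw new NoSuchElementException();
        String current = next;
        try {
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return current;
    }
}
```

A bean method like the splitBigFile you outlined could presumably return
such an iterator to the Split EIP, so that only one message is buffered at a
time.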


Claus Ibsen-2 wrote:
> 
> 
>> How can we use Apache Digester instead?
> 
> 

The Commons Digester supports an XPath-like pattern-matching syntax and uses
SAX behind the scenes. It also exhibits the "push" paradigm of SAX but
introduces a stack concept for match results. That's why stream-like
handling is supported. Unfortunately, Camel does not have support for
Digester at the moment.

Another idea: would you recommend using XStream for this task?


Claus Ibsen-2 wrote:
> 
> 
>> Claus Ibsen-2 wrote:
>>>
>>> Hi
>>>
>>> This is as far as I got with the xpath expression for splitting
>>> http://svn.apache.org/viewvc?rev=825156&view=rev
> 
> 
-- 
View this message in context: 
http://old.nabble.com/handling-large-files-tp25826380p28038839.html
Sent from the Camel - Users mailing list archive at Nabble.com.
