On Fri, Mar 26, 2010 at 9:15 AM, Justinson <justin...@googlemail.com> wrote:
>
> Thank you very much for your advice.
>
>
> Claus Ibsen-2 wrote:
>>
>> On Tue, Mar 23, 2010 at 8:24 PM, Justinson <justin...@googlemail.com>
>> wrote:
>>>
>>> Unfortunately I'm getting an OutOfMemoryError using XPath splitting the
>>> way you showed. I'm parsing a file with about 500000 XML messages.
>>
>> You could preprocess the big file and split it into X files.
>> Maybe by using the java.util.Scanner to identify "good places" to
>> split the big file.
>>
>
> I'm just trying to handle the "format stack" properly: It's a byte stream in
> the base layer but an XML stream in the second layer. In my case the byte
> stream has no structure of its own, so I cannot split it. Therefore I'd try to
> apply your second suggestion, using XML-aware parsing.
>
>
> Claus Ibsen-2 wrote:
>>
>>
>> Or you could try using SAX based XML parsing when splitting to reduce
>> the memory overhead.
>> Just use a Bean for that. Something like this:
>>
>> public Iterator splitBigFile(java.io.File file) {
>>   // SAX parsing the big file and return an iterator or something that
>>   // can walk the XML messages you like
>> }
>>
>> And use the bean with the Camel Split EIP
>>
>>
>
> How is it possible to integrate a "push" parser paradigm more smoothly into
> Camel than hiding it behind an iterator?
>
> (For iterator-based XML splitting, using StAX "pull" XML parsing is probably
> a more appropriate choice.)
>
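
For reference, the iterator-based StAX splitting mentioned above could look
roughly like the sketch below. It is only an illustration under assumptions:
the class name StaxMessageIterator and the element name passed to it are made
up, nothing like it is claimed to ship with Camel, and error handling is
reduced to wrapping exceptions.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: lazily yields each <elementName> subtree as a String,
// so only one message is materialized at a time.
class StaxMessageIterator implements Iterator<String> {
    private final XMLEventReader reader;
    private final String elementName;
    private final XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

    StaxMessageIterator(InputStream in, String elementName) {
        try {
            this.reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        this.elementName = elementName;
    }

    @Override
    public boolean hasNext() {
        try {
            // Discard events until the next start tag with the wanted name.
            while (reader.hasNext()) {
                XMLEvent e = reader.peek();
                if (e.isStartElement()
                        && e.asStartElement().getName().getLocalPart().equals(elementName)) {
                    return true;
                }
                reader.nextEvent();
            }
            return false;
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            StringWriter out = new StringWriter();
            XMLEventWriter writer = outputFactory.createXMLEventWriter(out);
            int depth = 0;
            // Copy events until the start tag we stopped at has been closed.
            do {
                XMLEvent e = reader.nextEvent();
                if (e.isStartElement()) depth++;
                if (e.isEndElement()) depth--;
                writer.add(e);
            } while (depth > 0);
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

A bean method like the splitBigFile(...) stub quoted earlier could return such
an iterator, which is then handed to the Split EIP as suggested in the thread.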

Try googling for a solution using XPath in Java, as that's what is used
under the covers.
It has an XPathFactory where you can set features and whatnot. It may
offer ways to tweak whether it runs in pull or push mode, and whether
it can stream the result, etc.
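
As a small illustration of setting a feature on the XPathFactory: the sketch
below uses XMLConstants.FEATURE_SECURE_PROCESSING, which is the one feature
the JAXP javadoc requires every implementation to support. Whether a given
implementation exposes any feature for streaming or pull/push behaviour is not
guaranteed by the standard API and would have to be checked in that
implementation's documentation. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical helper showing where factory features are set.
class XPathFeatureSketch {
    static String evaluate(String xml, String expression) {
        try {
            XPathFactory factory = XPathFactory.newInstance();
            // Implementation-specific features would be set the same way,
            // using whatever feature URI the implementation documents.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            XPath xpath = factory.newXPath();
            // Evaluates the expression and returns the result as a String.
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathFactoryConfigurationException ex) {
            throw new IllegalStateException(ex);
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException(ex);
        }
    }
}
```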



>
> Claus Ibsen-2 wrote:
>>
>>
>>> How can we use Apache Digester instead?
>>
>>
>
> The Commons Digester supports an XPath-like pattern-matching syntax and uses
> SAX behind the scenes. It also exhibits the "push" paradigm of SAX but
> introduces a stack concept for match results. That's why stream-like
> handling is supported. Unfortunately Camel does not have support for
> Digester at the moment.
>
> Another idea: Would you recommend using XStream for this task?
>
>
> Claus Ibsen-2 wrote:
>>
>>
>>> Claus Ibsen-2 wrote:
>>>>
>>>> Hi
>>>>
>>>> This is as far as I got with the XPath expression for splitting
>>>> http://svn.apache.org/viewvc?rev=825156&view=rev
>>
>>
> --
> View this message in context: 
> http://old.nabble.com/handling-large-files-tp25826380p28038839.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus
