Hi

I have found the bug, so there is no need for a reproducer.

On Fri, Nov 12, 2021 at 9:37 AM Claus Ibsen <claus.ib...@gmail.com> wrote:

>
>
> On Fri, Nov 12, 2021 at 8:03 AM Mikael Andersson Wigander
> <mikael.andersson.wigan...@pm.me.invalid> wrote:
>
>> Ok thanks.
>>
>> But the outcome is very obvious when I analyse the heap dump.
>> pool.clear() is never executed and the memory fills up with the
>> orphaned pool.
>> If the pool is meant as a placeholder for the expressions, then why
>> are there not just 10 items in it? After importing the records my pool is
>> huge, about 10 times the number of records in the split, I would say.
>>
>>
>
> Yeah, the pool should only grow as large as the number of concurrent
> threads at peak. How many items do you have?
> And can you put together a reproducer example and put it on GitHub or
> attach a .zip to the JIRA?
>
>
>
>
>>
>> /M
>>
>>
>>
>>
>> On Fri, Nov 12, 2021 at 07:52, Claus Ibsen <claus.ib...@gmail.com>
>> wrote:
>>
>> Hi
>>
>> Mind that evaluating xpath in Java is not thread safe, so you end up
>> having to create an instance of XPathExpression per xpath you want to
>> execute.
>> And your bean has 10 xpaths, so that is 10 per message, so you end up
>> with 10 XPathExpression instances in memory.
>> And all legacy XML from the JDK/JVM is memory hungry (DOM, JAXB etc).
>>
>> And if you turn on parallel processing then you multiply this by
>> another 10 or more, depending on the number of concurrent threads etc.
>> In this case you can make the argument that the pool should be able to
>> shrink in case there was a spike of concurrent processing which later is no
>> longer needed,
>> because then the pool has too many free elements.
>>
>> Also, as Alex mentions, a StAX-based parser may be better, which you
>> use with tokenizeXML.
>>
>> You talk about a leak, but is that really a leak? The xpath instance is
>> pooled so it can be re-used for the next message. So what you see in
>> memory is those 10 XPathExpression instances.
>> If they were cleared after processing a message, you would end up having to
>> re-create the XPathExpression for the next message, and then you would have
>> more CPU usage and also more pressure on the GC
>> to de-allocate those 10 XPathExpression instances per message.
>>
>>
>>
>>
>>
>> On Mon, Nov 8, 2021 at 8:12 PM Mikael Andersson Wigander
>> <mikael.andersson.wigan...@pm.me.invalid> wrote:
>>
>>> Hi
>>>
>>> At the risk of being seen as a n00b (again)…
>>>
>>> We are processing large XML files (0.5GB/~500.000 records).
>>> To process them we use stream caching, split, parallel processing, xpath
>>> and a bean.
>>>
>>> We get a lot of OutOfMemoryErrors and after analysing we see that
>>> the call to the bean method is the villain.
>>>
>>> The process is to split() using tokenizeXML() on a tag that makes up one
>>> record in the XML.
>>>
>>> For each of these records we call a bean where the method utilises
>>> @Xpath() on the method parameters.
>>>
>>> We see in the heap dump that these calls are never GC'd; we have 90%
>>> leftovers
>>> [image: image.png]
>>>
>>> The question is: is this related to a bean/method that is not thread
>>> safe, or what could be the reason?
>>> The documentation states the default behaviour is a Singleton and when
>>> used in concurrent processing it must be thread safe…
>>> https://camel.apache.org/components/3.11.x/bean-component.html#_options
>>>
>>> Running as a war under Tomcat 9 on Windows using Camel 3.11.3 and Spring
>>> Boot 2.5.6.
>>> Server has 32GB of RAM…
>>>
>>> Route:
>>> from(file("Full"))
>>>     .streamCaching()
>>>     .unmarshal()
>>>     .zipFile()
>>>     .split()
>>>     .tokenizeXML("RefData")
>>>     .streaming()
>>>     .parallelProcessing(false)
>>>     .bean(XmlToSqlBean.class)
>>>     .to(jdbc("default"))
>>>     .end();
>>>
>>> Bean:
>>> public class XmlToSqlBean {
>>>     public String toSql(
>>>             @XPath("//FinInstrmGnlAttrbts/Id") final String isin,
>>>             @XPath("//NtnlCcy") final String currency,
>>>             @XPath("//FullNm") final String fullName,
>>>             @XPath("//TradgVnRltdAttrbts/Id") final String venue,
>>>             @XPath("//ClssfctnTp") final String classification,
>>>             @XPath("//TradgVnRltdAttrbts/TermntnDt") final String terminationDate,
>>>             @XPath("//Issr") final String issuer,
>>>             @XPath("//MtrtyDt") String maturityDate,
>>>             @XPath("//TermntdRcrd") final String termnRecord,
>>>             @XPath("//NewRcrd") final String newRecord) {
>>>         …
>>>     }
>>> }
>>>
>>>
>>> Thanks
>>>
>>> /M
>>>
>>>
>>
>> --
>> Claus Ibsen
>> -----------------
>> http://davsclaus.com @davsclaus
>> Camel in Action 2: https://www.manning.com/ibsen2
>>
>>
>
>
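
For anyone reading the thread later, the pooling trade-off discussed in the quoted messages above can be sketched roughly as follows. This is only an illustration built on the plain JDK javax.xml.xpath API. It is not Camel's actual code and it is not the bug fix. The class name XPathPoolSketch and its methods are made up for this example. The idea shown: an XPathExpression is borrowed from a pool, used by one thread at a time, and handed back, so the number of instances created should track the peak number of concurrent callers rather than the number of messages.

```java
import java.io.StringReader;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch only; NOT Camel's implementation.
public class XPathPoolSketch {
    private final String expression;
    // Idle, already compiled expressions waiting to be re-used.
    private final ConcurrentLinkedQueue<XPathExpression> pool = new ConcurrentLinkedQueue<>();
    // Counts how many expressions were ever compiled.
    private final AtomicInteger created = new AtomicInteger();

    public XPathPoolSketch(String expression) {
        this.expression = expression;
    }

    public String evaluate(String xml) {
        // Borrow an idle instance, or compile a new one if none is free.
        XPathExpression expr = pool.poll();
        try {
            if (expr == null) {
                expr = XPathFactory.newInstance().newXPath().compile(expression);
                created.incrementAndGet();
            }
            // XPathExpression is not thread safe, so only the borrowing thread uses it.
            return expr.evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        } finally {
            // Hand the instance back so the next message can re-use it.
            if (expr != null) {
                pool.add(expr);
            }
        }
    }

    public int created() {
        return created.get();
    }

    public int idle() {
        return pool.size();
    }

    public static void main(String[] args) {
        XPathPoolSketch id = new XPathPoolSketch("//Id");
        for (int i = 0; i < 1000; i++) {
            id.evaluate("<RefData><Id>X" + i + "</Id></RefData>");
        }
        // Sequential use: one instance is compiled and re-used for every message.
        System.out.println("created=" + id.created() + ", idle=" + id.idle());
    }
}
```

With a single thread the loop above compiles one expression and re-uses it for all 1000 records. With N concurrent threads the pool in this sketch can grow to at most N instances, and it never shrinks afterwards, which is the point raised above about shrinking after a spike. A pool that keeps growing with the number of records would mean instances are not being handed back.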


-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2
