Hi

I have found the bug, so there is no need for a reproducer.
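
For anyone reading this in the archive, here is a minimal sketch of the pooling idea discussed in the quoted thread: an XPathExpression is not thread safe, so each thread borrows its own instance and hands it back afterwards, which means the number of pooled instances should not exceed the peak number of concurrent callers. This is only an illustration and not Camel's actual implementation. The class XPathPoolSketch and its methods evaluate() and idleSize() are hypothetical names made up for this example.

```java
import java.io.StringReader;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical illustration only, not the Camel XPath implementation.
final class XPathPoolSketch {

    private final String expression;
    // Idle, already compiled expressions waiting to be re-used.
    private final ConcurrentLinkedQueue<XPathExpression> pool = new ConcurrentLinkedQueue<>();

    XPathPoolSketch(String expression) {
        this.expression = expression;
    }

    String evaluate(String xml) throws XPathExpressionException {
        // Borrow an idle expression, or compile a new one if none is free.
        XPathExpression expr = pool.poll();
        if (expr == null) {
            expr = XPathFactory.newInstance().newXPath().compile(expression);
        }
        try {
            return expr.evaluate(new InputSource(new StringReader(xml)));
        } finally {
            // Return the same instance, so the pool only grows when
            // several threads evaluate at the same time.
            pool.offer(expr);
        }
    }

    // Number of expressions currently idle in the pool.
    int idleSize() {
        return pool.size();
    }

    public static void main(String[] args) throws Exception {
        XPathPoolSketch isin = new XPathPoolSketch("//Id");
        for (int i = 0; i < 1000; i++) {
            isin.evaluate("<RefData><Id>X" + i + "</Id></RefData>");
        }
        System.out.println("idle expressions: " + isin.idleSize());
    }
}
```

With a single thread the same instance is borrowed and returned for every record, so the pool stays at one entry no matter how many records are processed. A pool that grows with the number of records, as described in the thread, suggests that instances are being added without being taken out again.
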

On Fri, Nov 12, 2021 at 9:37 AM Claus Ibsen <claus.ib...@gmail.com> wrote:

> On Fri, Nov 12, 2021 at 8:03 AM Mikael Andersson Wigander
> <mikael.andersson.wigan...@pm.me.invalid> wrote:
>
>> Ok thanks.
>>
>> But the outcome is very obvious when I analyse the heap dump.
>> The pool.clear() is never executed and the memory is filled with the
>> orphaned pool.
>> If the pool is to be used as a placeholder for the expressions, then why
>> are there not only 10 items in the pool? My pool after importing records
>> is huge, 10 times the number of records in the split, I would say.
>
> Yeah, the pool should only be as large as the number of concurrent threads
> at peak. How many items do you have?
> And can you put together a reproducer example and put it on GitHub or
> attach a .zip to the JIRA?
>
>> /M
>>
>> On Fri, Nov 12, 2021 at 07:52, Claus Ibsen <claus.ib...@gmail.com>
>> wrote:
>>
>> Hi
>>
>> Mind that evaluating XPath in Java is not thread safe, so you end up
>> having to create an instance of XPathExpression per XPath you want to
>> execute.
>> And your bean has 10 XPaths, so that is 10 per message, so you end up
>> with 10 XPathExpression instances in memory.
>> And all legacy XML from the JDK/JVM is memory hungry (DOM, JAXB etc).
>>
>> And if you turn on parallel processing, then you multiply this by
>> another 10 or more, depending on the number of concurrent threads etc.
>> In this case you can make the argument that the pool should be able to
>> shrink in case there was a spike of concurrent processing which later is
>> no longer needed, as then the pool has too many free elements.
>>
>> Also, as Alex mentions, a StAX-based parser may be better, which is what
>> you use with tokenizeXML.
>>
>> You talk about a leak, but is that really a leak? The XPath instance is
>> pooled so it can be re-used for the next message. So what you see in
>> memory is those 10 XPathExpression instances.
>> If they are cleared after processing a message, then you end up having to
>> re-create the XPathExpression for the next message, and then you have more
>> CPU usage and also more pressure on the GC
>> to de-allocate those 10 XPathExpression instances per message.
>>
>> On Mon, Nov 8, 2021 at 8:12 PM Mikael Andersson Wigander
>> <mikael.andersson.wigan...@pm.me.invalid> wrote:
>>
>>> Hi
>>>
>>> With the risk of being seen as a n00b (again)…
>>>
>>> We are processing large XML files (0.5GB/~500.000 records).
>>> To process them we use stream caching, split, parallel processing, xpath
>>> and a bean.
>>>
>>> We get a lot of OutOfMemoryErrors and after analysing we see that
>>> the call to the bean method is the villain.
>>>
>>> The process is to split() using tokenizeXML() on a tag that makes up one
>>> record in the XML.
>>>
>>> For each of these records we call a bean where the method utilises
>>> @XPath() on the method parameters.
>>>
>>> We see in the heap dump that these calls are never GC'd, we have 90%
>>> leftovers
>>> [image: image.png]
>>>
>>> The question is: is this related to a not thread safe bean/method or
>>> what could be the reason?
>>> The documentation states the default behaviour is a Singleton and when
>>> used in concurrent processing it must be thread safe…
>>> https://camel.apache.org/components/3.11.x/bean-component.html#_options
>>>
>>> Running as a war under Tomcat 9 on Windows using Camel 3.11.3 and Spring
>>> Boot 2.5.6.
>>> Server has 32GB of RAM…
>>>
>>> Route:
>>> from(file("Full"))
>>>     .streamCaching()
>>>     .unmarshal()
>>>     .zipFile()
>>>     .split()
>>>     .tokenizeXML("RefData")
>>>     .streaming()
>>>     .parallelProcessing(false)
>>>     .bean(XmlToSqlBean.class)
>>>     .to(jdbc("default"))
>>>     .end();
>>>
>>> Bean:
>>> public class XmlToSqlBean {
>>>     public String toSql(@XPath("//FinInstrmGnlAttrbts/Id") final String isin,
>>>                         @XPath("//NtnlCcy") final String currency,
>>>                         @XPath("//FullNm") final String fullName,
>>>                         @XPath("//TradgVnRltdAttrbts/Id") final String venue,
>>>                         @XPath("//ClssfctnTp") final String classification,
>>>                         @XPath("//TradgVnRltdAttrbts/TermntnDt") final String terminationDate,
>>>                         @XPath("//Issr") final String issuer,
>>>                         @XPath("//MtrtyDt") String maturityDate,
>>>                         @XPath("//TermntdRcrd") final String termnRecord,
>>>                         @XPath("//NewRcrd") final String newRecord) {
>>>         …
>>>     }
>>> }
>>>
>>>
>>> Thanks
>>>
>>> /M
>>
>> --
>> Claus Ibsen
>> -----------------
>> http://davsclaus.com @davsclaus
>> Camel in Action 2: https://www.manning.com/ibsen2
>
> --
> Claus Ibsen
> -----------------
> http://davsclaus.com @davsclaus
> Camel in Action 2: https://www.manning.com/ibsen2

--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2