Mikael Andersson Wigander,

While it is not a Camel solution, large documents can be parsed with a streaming XPath parser, which can be implemented on top of SAX. This would work in your use case because the XPath expressions in question do not perform any lookback, so the entire document is never read into memory at once. When you need millisecond performance, this is a good option.
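As a rough illustration of the streaming idea, here is a minimal sketch that uses only the JDK's SAX parser. It is not a general XPath engine: it only captures the text of leaf elements, matched either by element name (roughly "//Name") or by parent and name (roughly "//Parent/Name"), and it keeps the first match. The class name StreamingFieldExtractor and its extract method are hypothetical and do not come from Camel or from this thread; the sketch also assumes the records carry no namespace prefixes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Camel or of the original thread.
final class StreamingFieldExtractor extends DefaultHandler {
    private final Set<String> wanted;                          // keys such as "NtnlCcy" or "Parent/Id"
    private final Deque<String> open = new ArrayDeque<>();     // names of currently open elements
    private final Map<String, String> values = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    private StreamingFieldExtractor(Set<String> wanted) {
        this.wanted = wanted;
    }

    // Parses one record with SAX and returns the first text value found for each wanted key.
    static Map<String, String> extract(String xml, Set<String> wanted) {
        StreamingFieldExtractor handler = new StreamingFieldExtractor(wanted);
        try {
            // The default factory is not namespace-aware, so qName is the raw tag name.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse record", e);
        }
        return handler.values;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        text.setLength(0); // discard whitespace seen before this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
        String parent = open.peekLast(); // null when the root element closes
        String value = text.toString().trim();
        if (wanted.contains(qName)) {
            values.putIfAbsent(qName, value);
        }
        if (parent != null && wanted.contains(parent + "/" + qName)) {
            values.putIfAbsent(parent + "/" + qName, value);
        }
        text.setLength(0);
    }
}
```

A call might look like StreamingFieldExtractor.extract(recordXml, Set.of("FinInstrmGnlAttrbts/Id", "NtnlCcy")), where recordXml is one split record as a String. No DOM tree is built; only the wanted values are kept.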
Alex Mattern

From: Siano, Stephan <stephan.si...@sap.com.INVALID>
Sent: Tuesday, November 9, 2021 12:41 AM
To: users@camel.apache.org
Subject: [EXTERNAL SENDER:] RE: How to make a bean thread safe?

Hi

I don’t think that this issue is related to thread safety. XPath as such is a very expensive operation, as it requires parsing the document into a DOM. You have 10 of those XPath parameters, and the heap dump shows 10 XPath builders that are consuming a lot of memory. You’d probably better pass the payload only once (maybe as a Node or Document) and then execute the XPath expressions on it inside the bean (then you will only parse your document once and have only one DOM tree).

Best regards
Stephan

From: Mikael Andersson Wigander <mikael.andersson.wigan...@pm.me.INVALID>
Sent: Monday, 8 November 2021 20:30
To: Camel Mail List <users@camel.apache.org>
Subject: Re: How to make a bean thread safe?

There’s a typo in the code sample. The processing SHOULD be parallel, not sequential as in the snippet.

/M

On Mon, Nov 8, 2021 at 20:11, Mikael Andersson Wigander <mikael.andersson.wigan...@pm.me.INVALID> wrote:

Hi

At the risk of being seen as a n00b (again)…

We are processing large XML files (0.5 GB, ~500,000 records). To process them we use stream caching, split, parallel processing, XPath and a bean.

We get a lot of OutOfMemoryErrors, and after analysing we see that the call to the bean method is the villain.

The process is to split() using tokenizeXML() on a tag that makes up one record in the XML. For each of these records we call a bean whose method uses @XPath() on the method parameters.

We see in the heap dump that these calls are never GC'd; we have 90% leftovers.

[inline image: image001.png]

The question is: is this related to a bean/method that is not thread safe, or what could be the reason?
The documentation states that the default behaviour is a Singleton, and that when used in concurrent processing it must be thread safe…

https://camel.apache.org/components/3.11.x/bean-component.html#_options

Running as a WAR under Tomcat 9 on Windows using Camel 3.11.3 and Spring Boot 2.5.6. Server has 32 GB of RAM…

Route:

    from(file("Full"))
        .streamCaching()
        .unmarshal().zipFile()
        .split().tokenizeXML("RefData").streaming().parallelProcessing(false)
            .bean(XmlToSqlBean.class)
            .to(jdbc("default"))
        .end();

Bean:

    public class XmlToSqlBean {
        public String toSql(@XPath("//FinInstrmGnlAttrbts/Id") final String isin,
                            @XPath("//NtnlCcy") final String currency,
                            @XPath("//FullNm") final String fullName,
                            @XPath("//TradgVnRltdAttrbts/Id") final String venue,
                            @XPath("//ClssfctnTp") final String classification,
                            @XPath("//TradgVnRltdAttrbts/TermntnDt") final String terminationDate,
                            @XPath("//Issr") final String issuer,
                            @XPath("//MtrtyDt") String maturityDate,
                            @XPath("//TermntdRcrd") final String termnRecord,
                            @XPath("//NewRcrd") final String newRecord) {
            …
        }
    }

Thanks

/M
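A minimal sketch of the parse-once approach suggested in the reply above: the record is parsed into a single DOM Document, and the XPath expressions from the bean signature are then evaluated against that one tree instead of once per annotated parameter. It uses only the JDK's javax.xml APIs. The class name ParseOnceExample, its methods, and the map keys are hypothetical; how the Document reaches a Camel bean (for example as the method's body parameter, relying on Camel's type conversion) is an assumption and is not shown here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration, not the original XmlToSqlBean.
final class ParseOnceExample {

    // Parses one record into a DOM tree; this is the only parse of the record.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse record", e);
        }
    }

    // Evaluates several XPath expressions against the same, already parsed record.
    static Map<String, String> fields(Document record) {
        // XPath objects are not thread safe, so one is created per call here.
        // Caching one per thread would be a possible optimisation.
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> out = new LinkedHashMap<>();
        try {
            out.put("isin", xpath.evaluate("//FinInstrmGnlAttrbts/Id", record));
            out.put("currency", xpath.evaluate("//NtnlCcy", record));
            out.put("fullName", xpath.evaluate("//FullNm", record));
            out.put("venue", xpath.evaluate("//TradgVnRltdAttrbts/Id", record));
            // The remaining expressions from the bean signature follow the same pattern.
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
        return out;
    }
}
```

Because the method keeps no state between calls, concurrent invocations on different records do not share any XPath or DOM objects.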