.synchronous I'm an idiot, sorry for wasting bandwidth… RTFM
/M

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, October 26th, 2021 at 19:12, Mikael Andersson Wigander <mikael.andersson.wigan...@pm.me.INVALID> wrote:

> Adding the readLock option reveals a camelLock file, so I assume it processes
> the file asynchronously.
>
> Any takers on this?
>
> /M
>
> On Mon, Oct 25, 2021 at 11:08, Mikael Andersson Wigander
> <mikael.andersson.wigan...@pm.me.INVALID> wrote:
>
>> Hi
>>
>> Has there been a strategic change to the way the File component processes
>> multiple files in one directory in version 3?
>>
>> It seems that it processes them in parallel, which in our situation creates
>> a memory issue.
>>
>> Code:
>> from(file("{{esma.full.path}}")
>>         .delete(true)
>>         .sortBy("${file:name}"))
>>     .description("Full Import", "Imports FUL files and persists in database",
>>         "en")
>>     .autoStartup("{{esma.full.startup}}")
>>     .streamCaching()
>>     .log("Processing file ${file:name}")
>>     .unmarshal()
>>         .zipFile()
>>     .split()
>>         .tokenizeXML("RefData")
>>         .streaming()
>>         .parallelProcessing(true)
>>         .bean(XmlToSqlBean.class)
>>         .choice()
>>             .when(body().isNotNull())
>>                 .to(jdbc("default"))
>>                 .to(log("Full Import").level(LoggingLevel.INFO.toString())
>>                     .groupInterval(60_000L)
>>                     .groupActiveOnly(true))
>>             .when(simple("${header.CamelSplitComplete} == true"))
>>                 .log("Number of records split: ${header.CamelSplitSize}")
>>                 .log("Importing complete: ${header.CamelFileName}")
>>         .endChoice()
>>     .end();
>>
>> This route processes several zip files, unmarshals them and adds records to
>> a database.
>>
>> The logs seem to reveal this scenario:
>> It logs every file in the directory as if it were processing them in
>> parallel, and then the ThroughputLogger starts printing every minute. This
>> logger is using one thread.
>>
>> 2021-10-25 08:31:58.106 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_C_20211023_01of01.zip
>> 2021-10-25 08:32:00.273 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_D_20211023_01of03.zip
>> 2021-10-25 08:32:10.922 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_D_20211023_02of03.zip
>> 2021-10-25 08:32:19.126 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_D_20211023_03of03.zip
>> 2021-10-25 08:32:19.762 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_E_20211023_01of02.zip
>> 2021-10-25 08:32:25.621 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_F_20211023_01of01.zip
>> 2021-10-25 08:32:26.911 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_H_20211023_01of01.zip
>> 2021-10-25 08:32:31.961 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_J_20211023_01of01.zip
>> 2021-10-25 08:32:36.249 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_O_20211023_01of02.zip
>> 2021-10-25 08:32:41.654 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_O_20211023_02of02.zip
>> 2021-10-25 08:32:44.830 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_01of06.zip
>> 2021-10-25 08:32:49.406 [Camel (FIRDSDatabase) thread #3 - ThroughputLogger] INFO Full Import.log - Received: 35392 new messages, with total 35392 so far. Last group took: 48977 millis which is: 722.625 messages per second. average: 722.625
>> 2021-10-25 08:32:49.724 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_02of06.zip
>> 2021-10-25 08:32:54.880 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_03of06.zip
>> 2021-10-25 08:33:00.867 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_04of06.zip
>> 2021-10-25 08:33:06.265 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_05of06.zip
>> 2021-10-25 08:33:11.222 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_06of06.zip
>> 2021-10-25 08:33:14.923 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_S_20211023_01of02.zip
>> 2021-10-25 08:33:20.119 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_S_20211023_02of02.zip
>>
>> We are using parallel processing for the content of each zip file (XML) but
>> not for the files themselves.
>> If I don't use StreamCaching it wreaks havoc on the server with
>> OutOfMemoryError and the like.
>>
>> This runs Spring Boot 2.5.6 and Camel 3.11.3.
>>
>> Maybe I have done it the wrong way, but file processing is a bread and
>> butter EIP so it shouldn't be a concern, but still…
>> The files are around 15 MB zipped; each unzips to one XML file of about
>> 0.5 GB. Each XML file contains around 500K records to split on. This is a
>> critical memory issue, I know, but it wouldn't be if the files were
>> processed sequentially.
>> Looking at the database connections (using a HikariCP pool) I see 12
>> connections active; I assume this is equivalent to the number of threads in
>> the split. It performs around 800 records / second.
>>
>> Please advise
>>
>> /M
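
A note on the symptom in the quoted message: the log shows the single consumer thread picking up a new zip file every few seconds, although 500K records at roughly 800 records per second would take about ten minutes per file. That pattern is what one would expect if the consumer hands each file's records to worker threads and moves on without waiting. The sketch below is plain Java, not Camel code and not a description of how Camel implements the split; `HandOffDemo`, `maxFilesInFlight`, the pool size of 4 and the 20 ms sleep are all made up for illustration. It only counts how many files are open at once when the hand-off waits versus when it does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java illustration only; none of these names are Camel APIs.
final class HandOffDemo {

    // Returns the highest number of files that were open at the same time.
    static int maxFilesInFlight(boolean synchronous, int files, int recordsPerFile) {
        // Hypothetical worker pool standing in for the split's threads.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            // This loop plays the role of the single file-consumer thread.
            for (int f = 0; f < files; f++) {
                int now = inFlight.incrementAndGet();
                maxSeen.accumulateAndGet(now, Math::max);

                // Fan the file's records out to the worker pool.
                CompletableFuture<?>[] records = new CompletableFuture<?>[recordsPerFile];
                for (int r = 0; r < recordsPerFile; r++) {
                    records[r] = CompletableFuture.runAsync(HandOffDemo::processRecord, pool);
                }

                // The file counts as finished once all of its records are done.
                CompletableFuture<Void> fileDone =
                        CompletableFuture.allOf(records).thenRun(inFlight::decrementAndGet);

                if (synchronous) {
                    fileDone.join();       // wait before picking up the next file
                } else {
                    pending.add(fileDone); // move straight on to the next file
                }
            }
            // Let any outstanding files finish before reporting.
            pending.forEach(CompletableFuture::join);
            return maxSeen.get();
        } finally {
            pool.shutdown();
        }
    }

    // Pretend to convert one record and insert it into the database.
    private static void processRecord() {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronous:  " + maxFilesInFlight(true, 3, 8));
        System.out.println("asynchronous: " + maxFilesInFlight(false, 3, 8));
    }
}
```

With `synchronous` set to `true` only one file is ever in flight. With `false` the loop usually has every file open before the first one finishes, which is where the memory pressure described above would come from.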