.synchronous

I'm an idiot, sorry for wasting bandwidth…
RTFM

/M
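
For anyone finding this thread later, here is a rough plain-Java analogy of the difference the `synchronous` option mentioned above seems to make. It is not Camel code and does not reproduce Camel internals. The class name `FilePollSketch`, the pool size of 4 and the 200 ms sleep are all made up for illustration. One "consumer" thread hands each file to a worker pool and either moves straight on to the next file or waits for the current one to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java analogy of a polling thread handing files to a worker pool; not Camel code. */
public class FilePollSketch {

    /**
     * Simulates one poll over the given number of files and returns the
     * highest number of files that were being processed at the same time.
     */
    public static int maxFilesInFlight(boolean synchronous, int files) throws Exception {
        // Hypothetical pool standing in for the threads used by a parallel split.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < files; i++) {
                // The single "consumer" thread picks up the next file here.
                Future<?> done = workers.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    maxSeen.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(200); // pretend to unzip, split and insert
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                });
                if (synchronous) {
                    done.get();        // block until this file is finished
                } else {
                    pending.add(done); // move straight on to the next file
                }
            }
            // In the asynchronous case, wait for the outstanding files.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            workers.shutdown();
            workers.awaitTermination(10, TimeUnit.SECONDS);
        }
        return maxSeen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("asynchronous, max files in flight: " + maxFilesInFlight(false, 4));
        System.out.println("synchronous,  max files in flight: " + maxFilesInFlight(true, 4));
    }
}
```

In the blocking variant only one file is ever in flight, which is the behaviour the original question expected. In the non-blocking variant several files overlap, which would match the quick succession of "Processing file" log lines from a single consumer thread and the memory pressure described below.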

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, October 26th, 2021 at 19:12, Mikael Andersson Wigander 
<mikael.andersson.wigan...@pm.me.INVALID> wrote:

> Adding the readLock option reveals a camelLock file, so I assume it processes 
> the file asynchronously.
>
> Any takers on this?
>
> /M
>
> On Mon, Oct 25, 2021 at 11:08, Mikael Andersson Wigander 
> <mikael.andersson.wigan...@pm.me.INVALID> wrote:
>
>> Hi
>>
>> Has there been a strategic change to the way the File component processes 
>> multiple files in one directory in version 3?
>>
>> It seems that it processes them in parallel, which in our situation creates a 
>> memory issue.
>>
>> Code:
>> from(file("{{esma.full.path}}")
>>         .delete(true)
>>         .sortBy("${file:name}"))
>>     .description("Full Import", "Imports FUL files and persists in database", "en")
>>     .autoStartup("{{esma.full.startup}}")
>>     .streamCaching()
>>     .log("Processing file ${file:name}")
>>     .unmarshal()
>>         .zipFile()
>>     .split()
>>         .tokenizeXML("RefData")
>>         .streaming()
>>         .parallelProcessing(true)
>>         .bean(XmlToSqlBean.class)
>>         .choice()
>>             .when(body().isNotNull())
>>                 .to(jdbc("default"))
>>                 .to(log("Full Import").level(LoggingLevel.INFO.toString())
>>                         .groupInterval(60_000L)
>>                         .groupActiveOnly(true))
>>             .when(simple("${header.CamelSplitComplete} == true"))
>>                 .log("Number of records split: ${header.CamelSplitSize}")
>>                 .log("Importing complete: ${header.CamelFileName}")
>>         .endChoice()
>>     .end();
>>
>> This route processes several zip files, unmarshals them and adds records to 
>> a database.
>>
>> The logs seem to reveal this scenario:
>> It logs every file in the directory as if it were processing them in 
>> parallel, and then the ThroughputLogger starts printing every minute. This 
>> logger is using one thread.
>>
>> 2021-10-25 08:31:58.106 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_C_20211023_01of01.zip
>> 2021-10-25 08:32:00.273 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_D_20211023_01of03.zip
>> 2021-10-25 08:32:10.922 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_D_20211023_02of03.zip
>> 2021-10-25 08:32:19.126 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_D_20211023_03of03.zip
>> 2021-10-25 08:32:19.762 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_E_20211023_01of02.zip
>> 2021-10-25 08:32:25.621 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_F_20211023_01of01.zip
>> 2021-10-25 08:32:26.911 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_H_20211023_01of01.zip
>> 2021-10-25 08:32:31.961 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_J_20211023_01of01.zip
>> 2021-10-25 08:32:36.249 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_O_20211023_01of02.zip
>> 2021-10-25 08:32:41.654 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_O_20211023_02of02.zip
>> 2021-10-25 08:32:44.830 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_R_20211023_01of06.zip
>> 2021-10-25 08:32:49.406 [Camel (FIRDSDatabase) thread #3 - ThroughputLogger] 
>> INFO Full Import.log - Received: 35392 new messages, with total 35392 so 
>> far. Last group took: 48977 millis which is: 722.625 messages per second. 
>> average: 722.625
>> 2021-10-25 08:32:49.724 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_R_20211023_02of06.zip
>> 2021-10-25 08:32:54.880 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_R_20211023_03of06.zip
>> 2021-10-25 08:33:00.867 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_R_20211023_04of06.zip
>> 2021-10-25 08:33:06.265 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_R_20211023_05of06.zip
>> 2021-10-25 08:33:11.222 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_R_20211023_06of06.zip
>> 2021-10-25 08:33:14.923 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_S_20211023_01of02.zip
>> 2021-10-25 08:33:20.119 [Camel (FIRDSDatabase) thread #5 - 
>> file://FIRDS/input/full] INFO Full Import.log - Processing file 
>> FULINS_S_20211023_02of02.zip
>>
>> We are using parallel processing for each zip file content (XML) but not for 
>> the files themselves.
>> If I don't use StreamCaching it wreaks havoc on the server, with 
>> OutOfMemoryError and the like.
>>
>> This runs Spring Boot 2.5.6 and Camel 3.11.3
>>
>> Maybe I have done it the wrong way, but file processing is a 
>> bread-and-butter EIP, so it shouldn't be a concern, but still…
>> The files are around 15 MB zipped; each unzips to one XML file of about 
>> 0.5 GB. Each XML file contains around 500K records to split on. This is a 
>> critical memory issue, I know, but it wouldn't be if the files were 
>> processed sequentially.
>> Looking at the database connections (using a HikariCP pool) I see 12 
>> connections active; I assume this matches the number of threads in the 
>> split. It performs around 800 records / second.
>>
>> Please advise
>>
>> /M
