Ah, actually: the problem is likely with the cartesian join. You should be able to add the sequence directly as a step following the DB input; otherwise you are holding the whole query in memory as opposed to streaming it.

Diego
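To illustrate the point in plain JDBC (not Hop code; the connection details, table and column names below are placeholders), attaching a counter while the rows stream through costs essentially nothing, whereas a join step first has to buffer rows before it can combine the two streams:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class AddSequenceWhileStreaming {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details, table and column names.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, descr FROM source_table");
                 ResultSet rs = ps.executeQuery()) {

                long seq = 0;
                while (rs.next()) {
                    seq++;                          // the "Add sequence" idea: a counter per row
                    String id = rs.getString("id"); // illustrative read; add the key / reference
                                                    // year and write the row out here, without
                                                    // ever collecting the full result set
                }
                System.out.println("rows processed: " + seq);
            }
        }
    }

The cartesian-join version would have to materialise rows before crossing the streams, which is presumably where the memory goes.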
From: "Diego Mainou" <[email protected]>
To: "users" <[email protected]>
Sent: Friday, 31 May, 2024 10:08:34 PM
Subject: Re: Performance issues when connecting to Oracle databases

OK, my 2c, just guessing:
* Add an X-second wait between pipelines and see how you go. It may just be a timing issue with your DB.
* Add some logging steps, not only in the pipelines but also in the error handling of the workflow.

Going to bed now. Good luck.

Diego

From: "Davide Cisco" <[email protected]>
To: "users" <[email protected]>
Sent: Friday, 31 May, 2024 9:01:04 PM
Subject: Re: Performance issues when connecting to Oracle databases

Hi Diego,

the pipelines are indeed different files; they are all concatenated in the workflow with a "success" (green) link, sequentially one after another. There is no real exchange of data from one pipeline to another (except for a set of workflow variables that are set in the first pipeline and possibly used by the others): each pipeline reads data from one or two tables and writes the results to another table, all on its own.

I attach some screenshots of the workflow (main) and some of the pipelines (the missing ones are actually similar to those included... I could probably group them using ETL Metadata Injection somehow, but I couldn't make it work that way either...)

DC

--
Davide Cisco
Università degli Studi di Padova
Area Servizi Informatici e Telematici
Ufficio Applicativi
Settore Data Governance
via San Francesco, 11
35121 Padova
tel. (+39) 049 8273819

On Fri, 31 May 2024 at 12:36, Diego Mainou <[email protected]> wrote:
>
> Hi Davide,
>
> It is not clear from your description whether the various pipelines are
> different "files".
>
> If they are not, then that is your problem. Pipelines do not run
> sequentially: everything happens at the same time, so things are tripping
> over themselves.
>
> The recommended path would be for the first pipeline to finish with a
> Pipeline Executor transform. This transform then executes the next
> pipeline, and so on. It also passes down the parameters collated by the
> first pipeline.
>
> If we are instead talking about a workflow, can you please explain how you
> are passing the values from one pipeline to the next? e.g. as described
> above, passing them down, or setting variables and passing them up, or
> passing the results up (not recommended due to the memory requirements)?
>
> Diego
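As a loose analogy in plain Java (this is not the Hop API; the Pipeline interface and method names are invented for illustration), the distinction Diego draws is between starting everything at once, which is how transforms inside a single pipeline behave, and chaining executions so the next one only starts when the previous one finishes, which is what a workflow or a Pipeline Executor gives you:

    import java.util.List;

    public class SequentialVsConcurrent {

        // Stand-in for one pipeline; in Hop this would be a .hpl file, not Java.
        interface Pipeline {
            void run();
        }

        // Transforms inside a single pipeline behave roughly like this: all started at once.
        static void runLikeTransformsInOnePipeline(List<Pipeline> pipelines) {
            List<Thread> threads = pipelines.stream()
                    .map(p -> new Thread(p::run))
                    .toList();
            threads.forEach(Thread::start);   // everything happens at the same time
        }

        // A workflow (or a chain of Pipeline Executor transforms) behaves roughly like this.
        static void runLikeWorkflow(List<Pipeline> pipelines) {
            for (Pipeline p : pipelines) {
                p.run();                      // the next one starts only when this one finishes
            }
        }
    }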
> From: "Davide Cisco" <[email protected]>
> To: "users" <[email protected]>
> Sent: Friday, 31 May, 2024 8:06:48 PM
> Subject: Performance issues when connecting to Oracle databases
>
> Hello,
>
> I set up a workflow with various pipelines in sequence; each one of them
> should:
> - read data from a table in an Oracle database (sometimes it has to look
>   up a key in another table in another Oracle database)
> - process the collected records (by adding a key and, where needed, a
>   reference year)
> - write the resulting rows to a third table in another Oracle database
>   (the same database as the lookup above)
>
> As long as I developed the various pipelines one at a time and tested the
> workflow at every edit, it ran without errors.
>
> Since completing the workflow with all the necessary pipelines, I have
> never been able to run it to the end: the workflow hangs at a random
> pipeline, shows no particular error, but cannot reach its conclusion.
>
> I suspected some memory allocation issues, but even increasing the Java
> dedicated memory to 8 GB (HOP_OPTIONS="-Xmx8192m") didn't do the job.
>
> I could probably solve the issues by adding some "cleanup"
> statements/components here and there (since each pipeline processes a
> completely different set of tables), but I can't figure out a way to do
> that.
>
> Is there any possibility to improve the performance (with the above
> mentioned cleanup components and/or some configuration of the Oracle JDBC
> driver), in order to make the workflow complete?
>
> Thanks for any suggestion
>
> DC
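For the Oracle JDBC side of the question, one knob that is often worth checking is the thin driver's row prefetch, which controls how many rows come back per network round trip (the default is small). In plain JDBC that looks roughly like the sketch below; the connection details are placeholders, and in Hop the same property would presumably be set as an option on the database connection rather than in Java code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class OracleFetchTuning {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "hop_user");       // placeholder credentials
            props.setProperty("password", "secret");
            // Oracle thin driver property: rows fetched per round trip (default is 10).
            props.setProperty("defaultRowPrefetch", "500");

            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", props)) {
                // run the usual SELECTs here; a larger prefetch means fewer round trips
                // while rows are still streamed rather than loaded into memory at once
                System.out.println("connected with prefetch raised");
            }
        }
    }

A bigger prefetch will not fix a transform that buffers its whole input, but it does cut the per-row network overhead of large table reads.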
