Ah, actually: the problem is likely with the cartesian join. You should be able to add the sequence directly as a step following the DB input; otherwise you are holding the whole query in memory as opposed to streaming it.

Diego
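To illustrate the point in plain JDBC (not Hop code; the connection details, table and column names below are placeholders), attaching a counter while the rows stream through costs essentially nothing, whereas a join step first has to buffer rows before it can combine the two streams:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class AddSequenceWhileStreaming {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details, table and column names.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, descr FROM source_table");
                 ResultSet rs = ps.executeQuery()) {

                long seq = 0;
                while (rs.next()) {
                    seq++;                          // the "Add sequence" idea: a counter per row
                    String id = rs.getString("id"); // illustrative read; add the key / reference
                                                    // year and write the row out here, without
                                                    // ever collecting the full result set
                }
                System.out.println("rows processed: " + seq);
            }
        }
    }

The cartesian-join version would have to materialise rows before crossing the streams, which is presumably where the memory goes.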
From: "Diego Mainou" <[email protected]>
To: "users" <[email protected]>
Sent: Friday, 31 May, 2024 10:08:34 PM
Subject: Re: Performance issues when connecting to Oracle databases

OK, my 2c, just guessing:
* Add an X-second wait between pipelines and see how you go. It may just be a timing issue with your DB.
* Add some logging steps, not only in the pipelines but also in the error handling of the workflow.

Going to bed now. Good luck.

Diego

From: "Davide Cisco" <[email protected]>
To: "users" <[email protected]>
Sent: Friday, 31 May, 2024 9:01:04 PM
Subject: Re: Performance issues when connecting to Oracle databases

Hi Diego,

the pipelines are indeed different files; they are all concatenated in the workflow with a "success" (green) link, sequentially one after another. There is no real exchange of data from one pipeline to another (except for a set of workflow variables that are set in the first pipeline and possibly used by the others): each pipeline reads data from one or two tables and writes the results to another table, all on its own.

I attach some screenshots of the workflow (main) and some of the pipelines (the missing ones are actually similar to those included... I could probably group them using ETL Metadata Injection somehow, but I couldn't make it work that way either...)

DC

--
Davide Cisco
Università degli Studi di Padova
Area Servizi Informatici e Telematici
Ufficio Applicativi
Settore Data Governance
via San Francesco, 11
35121 Padova
tel. (+39) 049 8273819

On Fri, 31 May 2024 at 12:36, Diego Mainou <[email protected]> wrote:
>
> Hi Davide,
>
> It is not clear from your description whether the various pipelines are
> different "files".
>
> If they are not, then that is your problem. Pipelines do not run
> sequentially: everything happens at the same time, so things are tripping
> over themselves.
>
> The recommended path would be for the first pipeline to finish with a
> Pipeline Executor transform. This transform then executes the next
> pipeline, and so on. It also passes down the parameters collated by the
> first pipeline.
>
> If we are instead talking about a workflow, can you please explain how you
> are passing the values from one pipeline to the next? e.g. as described
> above, passing them down, or setting variables and passing them up, or
> passing the results up (not recommended due to the memory requirements)?
>
> Diego
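As a loose analogy in plain Java (this is not the Hop API; the Pipeline interface and method names are invented for illustration), the distinction Diego draws is between starting everything at once, which is how transforms inside a single pipeline behave, and chaining executions so the next one only starts when the previous one finishes, which is what a workflow or a Pipeline Executor gives you:

    import java.util.List;

    public class SequentialVsConcurrent {

        // Stand-in for one pipeline; in Hop this would be a .hpl file, not Java.
        interface Pipeline {
            void run();
        }

        // Transforms inside a single pipeline behave roughly like this: all started at once.
        static void runLikeTransformsInOnePipeline(List<Pipeline> pipelines) {
            List<Thread> threads = pipelines.stream()
                    .map(p -> new Thread(p::run))
                    .toList();
            threads.forEach(Thread::start);   // everything happens at the same time
        }

        // A workflow (or a chain of Pipeline Executor transforms) behaves roughly like this.
        static void runLikeWorkflow(List<Pipeline> pipelines) {
            for (Pipeline p : pipelines) {
                p.run();                      // the next one starts only when this one finishes
            }
        }
    }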
> From: "Davide Cisco" <[email protected]>
> To: "users" <[email protected]>
> Sent: Friday, 31 May, 2024 8:06:48 PM
> Subject: Performance issues when connecting to Oracle databases
>
> Hello,
>
> I set up a workflow with various pipelines in sequence; each one of them
> should:
> - read data from a table in an Oracle database (sometimes it has to look
>   up a key in another table in another Oracle database)
> - process the collected records (by adding a key and, where needed, a
>   reference year)
> - write the resulting rows to a third table in another Oracle database
>   (the same database as the lookup above)
>
> As long as I developed the various pipelines one at a time and tested the
> workflow at every edit, it ran without errors.
>
> Since completing the workflow with all the necessary pipelines, I have
> never been able to run it to the end: the workflow hangs at a random
> pipeline, shows no particular error, but cannot reach its conclusion.
>
> I suspected some memory allocation issues, but even increasing the Java
> dedicated memory to 8 GB (HOP_OPTIONS="-Xmx8192m") didn't do the job.
>
> I could probably solve the issues by adding some "cleanup"
> statements/components here and there (since each pipeline processes a
> completely different set of tables), but I can't figure out a way to do
> that.
>
> Is there any possibility to improve the performance (with the above
> mentioned cleanup components and/or some configuration of the Oracle JDBC
> driver), in order to make the workflow complete?
>
> Thanks for any suggestion
>
> DC
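For the Oracle JDBC side of the question, one knob that is often worth checking is the thin driver's row prefetch, which controls how many rows come back per network round trip (the default is small). In plain JDBC that looks roughly like the sketch below; the connection details are placeholders, and in Hop the same property would presumably be set as an option on the database connection rather than in Java code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class OracleFetchTuning {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "hop_user");       // placeholder credentials
            props.setProperty("password", "secret");
            // Oracle thin driver property: rows fetched per round trip (default is 10).
            props.setProperty("defaultRowPrefetch", "500");

            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", props)) {
                // run the usual SELECTs here; a larger prefetch means fewer round trips
                // while rows are still streamed rather than loaded into memory at once
                System.out.println("connected with prefetch raised");
            }
        }
    }

A bigger prefetch will not fix a transform that buffers its whole input, but it does cut the per-row network overhead of large table reads.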
