Hi Diego,

I managed to remove the cartesian join from all the pipelines: they
were there because I have to replicate the rows over a sequence of
reference years (e.g. create one row for 2023 and another for 2024,
for every input row), and then discard the copies that fall outside
the period the row refers to (e.g. for a row covering the period
2020-2023, the copy joined with the year 2024 is discarded). I
achieved the same result with a chain of Clone row -> Add value
fields changing sequence -> Filter rows.
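
Just to spell out what those three transforms do, here is the
equivalent logic as a minimal sketch in plain Java (field names like
periodStart/periodEnd are made up; the real rows have more columns):

    import java.util.ArrayList;
    import java.util.List;

    public class YearFanOut {
        // Hypothetical row with only the fields needed for the example.
        record Row(String id, int periodStart, int periodEnd, Integer refYear) {}

        // Reference years to replicate over (e.g. 2023 and 2024).
        static final int[] REF_YEARS = {2023, 2024};

        // "Clone row" + "Add value fields changing sequence": one copy of
        // each row per reference year. "Filter rows": keep only the copies
        // whose year falls inside the row's own period.
        static List<Row> fanOutAndFilter(List<Row> input) {
            List<Row> out = new ArrayList<>();
            for (Row r : input) {
                for (int y : REF_YEARS) {
                    if (y >= r.periodStart() && y <= r.periodEnd()) {
                        out.add(new Row(r.id(), r.periodStart(), r.periodEnd(), y));
                    }
                }
            }
            return out;
        }

        public static void main(String[] args) {
            // A row for the period 2020-2023 keeps only the 2023 copy;
            // the 2024 copy is discarded.
            List<Row> rows = List.of(new Row("A", 2020, 2023, null));
            fanOutAndFilter(rows).forEach(System.out::println);
        }
    }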

However, the workflow still hangs. I also tried to slow down the
writes to the database (by inserting a Delay row transform in every
pipeline right before the output, and a Wait for action between every
two pipelines in the workflow), but saw no improvement.

I tried increasing the log level, but the only significant
information I got is that the workflow hangs at the point where the
connection to the output database is being established (in whichever
pipeline it happens to stop, as mentioned). Oddly, the higher the log
level (Basic -> Detailed -> Debug), the earlier the workflow seems to
stop...
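
To check whether the hang is in the JDBC layer rather than in Hop
itself, this is the kind of standalone test I can run against the
output database, with explicit timeouts so that a stuck connection
fails instead of blocking forever (a rough sketch: URL and credentials
are placeholders; oracle.net.CONNECT_TIMEOUT and oracle.jdbc.ReadTimeout
are standard Oracle thin-driver properties, in milliseconds):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class OracleConnTest {
        public static void main(String[] args) throws Exception {
            // Placeholder URL: host, port and service name are not the real ones.
            String url = "jdbc:oracle:thin:@//dbhost.example:1521/SERVICE";

            Properties props = new Properties();
            props.setProperty("user", "USER");          // placeholder
            props.setProperty("password", "PASSWORD");  // placeholder
            // Fail fast instead of hanging forever.
            props.setProperty("oracle.net.CONNECT_TIMEOUT", "10000");
            props.setProperty("oracle.jdbc.ReadTimeout", "60000");

            long t0 = System.currentTimeMillis();
            try (Connection c = DriverManager.getConnection(url, props)) {
                System.out.println("Connected in "
                    + (System.currentTimeMillis() - t0) + " ms");
            }
        }
    }

If this connects quickly, the same two properties can presumably also
be set as extra options on the Hop database connection.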

Any other ideas are warmly welcome... thanks anyway for your support

DC

--
Davide Cisco
Università degli Studi di Padova
Area Servizi Informatici e Telematici
Ufficio Applicativi
Settore Data Governance
via San Francesco, 11
35121 Padova
tel. (+39) 049 8273819


On Fri, 31 May 2024 at 14:11 Diego Mainou
<[email protected]> wrote:
>
> Ah, actually,
>
> The problem is likely with the cartesian join. You should be able to add the
> sequence directly as a step following the db input.
>
> Otherwise you are holding the whole query result in memory instead of
> streaming it.
>
> Diego Mainou
> Product Manager
> M. +61 415 152 091
> E. [email protected]
> www.bizcubed.com.au
>
> ________________________________
> From: "Diego Mainou" <[email protected]>
> To: "users" <[email protected]>
> Sent: Friday, 31 May, 2024 10:08:34 PM
> Subject: Re: Performance issues when connecting to Oracle databases
>
> Ok,
>
> 2c and just guessing
>
> Add an X-second wait between pipelines and see how you go. It may just be a
> timing issue with your DB.
> Add some logging steps, not only in the pipelines but also in the error
> handling of the workflow.
>
> Going to bed now
>
> Good luck
>
>
> Diego Mainou
> Product Manager
> M. +61 415 152 091
> E. [email protected]
> www.bizcubed.com.au
>
> ________________________________
> From: "Davide Cisco" <[email protected]>
> To: "users" <[email protected]>
> Sent: Friday, 31 May, 2024 9:01:04 PM
> Subject: Re: Performance issues when connecting to Oracle databases
>
> Hi Diego,
>
> the pipelines are indeed different files; they are chained in the
> workflow with "success" (green) links, sequentially one after the
> other.
>
> There is no real exchange of data from one pipeline to another (except
> for a set of workflow variables, which are set in the first pipeline
> and possibly used by the others): each pipeline reads data from one or
> two tables and writes the results to another table, all on its own.
>
> I attach some screenshots of the workflow (main) and of some of the
> pipelines (the missing ones are quite similar to those included... I
> could probably group them using ETL Metadata Injection somehow, but I
> couldn't make it work that way either...)
>
> DC
>
> --
> Davide Cisco
> Università degli Studi di Padova
> Area Servizi Informatici e Telematici
> Ufficio Applicativi
> Settore Data Governance
> via San Francesco, 11
> 35121 Padova
> tel. (+39) 049 8273819
>
>
> On Fri, 31 May 2024 at 12:36 Diego Mainou
> <[email protected]> wrote:
> >
> > Hi Davide,
> >
> > It is not clear from your description whether the various pipelines are
> > different "files".
> >
> > If they are not, then that is your problem: the transforms in a pipeline do
> > not run sequentially; everything happens at the same time.
> > Therefore things are tripping over each other.
> >
> > The recommended path would be for the first pipeline to finish with a
> > Pipeline executor transform.
> > That transform then executes the next pipeline, and so on.
> > It also passes down the parameters collated by the first pipeline.
> >
> > If instead we are talking about a workflow, can you please explain how you
> > are passing the values from one pipeline to the next?
> > E.g. as described above passing them down, or setting variables and passing
> > them up, or passing the results up (not recommended due to the memory
> > requirements)?
> >
> > Diego
> >
> >
> >
> >
> >
> > Diego Mainou
> > Product Manager
> > M. +61 415 152 091
> > E. [email protected]
> > www.bizcubed.com.au
> >
> > ________________________________
> > From: "Davide Cisco" <[email protected]>
> > To: "users" <[email protected]>
> > Sent: Friday, 31 May, 2024 8:06:48 PM
> > Subject: Performance issues when connecting to Oracle databases
> >
> > Hello,
> >
> > I set up a workflow with several pipelines in sequence, each of which
> > should:
> > - read data from a table in an Oracle database (sometimes it also has to
> > look up a key in a table from another Oracle database)
> > - process the collected records (by adding a key and, where needed, a
> > reference year)
> > - write the resulting rows to a third table in another Oracle database
> > (the same database used for the lookup above, when there is one)
> >
> > While I was developing the pipelines one at a time and testing the
> > workflow after every change, it ran without errors.
> >
> > Since completing the workflow with all the necessary pipelines, I have
> > never managed to get it to the end: the workflow hangs at a random
> > pipeline, shows no particular error, but never finishes.
> >
> > I suspected some memory allocation issue, but even increasing the
> > memory dedicated to Java to 8 GB (HOP_OPTIONS="-Xmx8192m") didn't
> > help.
> >
> > I could probably solve the issue by adding some "cleanup"
> > statements/components here and there (since each pipeline processes a
> > completely different set of tables), but I can't figure out a way to
> > do that.
> >
> > Is there any way to improve the performance (via the above-mentioned
> > cleanup components and/or some configuration of the Oracle JDBC
> > driver) so that the workflow can complete?
> >
> > Thanks for any suggestion
> >
> > DC
>
