Re: [D] HOP Sizing (hop)

Hans Van Akelyen Tue, 19 Nov 2024 05:00:42 -0800

Hi Thad,

You are right, we could add some more considerations in the best practices 
section of the documentation. Some of what Bart and I said is documented there.


Kr,
Hans
On 19 Nov 2024 at 12:36 +0100, Thad Guidry <[email protected]>, wrote:
> OK, so 2 of those things, not even I was aware of.
>
> Hans, perhaps your reply could be put into the docs?
>
> Thad
> https://www.linkedin.com/in/thadguidry/
> https://calendly.com/thadguidry/
>
>
> > On Tue, Nov 19, 2024 at 6:32 PM hansva (via GitHub) <[email protected]> wrote:
> > >
> > > GitHub user hansva added a comment to the discussion: HOP Sizing
> > >
> > > Hi @xProga,
> > >
> > > I'm afraid the answer to all this is "it depends". That's why we don't 
> > > have these guidelines.
> > >
> > > Some of the things that are known:
> > > - Each action/transform (or transform copy) will create a processor thread
> > > >   - This means the maximum number of active transforms equals the
> > > >     maximum number of threads the CPU supports (minus 1 for the main
> > > >     process). When the number of active transforms is higher, thread
> > > >     switching will occur.
> > > >   - So our recommendation is to keep your pipelines as compact as
> > > >     possible (~30 transforms sounds like a sane rule).
> > > - Each transform has a configurable buffer
> > > >   - Each transform (copy) has an input buffer. Unlike other tools, we
> > > >     do not load all data into memory and move it from one transform to
> > > >     the next. We have a buffer system (default 10K rows): transforms
> > > >     fill those buffers and get pushback signals to stop
> > > >     processing/fetching data.
> > > >   - This means the amount of active memory = rows in buffers x (columns
> > > >     x data type size).
> > > >   - There are a couple of exceptions, e.g. `Sort Rows` needs to have all
> > > >     the data, so it will load all rows, but it has a configurable
> > > >     buffer to spool data to disk.
> > >
> > > Hop Web:
> > > > Users can run workflows and pipelines inside Hop Web; it is not a
> > > > client/server application. This implies that these instances need
> > > > enough resources to run the processes locally.
> > >
> > > Hop Server:
> > > This one is mainly used as a remote extension to local development. It is 
> > > a stateless server so workloads do not survive restarts. It does not have 
> > > scheduling.
> > >
> > > We recommend using short-lived containers for actual workload scheduling 
> > > using an orchestration tool of your choice.
> > >
> > > Hope this helps.
> > >
> > >
> > > GitHub link: 
> > > https://github.com/apache/hop/discussions/4586#discussioncomment-11303562
> > >
> > > ----
> > > This is an automatically sent email for [email protected].
> > > To unsubscribe, please send an email to: [email protected]
> > >

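The memory rule of thumb from the quoted comment (rows in buffers x columns x data type) can be turned into a rough estimate. This is a minimal sketch under loud assumptions: the per-type byte sizes and the average string size are made-up placeholders, real JVM object overhead is ignored, and every buffer is assumed to be full, so treat the result as an order-of-magnitude figure only. Exceptions such as `Sort Rows` are not covered.

```python
# Hypothetical per-value sizes in bytes; actual in-memory sizes will differ.
ASSUMED_BYTES = {"Integer": 8, "Number": 8, "Date": 8, "Boolean": 1}


def estimate_buffer_memory(column_types, buffers,
                           rows_per_buffer=10_000, avg_string_bytes=50):
    """Estimate bytes held in row buffers.

    column_types: list of type names for one row (hypothetical names).
    buffers: number of input buffers, i.e. one per transform copy.
    rows_per_buffer: buffer size in rows (10K is the default quoted above).
    avg_string_bytes: assumed size for any type not in ASSUMED_BYTES.
    """
    # "columns x data type": sum the assumed size of each column in a row.
    row_bytes = sum(ASSUMED_BYTES.get(t, avg_string_bytes) for t in column_types)
    # "rows in buffers": every buffer is assumed to be completely full.
    return buffers * rows_per_buffer * row_bytes


if __name__ == "__main__":
    total = estimate_buffer_memory(["Integer", "String", "Date"], buffers=30)
    print(f"{total / 1_000_000:.1f} MB")
```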