Hi Thad,

You are right, we could add some more considerations to the best practices section of the documentation. Some of what Bart and I said is already documented there.
Kr,
Hans

On 19 Nov 2024 at 12:36 +0100, Thad Guidry <[email protected]>, wrote:
> OK, so 2 of those things, not even I was aware of.
>
> Hans, Perhaps your reply could be put into the docs ?
>
> Thad
> https://www.linkedin.com/in/thadguidry/
> https://calendly.com/thadguidry/
>
> On Tue, Nov 19, 2024 at 6:32 PM hansva (via GitHub) <[email protected]> wrote:
> > >
> > > GitHub user hansva added a comment to the discussion: HOP Sizing
> > >
> > > Hi @xProga,
> > >
> > > I'm afraid the answer to all this is "it depends". That's why we don't
> > > have these guidelines.
> > >
> > > Some of the things that are known:
> > > - Each action/transform (or transform copy) will create a processor thread
> > >   - This means the maximum amount of active transforms equals the maximum
> > >     amount of threads the CPU supports (-1 for the main process), when the
> > >     amount of active transforms is higher thread switching will occur
> > >   - So our recommendation is to keep your pipelines as compact as
> > >     possible (~30 transforms sounds like a sane rule)
> > > - Each transform has a configurable buffer
> > >   - Each transform (copy) has an input buffer, compared to other tools we
> > >     do not load all data into memory and move them from one transform to the
> > >     next. We have a buffer system (default 10K rows) and transforms will fill
> > >     those buffers and get pushback signals to stop processing/fetching data
> > >   - This means the amount of active memory = rows in buffers x (columns x
> > >     data type)
> > >   - There are a couple of exceptions, eg. `Sort Rows` needs to have all
> > >     data so it will load all rows but it has a configurable buffer to spool
> > >     of data to disk
> > >
> > > Hop Web:
> > > Users can run workflows and pipelines inside Hop Web, it is not a
> > > client/server application. This does imply that these instances need
> > > enough resources to run the processes locally.
> > >
> > > Hop Server:
> > > This one is mainly used as a remote extension to local development. It is
> > > a stateless server so workloads do not survive restarts. It does not have
> > > scheduling.
> > >
> > > We recommend using short-lived containers for actual workload scheduling
> > > using an orchestration tool of your choice.
> > >
> > > Hope this helps.
> > >
> > >
> > > GitHub link:
> > > https://github.com/apache/hop/discussions/4586#discussioncomment-11303562
> > >
> > > ----
> > > This is an automatically sent email for [email protected].
> > > To unsubscribe, please send an email to: [email protected]
> > >
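
If it helps for the docs, the two rules of thumb from the quoted reply (active memory = rows in buffers x (columns x data type), and active transform copies versus CPU threads minus one) could be illustrated with a rough sketch like the one below. This is plain Python arithmetic, not a Hop API: the function names and the per-type byte sizes are assumptions made up for illustration, and real JVM memory use per value will differ.

```python
# Back-of-the-envelope sizing sketch based on the rules of thumb in the thread.
# Nothing here is a Hop API; all names and byte sizes are illustrative assumptions.

DEFAULT_BUFFER_ROWS = 10_000  # default buffer size mentioned in the thread (10K rows)

# Assumed average bytes per value for a few data types (hypothetical numbers).
ASSUMED_TYPE_BYTES = {
    "Integer": 8,
    "Number": 8,
    "Date": 8,
    "Boolean": 1,
    "String": 50,
}


def estimate_row_bytes(column_types):
    """Approximate size of one row: sum of the assumed sizes of its columns."""
    return sum(ASSUMED_TYPE_BYTES[t] for t in column_types)


def estimate_buffer_bytes(column_types, buffers, buffer_rows=DEFAULT_BUFFER_ROWS):
    """Worst case where every input buffer is completely full."""
    return buffers * buffer_rows * estimate_row_bytes(column_types)


def threads_oversubscribed(active_transform_copies, cpu_threads):
    """True when active transform copies exceed CPU threads minus one (main process)."""
    return active_transform_copies > cpu_threads - 1


if __name__ == "__main__":
    columns = ["Integer", "String", "Date"]
    print(estimate_buffer_bytes(columns, buffers=5))
    print(threads_oversubscribed(active_transform_copies=30, cpu_threads=8))
```

Transforms such as `Sort Rows` fall outside this estimate, since they need all rows before they can produce output.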
