Re: Performance of data stream on 3 cluster node.

2023-03-01 Thread John Smith
My key is phone_number and they are all unique... I'll check with the command... On Wed., Mar. 1, 2023, 11:20 a.m. Stephen Darlington, < stephen.darling...@gridgain.com> wrote: > The streamer doesn’t determine where the data goes. It just efficiently > sends it to the correct place. > > If your

Re: Performance of data stream on 3 cluster node.

2023-03-01 Thread Stephen Darlington
The streamer doesn’t determine where the data goes. It just efficiently sends it to the correct place. If your data is skewed in some way so that there is more data in some partitions than others, then you could find one machine with more work to do than others. All else being equal, you’ll
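A rough illustration of the point above, not Ignite's actual affinity code: each key hashes to a partition, and each partition is owned by a node. This sketch assumes 1024 partitions (Ignite's default), an MD5-based hash, and a hypothetical round-robin partition-to-node mapping in place of the real rendezvous affinity function. The helper names and sample keys are made up for the example.

```python
import hashlib
from collections import Counter

PARTITIONS = 1024  # assumption: Ignite's default partition count
NODES = 3          # the 3-node cluster from the thread


def partition_for(key: str) -> int:
    """Map a key to a partition with a deterministic hash (stand-in for Ignite's hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PARTITIONS


def node_for(partition: int) -> int:
    """Hypothetical round-robin ownership; real Ignite uses rendezvous hashing."""
    return partition % NODES


def load_per_node(keys):
    """Count how many entries each node would receive for the given keys."""
    counts = Counter(node_for(partition_for(k)) for k in keys)
    return [counts.get(n, 0) for n in range(NODES)]


# Unique keys, similar in spirit to unique phone numbers (made-up values).
unique_keys = [f"+1555{n:07d}" for n in range(30000)]

# Skewed keys: only four distinct values, so at most four partitions hold data.
skewed_keys = [f"area-{n % 4}" for n in range(30000)]

if __name__ == "__main__":
    print("unique keys per node:", load_per_node(unique_keys))
    print("skewed keys per node:", load_per_node(skewed_keys))
```

Under these assumptions, unique keys land on the three nodes in roughly equal numbers, while a key with few distinct values piles up on whichever nodes own those few partitions. If the keys really are unique, uneven CPU is more likely to come from something other than key skew.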

Re: Performance of data stream on 3 cluster node.

2023-03-01 Thread John Smith
Ok thanks. I just thought the streamer would be more uniform. On Wed, Mar 1, 2023 at 4:41 AM Stephen Darlington < stephen.darling...@gridgain.com> wrote: > You might want to check the data distribution. You can use control.sh > --cache distribution to do that. > > On 28 Feb 2023, at 20:32, John

Re: Performance of data stream on 3 cluster node.

2023-03-01 Thread Stephen Darlington
You might want to check the data distribution. You can use control.sh --cache distribution to do that. > On 28 Feb 2023, at 20:32, John Smith wrote: > > The last thing I can add to clarify is, the 3 node cluster is a centralized > cluster and the CSV loader is a thick client running on its own

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
The last thing I can add to clarify is, the 3 node cluster is a centralized cluster and the CSV loader is a thick client running on its own machine. On Tue, Feb 28, 2023 at 2:52 PM John Smith wrote: > Btw when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM MY_TABLE > GROUP BY

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
Btw when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM MY_TABLE GROUP BY COLUMN_2; The query runs full tilt 100% on all 3 nodes and returns in a respectable manner. So not sure what's going on but with the data streamer I guess most of the writes are pushed to THE ONE node mostly and

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
Hi, so I'm using it in a pretty straightforward kind of way, at least I think... I'm loading 35 million lines from CSV to an SQL table. Decided to use streamer as I figured it would still be a lot faster than batching SQL INSERTS. I tried with backup=0 and backup=1 (Prefer to have backup on) 1-

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread Jeremy McMillan
Have you tried tracing the workload on the 100% and 40% nodes for comparison? There just isn't enough detail in your question to help predict what should be happening with the cluster workload. For a starting point, please identify your design goals. It's easy to get confused by advice that seeks

Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
Hi, I'm using the data streamer to insert into a 3-node cluster. I have noticed that 1 node is pegged at 100% CPU while the others are at around 40%. Is that normal?