Re: Read write performance check

2024-01-01 Thread Hao Zhang
Veem,

You should also be familiar with Aurora Postgres's storage
architecture, which is very different from regular Postgres (see
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html
).
Aurora uses remote storage, which means that if your read workload can't fit
into Postgres's shared buffers, performance will be very different than it
would be with local SSD storage.
On write, Aurora stores six copies of the data across three availability zones
for high durability and availability. So having enough network bandwidth is a
factor as well.
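
(A quick way to sanity-check whether the working set fits in memory, as a
minimal psql sketch; "base_table" is the example table name used later in
this thread:)

SHOW shared_buffers;
-- total size of the test database
SELECT pg_size_pretty(pg_database_size(current_database()));
-- heap plus indexes for the test table
SELECT pg_size_pretty(pg_total_relation_size('base_table'));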

Ken

On Tue, Dec 26, 2023 at 11:30 PM Kirk Wolak  wrote:

> On Thu, Dec 21, 2023 at 8:31 AM veem v  wrote:
>
>> Can someone please guide me on whether any standard scripts are available
>> for doing such read/write performance tests? Or point me to any available
>> docs?
>>
>>
>> ...
>
>
> Veem, first things first... "Top Posting" is when you reply at the top of
> the email...  Notice how I replied at the bottom (and I deleted context,
> clearly).
> This is the style we prefer here.
>
> Second, since you are new to PostgreSQL... Let me recommend some reading.
> Cybertec has articles on performance (Tom Kyte style).
> Also, read the "Don't Do This" wiki, and finally, have a look at pgbench
> and psql documentation.  And specifically look at GENERATE_SERIES(),
> but the Cybertec articles will touch on that.  Regardless...  Reading the
> docs is insightful.
>
> Links:
> https://www.cybertec-postgresql.com/en/postgresql-hash-index-performance/
> https://wiki.postgresql.org/wiki/Don't_Do_This
> https://www.postgresql.org/docs/current/app-psql.html
> https://www.postgresql.org/docs/current/pgbench.html
> https://www.postgresql.org/docs/16/functions-srf.html
>
> HTH,
>
> Kirk Out!
>
>
>


Re: Read write performance check

2023-12-26 Thread Kirk Wolak
On Thu, Dec 21, 2023 at 8:31 AM veem v  wrote:

> Can someone please guide me on whether any standard scripts are available
> for doing such read/write performance tests? Or point me to any available
> docs?
>
>
> ...


Veem, first things first... "Top Posting" is when you reply at the top of
the email...  Notice how I replied at the bottom (and I deleted context,
clearly).
This is the style we prefer here.

Second, since you are new to PostgreSQL... Let me recommend some reading.
Cybertec has articles on performance (Tom Kyte style).
Also, read the "Don't Do This" wiki, and finally, have a look at pgbench
and psql documentation.  And specifically look at GENERATE_SERIES(),
but the Cybertec articles will touch on that.  Regardless...  Reading the
docs is insightful.

Links:
https://www.cybertec-postgresql.com/en/postgresql-hash-index-performance/
https://wiki.postgresql.org/wiki/Don't_Do_This
https://www.postgresql.org/docs/current/app-psql.html
https://www.postgresql.org/docs/current/pgbench.html
https://www.postgresql.org/docs/16/functions-srf.html
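
(For illustration, a minimal GENERATE_SERIES population script along these
lines; the table definition is an assumption, borrowed from the example
later in this thread:)

CREATE TABLE base_table (
    column1 text,
    column2 int,
    column3 numeric,
    column4 timestamp
);

-- populate one million rows of varied data in a single statement;
-- scale the series up for a 100M-row test
INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),
       floor(random() * 100)::int,
       random() * 1000,
       now()
FROM generate_series(1, 1000000);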

HTH,

Kirk Out!


Re: Read write performance check

2023-12-22 Thread Lok P
As I mentioned, your scenario looks like a generic one, but I don't have any
sample scripts/docs to share, sorry for that. Other people may suggest sample
scripts if they have any, or you may post on the performance list in case
someone has done similar work in the past.

But in my view, the performance test scripts will look like 1) row-by-row
insert/select in a cursor loop, as you were showing earlier in this thread,
and 2) another one with batch/bulk inserts, and then calling them through
pgbench or any other driver to create concurrency.
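
(A minimal sketch of what the pgbench side might look like; the script name,
table name, and the indexed id column are assumptions, not from this thread:)

-- read_test.sql: one random single-row read per transaction
\set id random(1, 100000000)
SELECT * FROM base_table WHERE id = :id;

Then run it with concurrency, for example:

pgbench -n -c 16 -j 4 -T 60 -f read_test.sql mydb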

On Thu, Dec 21, 2023 at 7:00 PM veem v  wrote:

> Can someone please guide me on whether any standard scripts are available
> for doing such read/write performance tests? Or point me to any available
> docs?
>
> On Wed, 20 Dec, 2023, 10:39 am veem v,  wrote:
>
>> Thank you.
>>
>> It would really be helpful if such test scripts or similar setups are
>> already available. Can someone please guide me to some docs, blogs, or
>> sample scripts on the same, please?
>>
>> On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:
>>
>>> As Rob mentioned, the syntax you posted is not correct. You need to
>>> process or read a certain batch of rows, like 1000 or 10k, at a time, not
>>> all 100M in one shot.
>>>
>>> But again, your use case seems a common one, considering you want to
>>> compare read and write performance on multiple databases with a similar
>>> table structure. So you may want to use test scripts which others must
>>> already have written rather than reinventing the wheel.
>>>
>>>
>>> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>>>
 Thank you.

 Yes, actually we are trying to compare and see what maximum TPS we are able
 to reach with both the row-by-row and the batch read/write tests.
 Afterwards, this figure may be compared with other databases with similar
 setups.

 So I wanted to understand from the experts here whether this approach is
 fine, or whether some other approach is advisable.

 I agree with the point that the network will play a role in a real-world
 app, but here we mainly want to see the database's capability, as the
 network will always play a similar role across all databases. Do you suggest
 some other approach to achieve this objective?


 On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:

> On 2023-12-20 00:44:48 +0530, veem v wrote:
> > So at first, we need to populate the base tables with the necessary data
> > (say 100 million rows) with the required skewness, using random functions
> > to generate variation in the values of different data types. Then for the
> > row-by-row write/read test, we can traverse in a cursor loop, and for the
> > batch write/insert, we need to traverse in a bulk-collect loop. Something
> > like below; this code can then be wrapped into a procedure and passed to
> > pgbench and executed from there. Please correct me if I'm wrong.
>
> One important point to consider for benchmarks is that your benchmark
> has to be similar to the real application to be useful. If your real
> application runs on a different node and connects to the database over
> the network, a benchmark running within a stored procedure may not be
> very indicative of real performance.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>



Re: Read write performance check

2023-12-21 Thread veem v
Can someone please guide me on whether any standard scripts are available for
doing such read/write performance tests? Or point me to any available docs?

On Wed, 20 Dec, 2023, 10:39 am veem v,  wrote:

> Thank you.
>
> It would really be helpful if such test scripts or similar setups are
> already available. Can someone please guide me to some docs, blogs, or
> sample scripts on the same, please?
>
> On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:
>
>> As Rob mentioned, the syntax you posted is not correct. You need to process
>> or read a certain batch of rows, like 1000 or 10k, at a time, not all 100M
>> in one shot.
>>
>> But again, your use case seems a common one, considering you want to
>> compare read and write performance on multiple databases with a similar
>> table structure. So you may want to use test scripts which others must
>> already have written rather than reinventing the wheel.
>>
>>
>> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>>
>>> Thank you.
>>>
>>> Yes, actually we are trying to compare and see what maximum TPS we are
>>> able to reach with both the row-by-row and the batch read/write tests.
>>> Afterwards, this figure may be compared with other databases with similar
>>> setups.
>>>
>>> So I wanted to understand from the experts here whether this approach is
>>> fine, or whether some other approach is advisable.
>>>
>>> I agree with the point that the network will play a role in a real-world
>>> app, but here we mainly want to see the database's capability, as the
>>> network will always play a similar role across all databases. Do you
>>> suggest some other approach to achieve this objective?
>>>
>>>
>>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>>>
 On 2023-12-20 00:44:48 +0530, veem v wrote:
 > So at first, we need to populate the base tables with the necessary data
 > (say 100 million rows) with the required skewness, using random functions
 > to generate variation in the values of different data types. Then for the
 > row-by-row write/read test, we can traverse in a cursor loop, and for the
 > batch write/insert, we need to traverse in a bulk-collect loop. Something
 > like below; this code can then be wrapped into a procedure and passed to
 > pgbench and executed from there. Please correct me if I'm wrong.

 One important point to consider for benchmarks is that your benchmark
 has to be similar to the real application to be useful. If your real
 application runs on a different node and connects to the database over
 the network, a benchmark running within a stored procedure may not be
 very indicative of real performance.

 hp

 --
_  | Peter J. Holzer| Story must make more sense than reality.
 |_|_) ||
 | |   | h...@hjp.at |-- Charles Stross, "Creative writing
 __/   | http://www.hjp.at/ |   challenge!"

>>>


Re: Read write performance check

2023-12-19 Thread veem v
Thank you.

It would really be helpful if such test scripts or similar setups are already
available. Can someone please guide me to some docs, blogs, or sample scripts
on the same, please?

On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:

> As Rob mentioned, the syntax you posted is not correct. You need to process
> or read a certain batch of rows, like 1000 or 10k, at a time, not all 100M
> in one shot.
>
> But again, your use case seems a common one, considering you want to compare
> read and write performance on multiple databases with a similar table
> structure. So you may want to use test scripts which others must already
> have written rather than reinventing the wheel.
>
>
> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>
>> Thank you.
>>
>> Yes, actually we are trying to compare and see what maximum TPS we are able
>> to reach with both the row-by-row and the batch read/write tests.
>> Afterwards, this figure may be compared with other databases with similar
>> setups.
>>
>> So I wanted to understand from the experts here whether this approach is
>> fine, or whether some other approach is advisable.
>>
>> I agree with the point that the network will play a role in a real-world
>> app, but here we mainly want to see the database's capability, as the
>> network will always play a similar role across all databases. Do you
>> suggest some other approach to achieve this objective?
>>
>>
>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>>
>>> On 2023-12-20 00:44:48 +0530, veem v wrote:
>>> > So at first, we need to populate the base tables with the necessary
>>> > data (say 100 million rows) with the required skewness, using random
>>> > functions to generate variation in the values of different data types.
>>> > Then for the row-by-row write/read test, we can traverse in a cursor
>>> > loop, and for the batch write/insert, we need to traverse in a
>>> > bulk-collect loop. Something like below; this code can then be wrapped
>>> > into a procedure and passed to pgbench and executed from there. Please
>>> > correct me if I'm wrong.
>>>
>>> One important point to consider for benchmarks is that your benchmark
>>> has to be similar to the real application to be useful. If your real
>>> application runs on a different node and connects to the database over
>>> the network, a benchmark running within a stored procedure may not be
>>> very indicative of real performance.
>>>
>>> hp
>>>
>>> --
>>>_  | Peter J. Holzer| Story must make more sense than reality.
>>> |_|_) ||
>>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>>> __/   | http://www.hjp.at/ |   challenge!"
>>>
>>


Re: Read write performance check

2023-12-19 Thread Lok P
As Rob mentioned, the syntax you posted is not correct. You need to process or
read a certain batch of rows, like 1000 or 10k, at a time, not all 100M in one
shot.

But again, your use case seems a common one, considering you want to compare
read and write performance on multiple databases with a similar table
structure. So you may want to use test scripts which others must already have
written rather than reinventing the wheel.
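
(A batched read along these lines might look like the sketch below, assuming
an indexed id column, which the example table in this thread does not yet
have; the client carries last_id forward between calls:)

SELECT * FROM base_table
WHERE id > :last_id
ORDER BY id
LIMIT 10000;  -- fetch the next batch of 10k rows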


On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:

> Thank you.
>
> Yes, actually we are trying to compare and see what maximum TPS we are able
> to reach with both the row-by-row and the batch read/write tests.
> Afterwards, this figure may be compared with other databases with similar
> setups.
>
> So I wanted to understand from the experts here whether this approach is
> fine, or whether some other approach is advisable.
>
> I agree with the point that the network will play a role in a real-world
> app, but here we mainly want to see the database's capability, as the
> network will always play a similar role across all databases. Do you suggest
> some other approach to achieve this objective?
>
>
> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>
>> On 2023-12-20 00:44:48 +0530, veem v wrote:
>> > So at first, we need to populate the base tables with the necessary data
>> > (say 100 million rows) with the required skewness, using random functions
>> > to generate variation in the values of different data types. Then for the
>> > row-by-row write/read test, we can traverse in a cursor loop, and for the
>> > batch write/insert, we need to traverse in a bulk-collect loop. Something
>> > like below; this code can then be wrapped into a procedure and passed to
>> > pgbench and executed from there. Please correct me if I'm wrong.
>>
>> One important point to consider for benchmarks is that your benchmark
>> has to be similar to the real application to be useful. If your real
>> application runs on a different node and connects to the database over
>> the network, a benchmark running within a stored procedure may not be
>> very indicative of real performance.
>>
>> hp
>>
>> --
>>_  | Peter J. Holzer| Story must make more sense than reality.
>> |_|_) ||
>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>> __/   | http://www.hjp.at/ |   challenge!"
>>
>


Re: Read write performance check

2023-12-19 Thread veem v
Thank you.

Yes, actually we are trying to compare and see what maximum TPS we are able to
reach with both the row-by-row and the batch read/write tests. Afterwards,
this figure may be compared with other databases with similar setups.

So I wanted to understand from the experts here whether this approach is fine,
or whether some other approach is advisable.

I agree with the point that the network will play a role in a real-world app,
but here we mainly want to see the database's capability, as the network will
always play a similar role across all databases. Do you suggest some other
approach to achieve this objective?


On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:

> On 2023-12-20 00:44:48 +0530, veem v wrote:
> > So at first, we need to populate the base tables with the necessary data
> > (say 100 million rows) with the required skewness, using random functions
> > to generate variation in the values of different data types. Then for the
> > row-by-row write/read test, we can traverse in a cursor loop, and for the
> > batch write/insert, we need to traverse in a bulk-collect loop. Something
> > like below; this code can then be wrapped into a procedure and passed to
> > pgbench and executed from there. Please correct me if I'm wrong.
>
> One important point to consider for benchmarks is that your benchmark
> has to be similar to the real application to be useful. If your real
> application runs on a different node and connects to the database over
> the network, a benchmark running within a stored procedure may not be
> very indicative of real performance.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Read write performance check

2023-12-19 Thread Peter J. Holzer
On 2023-12-20 00:44:48 +0530, veem v wrote:
> So at first, we need to populate the base tables with the necessary data
> (say 100 million rows) with the required skewness, using random functions to
> generate variation in the values of different data types. Then for the
> row-by-row write/read test, we can traverse in a cursor loop, and for the
> batch write/insert, we need to traverse in a bulk-collect loop. Something
> like below; this code can then be wrapped into a procedure and passed to
> pgbench and executed from there. Please correct me if I'm wrong.

One important point to consider for benchmarks is that your benchmark
has to be similar to the real application to be useful. If your real
application runs on a different node and connects to the database over
the network, a benchmark running within a stored procedure may not be
very indicative of real performance.
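
(In practice that can mean simply running pgbench from the application's host
rather than on the database server, so the network stays in the measurement;
the host and script names below are placeholders:)

pgbench -h db.example.com -U bench_user -n -c 16 -T 60 -f row_by_row.sql mydb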

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Read write performance check

2023-12-19 Thread Rob Sargent

On 12/19/23 12:14, veem v wrote:

Thank you for the confirmation.

So at first, we need to populate the base tables with the necessary data (say
100 million rows) with the required skewness, using random functions to
generate variation in the values of different data types. Then for the
row-by-row write/read test, we can traverse in a cursor loop, and for the
batch write/insert, we need to traverse in a bulk-collect loop. Something like
below; this code can then be wrapped into a procedure and passed to pgbench
and executed from there. Please correct me if I'm wrong.


Also, can you please guide me on how the batch insert (say a batch size of
~1000) can be written?


-- Row by row write
 FOR i IN 1..total_rows LOOP
        data_row := (SELECT
            md5(random()::TEXT),
            floor(random() * 100)::INT,
            random() * 1000::NUMERIC,
            NOW()::TIMESTAMP
        );
        INSERT INTO BASE_TABLE(column1, column2, column3, column4)
        VALUES (data_row.column1, data_row.column2, data_row.column3, 
data_row.column4);

    END LOOP;

--Row by row read
BEGIN
    FOR i IN 1..total_rows LOOP
        -- Row by row read
        SELECT * INTO data_row FROM BASE_TABLE WHERE limit 1;
    END LOOP;
END;

This row-by-row read is guaranteed to be slow if there's no index on the 100M
rows.

-- Batch read
BEGIN
    -- Batch read
    OPEN data_set FOR SELECT * FROM BASE_TABLE LIMIT total_rows;
    CLOSE data_set;
END;

Does this batch read in the entire 100M-row table? And there's some suspicious
syntax.
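
(One way to make the batch read actually fetch in batches is a server-side
cursor at the SQL level; a minimal sketch, with the batch size picked
arbitrarily:)

BEGIN;
DECLARE data_set CURSOR FOR SELECT * FROM base_table;
-- repeat until fewer than 1000 rows come back
FETCH 1000 FROM data_set;
CLOSE data_set;
COMMIT;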


PS: Notice that top posting is frowned upon on this list.





Re: Read write performance check

2023-12-19 Thread veem v
Thank you for the confirmation.

So at first, we need to populate the base tables with the necessary data (say
100 million rows) with the required skewness, using random functions to
generate variation in the values of different data types. Then for the
row-by-row write/read test, we can traverse in a cursor loop, and for the
batch write/insert, we need to traverse in a bulk-collect loop. Something like
below; this code can then be wrapped into a procedure and passed to pgbench
and executed from there. Please correct me if I'm wrong.

Also, can you please guide me on how the batch insert (say a batch size of
~1000) can be written?

-- Row by row write
 FOR i IN 1..total_rows LOOP
data_row := (SELECT
md5(random()::TEXT),
floor(random() * 100)::INT,
random() * 1000::NUMERIC,
NOW()::TIMESTAMP
);
INSERT INTO BASE_TABLE(column1, column2, column3, column4)
VALUES (data_row.column1, data_row.column2, data_row.column3,
data_row.column4);
END LOOP;

--Row by row read
BEGIN
FOR i IN 1..total_rows LOOP
-- Row by row read
SELECT * INTO data_row FROM BASE_TABLE WHERE limit 1;
END LOOP;
END;

-- Batch read
BEGIN
-- Batch read
OPEN data_set FOR SELECT * FROM BASE_TABLE LIMIT total_rows;
CLOSE data_set;
END;
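
(On the batch-insert question, a minimal PL/pgSQL sketch; the column names
follow the example above, and the DO block is just one possible way to wrap
it. Note also that the row-by-row read above has a stray WHERE; it was
presumably meant to be just LIMIT 1, or better, a lookup on an indexed key:)

DO $$
DECLARE
    batch_size  int := 1000;
    num_batches int := 100;  -- 100 x 1000 = 100k rows; scale up as needed
BEGIN
    FOR i IN 1..num_batches LOOP
        -- one set-based INSERT per batch instead of one INSERT per row
        INSERT INTO base_table (column1, column2, column3, column4)
        SELECT md5(random()::text),
               floor(random() * 100)::int,
               random() * 1000,
               now()
        FROM generate_series(1, batch_size);
    END LOOP;
END $$;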






On Tue, 19 Dec 2023 at 12:58, Ilya Kosmodemiansky  wrote:

> Hi Veem,
>
> On Tue, Dec 19, 2023 at 7:36 AM veem v  wrote:
> > 1) For write performance, the rows need to be inserted from multiple
> > sessions at the same time, with the required random values as per the data
> > types, i.e. character, number, and date columns. And this needs to be
> > tested for row-by-row insert and batched insert.
> >
> > 2) For the read performance test, the table first has to be populated with
> > those ~100 million rows. Then querying will happen on that table, row by
> > row and in a batched way, from concurrent sessions.
> >
> > I am new to PostgreSQL but have mostly worked with Oracle, so I wanted to
> > understand whether the above can be achieved by creating a simple
> > procedure or whether a major effort will be needed. And I saw in a few
> > blogs that pgbench can be used for concurrency tests.
>
> Yes, you are right: pgbench with a customized script is what you are
> looking for.
>
> > I want to know if this will still work on Aurora PostgreSQL from an
> > IntelliJ client worksheet.
>
> pgbench works with Aurora as with normal Postgres; it is just a client
> program which connects to the database. Not sure if the IDEA worksheet would
> help you in this case; however, you can run it from any machine with pgbench
> installed.
>
>
> Best regards,
> Ilya
>
>
> --
> Ilya Kosmodemiansky
> CEO, Founder
>
> Data Egret GmbH
> Your remote PostgreSQL DBA team
> T.: +49 6821 919 3297
> i...@dataegret.com
>


Re: Read write performance check

2023-12-18 Thread Ilya Kosmodemiansky
Hi Veem,

On Tue, Dec 19, 2023 at 7:36 AM veem v  wrote:
> 1) For write performance, the rows need to be inserted from multiple
> sessions at the same time, with the required random values as per the data
> types, i.e. character, number, and date columns. And this needs to be tested
> for row-by-row insert and batched insert.
>
> 2) For the read performance test, the table first has to be populated with
> those ~100 million rows. Then querying will happen on that table, row by row
> and in a batched way, from concurrent sessions.
>
> I am new to PostgreSQL but have mostly worked with Oracle, so I wanted to
> understand whether the above can be achieved by creating a simple procedure
> or whether a major effort will be needed. And I saw in a few blogs that
> pgbench can be used for concurrency tests.

Yes, you are right: pgbench with a customized script is what you are looking
for.

> I want to know if this will still work on Aurora PostgreSQL from an IntelliJ
> client worksheet.

pgbench works with Aurora as with normal Postgres; it is just a client program
which connects to the database. Not sure if the IDEA worksheet would help you
in this case; however, you can run it from any machine with pgbench installed.
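
(For example, a run against an Aurora endpoint might look like the following;
the endpoint, user, and script file are placeholders:)

pgbench -h mycluster.cluster-abc123.us-east-1.rds.amazonaws.com -U bench_user \
    -n -c 8 -j 4 -T 120 -f write_test.sql mydb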


Best regards,
Ilya


-- 
Ilya Kosmodemiansky
CEO, Founder

Data Egret GmbH
Your remote PostgreSQL DBA team
T.: +49 6821 919 3297
i...@dataegret.com