Re: Identify root-cause for intermittent spikes

2022-10-12 Thread Sengottaiyan T
Thanks, Michael.

Due to reactive nature of the intermittent alerts, Is there any table which
stores the historical information / periodic snapshots captured from the
pg_stat_activity view?

On Tue, Oct 11, 2022 at 5:18 PM MichaelDBA  wrote:

> Hello,
>
> Your problem is probably, too many active, concurrent connections.  Get it
> from here the db directly:
> select datname, usename, application_name, substring(query, 1, 80) query
> from pg_stat_activity where state in ('active','idle in transaction');
>
> Compare the number of rows returned with the number of vCPUs.  If it's
> more than double the number of vCPUs in your AWS instance class, then you
> are cpu saturated.
>
> Regards,
>
> Michael Vitale
>
>
> Sengottaiyan T wrote on 10/11/2022 7:06 AM:
>
> Hi All,
>>
>> I'm looking for suggestions:
>>
>> Environment: AWS PostgreSQL RDS instance - Version 14.3
>> Operations support gets intermittent alerts from the monitoring tool
>> through AWS cloud watch metrics on Disk Queue Depth, CPU burst-credit & CPU
>> Utilization.
>> I would like to understand what is causing the spike - is the number of
>> logon's increased, (or) number of transactions per second increased, (or)
>> SQL execution picked wrong plan and the long running (I/O, CPU or memory
>> intensive) SQL is increasing load on server (cause and effect scenario)
>> etc.,
>>
>> Due to the reactive nature of the issues, we rely on the metrics gathered
>> in the AWS cloud watch monitoring (for the underlying OS stats),
>> Performance Insights (for the DB performance) and correlate SQL queries
>> with pg_Stat_Statements view. But the data in the view is an aggregated
>> stats. And, I'm looking to see the deltas compared to normal runs.
>> How should I approach and get to the root-cause?
>>
>> AppDynamics is already configured for the RDS instance. Are there any
>> open source monitoring tools available which would help to capture and
>> visualize the deltas?
>>
>> Thanks,
>> Senko
>>
>
>
>
>


Re: Identify root-cause for intermittent spikes

2022-10-12 Thread Sengottaiyan T
Thanks, Rick - I will give it a try.

On Wed, Oct 12, 2022 at 6:12 AM Rick Otten  wrote:

> I like to use pgbadger to collect data on what is happening in RDS
> instances.  You have to turn up a bunch of logging in RDS:
>
> 1. Turn on connection logging, duration logging, lock_waits, and anything
> else that you are interested in studying.
>
> 2. Then grab all of your postgresql logs from AWS.  I wrote this little
> bash script to pull all of the logs for a current day.  It will work if you
> have your aws credentials configured correctly and can run aws-cli commands.
> ```
> #!/bin/env bash
>
> ## Return all of the postgresql log files saved by RDS since midnight.
> ## Save them in your current directory.
> ## This is so we can use cli tools like "grep"
> ## It is also really handy for feeding into pgbadger for deeper analysis.
>
> # aws requires the timestamp to be in milliseconds.
> # unfortunately date will provide either seconds or nano seconds, so we
> have to do math.
> midnight_timestamp=$(date -d $(date -I) '+%s')
> midnight_timestamp_milliseconds=$(echo "${midnight_timestamp} * 1000" | bc)
>
> logfiles=$(aws rds describe-db-log-files \
>   --profile default \
>   --db-instance-identifier "*some_rds_instance_name*" \
>   --output json  \
>   --file-last-written ${midnight_timestamp_milliseconds} | jq
> -r ".DescribeDBLogFiles[].LogFileName")
>
> for logfile in $(echo ${logfiles})
> do
> # remove the leading "error/" so we can use the name to save it.
> logfile_save=$(echo "${logfile}" | awk -F\/ '{print $NF}')
>
> tput bold; echo "${logfile}"; tput sgr0
> aws rds download-db-log-file-portion \
> --profile admin \
> --db-instance-identifier prod-notify-me-1 \
> --log-file-name ${logfile} \
> --output text \
> --no-paginate > ${logfile_save}
> done
> ```
> 3. Then run pgbadger:
> ``` ~/src/pgbadger/pgbadger -f rds  postgresql*
> ```
> 4. Open the `out.html` in your browser, and poke around.   There is a ton
> of stuff you can find in all the drop down menus about what was happening
> in your database over the time window you collected the logs for.  The html
> is generated as a standalone file by a perl script of all things.  It is
> pretty impressive.
>
>
>
> On Tue, Oct 11, 2022 at 7:07 AM Sengottaiyan T 
> wrote:
>
>> Hi All,
>>>
>>> I'm looking for suggestions:
>>>
>>> Environment: AWS PostgreSQL RDS instance - Version 14.3
>>> Operations support gets intermittent alerts from the monitoring tool
>>> through AWS cloud watch metrics on Disk Queue Depth, CPU burst-credit & CPU
>>> Utilization.
>>> I would like to understand what is causing the spike - is the number of
>>> logon's increased, (or) number of transactions per second increased, (or)
>>> SQL execution picked wrong plan and the long running (I/O, CPU or memory
>>> intensive) SQL is increasing load on server (cause and effect scenario)
>>> etc.,
>>>
>>> Due to the reactive nature of the issues, we rely on the metrics
>>> gathered in the AWS cloud watch monitoring (for the underlying OS stats),
>>> Performance Insights (for the DB performance) and correlate SQL queries
>>> with pg_Stat_Statements view. But the data in the view is an aggregated
>>> stats. And, I'm looking to see the deltas compared to normal runs.
>>> How should I approach and get to the root-cause?
>>>
>>> AppDynamics is already configured for the RDS instance. Are there any
>>> open source monitoring tools available which would help to capture and
>>> visualize the deltas?
>>>
>>> Thanks,
>>> Senko
>>>
>>