[GENERAL] pg_ctl kill QUIT

2012-07-30 Thread Venkat Balaji
Hello Community,

We used pg_ctl kill QUIT PID to terminate one of the processes on
the production database, and the database went into recovery mode.

We understand that we should not use kill -9, and we did not do that.

Could you please help us avoid this problem permanently?
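For context: pg_ctl kill QUIT sends SIGQUIT, and the postmaster treats a
SIGQUIT'ed backend like a crashed one, so it resets all sessions and runs
crash recovery. A sketch of the gentler alternatives (the PID is
hypothetical; pg_terminate_backend exists in 8.4 and later):

```sql
-- Cancel just the running query (sends SIGINT); the session survives:
SELECT pg_cancel_backend(12345);    -- 12345 is a hypothetical backend PID

-- End the whole backend cleanly (sends SIGTERM); no crash-recovery cycle:
SELECT pg_terminate_backend(12345);
```

Equivalently, from the shell, pg_ctl kill TERM PID sends SIGTERM to a single
backend without triggering the crash-recovery path.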

Regards,
Venkat

-- 
 

DISCLAIMER:

Please note that this message and any attachments may contain confidential 
and proprietary material and information and are intended only for the use 
of the intended recipient(s). If you are not the intended recipient, you 
are hereby notified that any review, use, disclosure, dissemination, 
distribution or copying of this message and any attachments is strictly 
prohibited. If you have received this email in error, please immediately 
notify the sender and delete this e-mail , whether electronic or printed. 
Please also note that any views, opinions, conclusions or commitments 
expressed in this message are those of the individual sender and do not 
necessarily reflect the views of *Ver sé Innovation Pvt Ltd*.



[GENERAL] : Postgresql Error after recovery

2012-07-04 Thread Venkat Balaji
Hello Community,

We have a critical situation: our production database server was affected
by a rootkit.

When we tried to build a replication site by copying the data directory to a
different server, many files were missed during the copy (due to the
rootkit).

So we moved the missing files over individually, one by one, and the
situation improved somewhat.

We are now stuck with the following issue -

We are unable to connect to the databases; when we try, we get the error
below -

postgres=# \c oltp_db
FATAL:  index pg_attribute_relid_attnum_index contains unexpected zero
page at block 0
HINT:  Please REINDEX it.
Previous connection kept

Can anyone please help us?

Regards,
VB




Re: [GENERAL] : Postgresql Error after recovery

2012-07-04 Thread Venkat Balaji
On Wed, Jul 4, 2012 at 2:12 PM, Raghavendra 
raghavendra@enterprisedb.com wrote:

 On Wed, Jul 4, 2012 at 2:11 PM, Raghavendra 
 raghavendra@enterprisedb.com wrote:

 postgres=# \c oltp_db
  FATAL:  index pg_attribute_relid_attnum_index contains unexpected
 zero page at block 0
 HINT:  Please REINDEX it.
 Previous connection kept

 Can anyone please help us.


 The HINT tells you to REINDEX it. Use reindexdb from the OS.



The re-index option did not work.

Single-user mode did not work either -

[postgres@localhost bin]$ postgres --single oltp_db -E -D
/usr/local/postgresql-9.0.1/data
2012-07-04 04:30:47 CDT [26072]: [1-1] user=,db= FATAL:  index
pg_attribute_relid_attnum_index contains unexpected zero page at block 0
2012-07-04 04:30:47 CDT [26072]: [2-1] user=,db= HINT:  Please REINDEX it.

The last option would be a dump and restore. We need to avoid that to reduce
the downtime.

Regards,
Venkat




Re: [GENERAL] : Postgresql Error after recovery

2012-07-04 Thread Venkat Balaji
I restarted the cluster with ignore_system_indexes=true and was able
to connect to the databases.

I have started re-indexing, and it seems to be working fine. I will get back
if I find further issues.
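The recovery path just described can be sketched as follows (a 9.0-era
sketch; ignore_system_indexes is a developer option, and REINDEX SYSTEM
rebuilds all system-catalog indexes of the named database):

```sql
-- With the cluster restarted under ignore_system_indexes=true, the corrupt
-- index is no longer consulted and can be rebuilt:
REINDEX INDEX pg_attribute_relid_attnum_index;

-- Or rebuild every system-catalog index in the database in one go:
REINDEX SYSTEM oltp_db;
```

In single-user mode, the -P flag to the postgres binary sets
ignore_system_indexes for that session, which is another way past the
FATAL error shown above.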

Regards,
Venkat

On Wed, Jul 4, 2012 at 3:35 PM, Raghavendra 
raghavendra@enterprisedb.com wrote:

 Re-index option did not work.


 Single user mode option did not work as well -

 [postgres@localhost bin]$ postgres --single oltp_db -E -D
 /usr/local/postgresql-9.0.1/data
 2012-07-04 04:30:47 CDT [26072]: [1-1] user=,db= FATAL:  index
 pg_attribute_relid_attnum_index contains unexpected zero page at block 0
 2012-07-04 04:30:47 CDT [26072]: [2-1] user=,db= HINT:  Please REINDEX it.

 The last option would be a dump and restore. We need to avoid that to
 reduce the downtime.


 How about starting the instance with ignore_system_indexes=true ? And then
 doing a dump & restore? I am not sure about this procedure; someone here
 may have a better idea.


 http://www.postgresql.org/docs/9.0/interactive/runtime-config-developer.html





Re: [GENERAL] Measuring replication lag time

2012-02-22 Thread Venkat Balaji
On Wed, Feb 22, 2012 at 5:40 PM, Stuart Bishop stu...@stuartbishop.netwrote:

Hi.

 I need to measure how far in the past a hot standby is, async
 streaming replication.

 On the Hot Standby, select
 age(current_timestamp,pg_last_xact_replay_timestamp()) gets me this
 (or close enough for my purposes - I understand that if there are no
 updates, there are no logs to replay and the lag time will increase).

 Is there some way to get this same information on the master?
 pg_stat_replication contains the log information, but I can't see how
 to map this to a timestamp.

 Is there a better way of measuring this?


Comparing the pg_controldata output on production and standby might help you
with this.
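On the master, pg_stat_replication reports WAL positions rather than
timestamps, but a byte-lag figure can be computed by parsing the 'X/Y'
locations it shows. A rough sketch (it assumes the common hi*2^32+lo
interpretation of an LSN, which holds in modern releases; 9.0-9.2 skipped
one 16 MB segment per 4 GB, so treat it as approximate there):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Parse a textual WAL location like '1/86F83B50' into a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def lag_bytes(master_lsn: str, standby_lsn: str) -> int:
    """Approximate replication lag in bytes between two WAL locations."""
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn)

# e.g. the master's current location vs the replay_location reported
# for the standby in pg_stat_replication
print(lag_bytes("1/86F83B50", "1/86000000"))  # 16268112 bytes (~15.5 MB)
```

This gives lag in bytes, not seconds; converting to time still needs a
timestamp source such as pg_last_xact_replay_timestamp() on the standby.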

Thanks,
VB


Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-19 Thread Venkat Balaji
On Thu, Feb 16, 2012 at 8:14 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On Wednesday, February 15, 2012 10:21:02 pm Venkat Balaji wrote:
  Andrian,
 
  Thanks a lot !
 
  So in this case you are not waiting for confirmation of the commit being
 
   flushed
   to disk on the standby.  It that case you are bypassing the primary
   reason for
   sync replication. The plus is transactions on the master will complete
   faster
   and do so in the absence of the standby. The minus is that you are in
   sort of an
   in between state.
 
  I understand. My worry and requirement is to ensure master is not
 disturbed
  for any reason.
  In sync rep, the biggest worry is if standby server is unavailable and is
  down for longer time, master hangs and will be in the same state until
  standby comes back up or replication must be broken temporarily (until
  standby comes back up) so that master runs without interruption. This is
 a
  costly exercise on production from downtime perspective.

 So just use regular streaming replication without sync rep. You get record
 based
 transaction shipping without having to wait for the standby.  You will
 need to
 make sure that wal_keep_segments is big enough to cover any down time on
 the
 standby(you would need that for sync rep also).


Since we already have streaming replication configured, we have rolled back
the plan to set up synchronous replication.

Thanks,
VB


Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe scott.marl...@gmail.comwrote:

 On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji venkat.bal...@verse.in
 wrote:
 
  On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt jay.lev...@gmail.com
 wrote:
 
  We need to do a few bulk updates as Rails migrations.  We're a typical
  read-mostly web site, so at the moment, our checkpoint settings and WAL
 are
  all default (3 segments, 5 min, 16MB), and updating a million rows
 takes 10
  minutes due to all the checkpointing.
 
  We have no replication or hot standbys.  As a consumer-web startup, with
  no SLA, and not a huge database, and if we ever do have to recover from
  downtime it's ok if it takes longer.. is there a reason NOT to always
 run
  with something like checkpoint_segments = 1000, as long as I leave the
  timeout at 5m?
 
 
  Checkpoints will still occur every 5 mins. In any case,
  checkpoint_segments=1000 is huge: it implies 16MB * 1000 = 16000MB worth
  of pg_xlog data, which is not advisable from an I/O and data-loss
  perspective. Even in the most unimaginable case that all of these 1000
  files fill up in less than 5 mins, there is a chance the system will slow
  down due to high IO and CPU.



 As far as I know there is no data loss issue with a lot of checkpoint
 segments.


Data loss would be an issue when there is a server crash or pg_xlog
corruption. That many pg_xlog files (1000) could mean a large data loss
(data changes not yet synced to the base files are not guaranteed). Of
course, this is not related to the current situation. Normally we weigh the
checkpoint completion time, IO pressure, CPU load and the data-loss exposure
when we configure checkpoint_segments.


Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-15 Thread Venkat Balaji
In short, I would like to understand whether I am achieving the same
asynchronous streaming replication by setting synchronous_commit='local' -
I understand that streaming replication is record based log-shipping.

Below is what shows up on our primary test server where we are testing
synchronous replication -

*1. Synchronous setup enabled with synchronous_commit='local'*

postgres=# select * from pg_stat_replication ;

-[ RECORD 1 ]----+-----------------------------
procpid          | 24099
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | ip-address
client_hostname  |
client_port      | 56432
backend_start    | 2012-02-15 12:55:39.65663+03
state            | streaming
sent_location    | 0/E78
write_location   | 0/E78
flush_location   | 0/E78
replay_location  | 0/E78
sync_priority    | 1
sync_state       | sync

postgres=# show synchronous_commit ;
 synchronous_commit
--------------------
 local
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names
---------------------------
 *
(1 row)

Does this mean that the system is still replicating synchronously? If so, by
what means?

*Below is our actual production setup in 9.1.1 with asynchronous
replication setup -*

*2. Asynchronous enabled with synchronous_commit='on'*

psql (9.1.1)
Type "help" for help.

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------
procpid          | 3159
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | ipaddress
client_hostname  | hostname
client_port      | 40165
backend_start    | 2012-02-08 12:41:51.858897+03
state            | streaming
sent_location    | 1/86F83B50
write_location   | 1/86F83B50
flush_location   | 1/86F83B50
replay_location  | 1/86F83B50
sync_priority    | 0
sync_state       | async

postgres=# show synchronous_commit ;
 synchronous_commit
--------------------
 on
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names
---------------------------

(1 row)

Operation-wise, I am not seeing much difference when inserting a few
thousand rows; the behavior is almost the same in both async and sync rep.

Thanks,
VB


On Wed, Feb 15, 2012 at 11:11 AM, Venkat Balaji venkat.bal...@verse.inwrote:


 On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji venkat.bal...@verse.inwrote:


 On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver 
 adrian.kla...@gmail.comwrote:

 On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
  Hello,
 
  Disaster Recovery testing for Synchronous replication setup -
 
  When the standby site is down, transactions at the production site
 started
  hanging (this is after the successful setup of synchronous
 replication).
 
  We changed synchronous_commit to 'local' to over-come this situation.
 
   - No transactions are hanging at the production site even when the
 standby
  is down
   - Standby is automatically getting synced when it is back up again.
 
  Can someone let us know if there are any -ve effects of putting
  synchronous_commit='local' ??
 
  I am assuming that this as good as putting synchronous_commit=on on
 an
  stand-alone system.

 It would seem you are really after streaming replication(which is
 asynchronous)
 more than synchronous replication. I have not used synchronous
 replication
 enough to be  sure, but I think by setting synchronous_commit='local'
 you are
 basically turning the system into a straight streaming(asynchronous)
 system
 anyway.


 Sorry. Ignore my earlier message -

 Yeah. Its a kind of asynchronous ( at the transaction level, NOT WAL based
 ). All i wanted to achieve is as follows -

 1. Synchronous replication - which would perform transactions
 simultaneously on production and standby.
 2. Ideally, if the commit does not occur at the standby site, then it
 would not commit at the production as well, which will cause production
 site
 to hang. I do not want production site to hang if the standby site is
 down or not accessible.
 3. I would need the commit to occur on production and the production apps
 should not be disturbed if the standby fails to respond. To achieve this,
 I have set synchronous_commit='local' to ensure that transactions are
 committed at production site first.

 We do have streaming replication (of PG-9.0) setup on our other production
 boxes, which is asynchronous and is WAL based.

 Thanks
 VB



Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Venkat Balaji
  Data loss would be an issue when there is a server crash or pg_xlog crash
  etc. That many number of pg_xlog files (1000) would contribute to huge
  data
  loss (data changes not synced to the base are not guaranteed). Of-course,
  this is not related to the current situation.  Normally we calculate the
  checkpoint completion time, IO pressure, CPU load and the threat to the
  data loss when we configure checkpoint_segments.

 So you're saying that by using a small number of checkpoint segments you
 limit the data loss when the WAL gets corrupted/lost? That's a bit like
 buying a Maserati and then never going faster than 10mph because you might
 crash at higher speeds ...


No, I am not saying that checkpoint_segments must be low. I was just trying
to explain the I/O overhead of setting checkpoint_segments very high (as
high as 1000). A low number of checkpoint segments leads to more frequent
checkpoints and more I/O, which is not good either. Agreed.


 The problem here is that the WAL is usually placed on more reliable drives
 (compared to the data files) or a RAID1 array and as it's just writing
 data sequentially, so the usage pattern is much less likely to cause
 data/drive corruption (compared to data files that need to handle a lot of
 random I/O, etc.).


Agreed.


 So while it possible the WAL might get corrupted, the probability of data
 file corruption is much higher. And the corruption might easily happen
 silently during a checkpoint, so there won't be any WAL segments no matter
 how many of them you keep ...


Agreed. When corruption occurs, it really does not matter how many WAL
segments are kept in pg_xlog.
But, at any point of time if PG needs


 And by using low number of checkpoint segments it actually gets worse,
 because it means more frequent checkpoints - more I/O on the drives -
 more wearout of the drives etc.


Completely agreed. As mentioned above. I choose checkpoint_segments and
checkpoint_timeout once i observe the checkpoint behavior.

If you need to protect yourself against this, you need to keep a WAL
 archive (prefferably on a separate machine) and/or a hot standby for
 failover.


WAL archiving is a different situation, where you back up the pg_xlog files
by enabling archiving.
I was referring to an exclusive situation where pg_xlogs are not archived,
the data has not yet been synced to the base files (by the bgwriter), and
the system crashes. PG would then depend on pg_xlog to recover and reach a
consistent state; if pg_xlog is also unavailable, there would be data loss,
and how much depends on how much data was present in the pg_xlog files.
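For completeness, the archiving being described is enabled with a couple of
settings; a minimal sketch (the destination directory is hypothetical, and
the test-before-copy form follows the documented example so an existing
archive file is never overwritten):

```
# postgresql.conf sketch for WAL archiving (9.0+)
wal_level = archive                     # keep enough WAL detail for archiving
archive_mode = on
archive_command = 'test ! -f /wal_archive/%f && cp %p /wal_archive/%f'
                                        # %p = path of the WAL file, %f = its name
```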

Thanks,
VB


Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 4:12 PM, Andres Freund and...@anarazel.de wrote:

 On Wednesday, February 15, 2012 10:38:23 AM Venkat Balaji wrote:
  On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe
 scott.marl...@gmail.comwrote:
   On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji 
 venkat.bal...@verse.in
all of these 1000 files get filled up in less than 5 mins, there are
chances that system will slow down due to high IO and CPU.
   As far as I know there is no data loss issue with a lot of checkpoint
   segments.
  Data loss would be an issue when there is a server crash or pg_xlog crash
  etc. That many number of pg_xlog files (1000) would contribute to huge
 data
  loss (data changes not synced to the base are not guaranteed). Of-course,
  this is not related to the current situation.  Normally we calculate the
  checkpoint completion time, IO pressure, CPU load and the threat to the
  data loss when we configure checkpoint_segments.
 I think you might be misunderstanding something. A high number of
 checkpoint_segments can lead to slower recovery - all those changes need
 to be
 reapplied - but it won't lead to lost data. The data inside the wal will be
 fsynced at appropriate times (commit; background writer; too much written).


Recovery would take time because all the changes in the WAL files in pg_xlog
(of which there are many) must be replayed to reach a consistent state. When
disaster strikes, if the pg_xlogs are not available and the data in WAL has
not yet been fsynced, recovery is not possible and the data loss can be
huge. It also depends on how much data has not been fsynced.

Thanks,
VB


Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-15 Thread Venkat Balaji
Adrian,

Thanks a lot !

So in this case you are not waiting for confirmation of the commit being
 flushed to disk on the standby. In that case you are bypassing the primary
 reason for sync replication. The plus is that transactions on the master
 will complete faster, and do so in the absence of the standby. The minus is
 that you are in sort of an in-between state.


I understand. My worry and requirement is to ensure the master is not
disturbed for any reason.
In sync rep, the biggest worry is that if the standby server is unavailable
and down for a long time, the master hangs and stays in that state until the
standby comes back up, or replication must be broken temporarily (until the
standby comes back up) so that the master runs without interruption. This is
a costly exercise on production from a downtime perspective.

Personally, I take sync replication to be basically an all or nothing
 proposition. By setting it up you are saying you want, at minimum, two
 database
 clusters to be in sync at any point in time all the time (except for start
 up).
 If that is not possible then you are really looking for async replication.


Yeah. We will need to make a decision accordingly.

Thanks again,
VB


Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-14 Thread Venkat Balaji
Hello,

Disaster Recovery testing for Synchronous replication setup -

When the standby site is down, transactions at the production site started
hanging (this was after the successful setup of synchronous replication).

We changed synchronous_commit to 'local' to overcome this situation.

 - No transactions hang at the production site even when the standby is down
 - The standby automatically gets back in sync when it comes up again

Can someone let us know if there are any negative effects of setting
synchronous_commit='local' ?

I am assuming that this is as good as setting synchronous_commit=on on a
stand-alone system.
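The setup described above amounts to two settings on the primary; a sketch
of the relevant postgresql.conf lines (behavior as I understand it, not an
authoritative recipe):

```
# postgresql.conf on the primary (9.1) -- sketch
synchronous_standby_names = '*'   # any connected standby may be chosen as sync
synchronous_commit = local        # commits wait only for the local WAL flush,
                                  # so the master does not block on the standby
```

With synchronous_commit = local the standby still streams WAL, but a commit
acknowledged to the client is only guaranteed durable on the master.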

We need to get this setup live on production shortly.

Thanks
VB

On Fri, Feb 10, 2012 at 4:47 PM, Venkat Balaji venkat.bal...@verse.inwrote:


 This issue stays resolved !!!

 The statements are no more hanging on production now :)

 The suspected problem was -

 Our brand new production server did not have the port 5432 open.

 I had opened the port using iptables command and everything started
 working.

 synchronous replication is fast and awesome.

 Thanks
 VB


 On Fri, Feb 3, 2012 at 9:45 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:

 
  Connection is working fine between primary and standby, ping is working
  fine and wal archive file transfer is working without any issues.
 
  I tried CREATE TABLE and CREATE DATABASE, both were hanging.
 
  Apart from regular streaming replication settings, I did the following
 on
  primary to enable synchronous replication -
 
  synchronous_standby_names='*'
 
  Commands started hanging after that. Is there anything else i need to
 do.

 From here:

 http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html

 
 synchronous_standby_names (string)
 ... The synchronous standby will be the first standby named in this list
 that is
 both currently connected and streaming data in real-time (as shown by a
 state of
 streaming in the pg_stat_replication view). Other standby servers
 appearing
 later in this list represent potential synchronous standbys

 The name of a standby server for this purpose is the application_name
 setting of
 the standby, as set in the primary_conninfo of the standby's walreceiver.
 There
 is no mechanism to enforce uniqueness. In case of duplicates one of the
 matching
 standbys will be chosen to be the synchronous standby, though exactly
 which one
 is indeterminate. The special entry * matches any application_name,
 including
 the default application name of walreceiver.

 

 So I would check the pg_stat_replication view to see if Postgres is
 seeing the
 standby as streaming.


 
  Thanks
  VB

 --
 Adrian Klaver
 adrian.kla...@gmail.com





Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-14 Thread Venkat Balaji
On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
  Hello,
 
  Disaster Recovery testing for Synchronous replication setup -
 
  When the standby site is down, transactions at the production site
 started
  hanging (this is after the successful setup of synchronous replication).
 
  We changed synchronous_commit to 'local' to over-come this situation.
 
   - No transactions are hanging at the production site even when the
 standby
  is down
   - Standby is automatically getting synced when it is back up again.
 
  Can someone let us know if there are any -ve effects of putting
  synchronous_commit='local' ??
 
  I am assuming that this as good as putting synchronous_commit=on on an
  stand-alone system.

 It would seem you are really after streaming replication(which is
 asynchronous)
 more than synchronous replication. I have not used synchronous replication
 enough to be  sure, but I think by setting synchronous_commit='local' you
 are
 basically turning the system into a straight streaming(asynchronous) system
 anyway.


Yeah. It's a kind of asynchronous. All I wanted is as follows -

1


 
  We need to get this setup live on production shortly.
 
  Thanks
  VB

 --
 Adrian Klaver
 adrian.kla...@gmail.com



Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-14 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji venkat.bal...@verse.inwrote:


 On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
  Hello,
 
  Disaster Recovery testing for Synchronous replication setup -
 
  When the standby site is down, transactions at the production site
 started
  hanging (this is after the successful setup of synchronous replication).
 
  We changed synchronous_commit to 'local' to over-come this situation.
 
   - No transactions are hanging at the production site even when the
 standby
  is down
   - Standby is automatically getting synced when it is back up again.
 
  Can someone let us know if there are any -ve effects of putting
  synchronous_commit='local' ??
 
  I am assuming that this as good as putting synchronous_commit=on on an
  stand-alone system.

 It would seem you are really after streaming replication(which is
 asynchronous)
 more than synchronous replication. I have not used synchronous replication
 enough to be  sure, but I think by setting synchronous_commit='local' you
 are
 basically turning the system into a straight streaming(asynchronous)
 system
 anyway.


 Sorry. Ignore my earlier message -

Yeah. It's a kind of asynchronous (at the transaction level, NOT WAL-based).
All I wanted to achieve is as follows -

1. Synchronous replication would perform transactions simultaneously on
production and standby.
2. Ideally, if the commit does not occur at the standby site, it would not
commit at production either, which can cause the production site to hang. I
do not want the production site to hang if the standby site is down or not
accessible.
3. I need the commit to occur on production, and the production apps should
not be disturbed if the standby fails to respond. To achieve this, I have
set synchronous_commit='local' to ensure that transactions are committed at
the production site first.

We do have streaming replication (PG 9.0) set up on our other production
boxes, which is asynchronous and WAL-based.

Thanks
VB


Re: [GENERAL] High checkpoint_segments

2012-02-14 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt jay.lev...@gmail.com wrote:

 We need to do a few bulk updates as Rails migrations.  We're a typical
 read-mostly web site, so at the moment, our checkpoint settings and WAL are
 all default (3 segments, 5 min, 16MB), and updating a million rows takes 10
 minutes due to all the checkpointing.

 We have no replication or hot standbys.  As a consumer-web startup, with
 no SLA, and not a huge database, and if we ever do have to recover from
 downtime it's ok if it takes longer.. is there a reason NOT to always run
 with something like checkpoint_segments = 1000, as long as I leave the
 timeout at 5m?


Checkpoints will still occur every 5 mins. In any case,
checkpoint_segments=1000 is huge: it implies 16MB * 1000 = 16000MB worth of
pg_xlog data, which is not advisable from an I/O and data-loss perspective.
Even in the most unimaginable case that all of these 1000 files fill up in
less than 5 mins, there is a chance the system will slow down due to high IO
and CPU.
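The arithmetic behind that figure, plus the steady-state upper bound the 9.x
WAL-configuration docs give for the number of files in pg_xlog,
(2 + checkpoint_completion_target) * checkpoint_segments + 1 segments,
sketched in Python:

```python
SEGMENT_MB = 16  # default WAL segment size

def pg_xlog_upper_bound_mb(checkpoint_segments, completion_target=0.5):
    """Approximate steady-state upper bound on pg_xlog size, per the 9.x
    docs: (2 + checkpoint_completion_target) * checkpoint_segments + 1
    segments of 16 MB each."""
    segments = (2 + completion_target) * checkpoint_segments + 1
    return segments * SEGMENT_MB

# The naive figure quoted in the thread: 1000 segments of 16 MB each
print(1000 * SEGMENT_MB)             # 16000 MB
# The documented steady-state bound is even larger:
print(pg_xlog_upper_bound_mb(1000))  # 40016.0 MB
```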

You may think of increasing checkpoint_timeout as well, but some monitoring
and analysis is needed to arrive at a number.

What does pg_stat_bgwriter say about checkpoints?
Do you have log_checkpoints enabled?

Thanks
VB


Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-10 Thread Venkat Balaji
This issue is resolved !!!

The statements are no longer hanging on production now :)

The suspected problem was -

Our brand new production server did not have port 5432 open.

I opened the port using an iptables command and everything started working.

Synchronous replication is fast and awesome.

Thanks
VB

On Fri, Feb 3, 2012 at 9:45 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:

 
  Connection is working fine between primary and standby, ping is working
  fine and wal archive file transfer is working without any issues.
 
  I tried CREATE TABLE and CREATE DATABASE, both were hanging.
 
  Apart from regular streaming replication settings, I did the following on
  primary to enable synchronous replication -
 
  synchronous_standby_names='*'
 
  Commands started hanging after that. Is there anything else i need to do.

 From here:

 http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html

 
 synchronous_standby_names (string)
 ... The synchronous standby will be the first standby named in this list
 that is
 both currently connected and streaming data in real-time (as shown by a
 state of
 streaming in the pg_stat_replication view). Other standby servers appearing
 later in this list represent potential synchronous standbys

 The name of a standby server for this purpose is the application_name
 setting of
 the standby, as set in the primary_conninfo of the standby's walreceiver.
 There
 is no mechanism to enforce uniqueness. In case of duplicates one of the
 matching
 standbys will be chosen to be the synchronous standby, though exactly
 which one
 is indeterminate. The special entry * matches any application_name,
 including
 the default application name of walreceiver.

 

 So I would check the pg_stat_replication view to see if Postgres is seeing
 the
 standby as streaming.


 
  Thanks
  VB

 --
 Adrian Klaver
 adrian.kla...@gmail.com



Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-02 Thread Venkat Balaji
On Thu, Feb 2, 2012 at 8:37 PM, Adrian Klaver adrian.kla...@gmail.comwrote:

 On Wednesday, February 01, 2012 10:51:44 pm Venkat Balaji wrote:
  Hello,
 
  I was testing the Postgres-9.1.1 synchronous streaming replication on our
  UAT system.
 
  Without synchronous replication, everything was working fine.
 
  But, when I enabled synchronous_standby_names='*', CREATE TABLE
  started hanging for a long time.

 Only the CREATE TABLE statement or all statements?
 In general terms synchronous replication moves at the speed of the
 connection
 between the primary and standby or does not occur if the standby can not be
 found. So what is the state of the connection between the primary and
 standby?


Connection is working fine between primary and standby; ping works and WAL
archive file transfer is working without any issues.

I tried CREATE TABLE and CREATE DATABASE; both were hanging.

Apart from the regular streaming replication settings, I did the following
on the primary to enable synchronous replication -

synchronous_standby_names='*'

Commands started hanging after that. Is there anything else I need to do?
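Before enabling synchronous_standby_names it may help to confirm that the
primary actually sees a streaming standby, since synchronous commits will
block until one connects. A quick check, run on the primary (columns as in
the 9.1 view):

```sql
-- Sync replication needs at least one row with state = 'streaming'; once
-- sync is active, sync_state for the chosen standby should read 'sync'.
SELECT application_name, client_addr, state, sync_state
  FROM pg_stat_replication;
```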

Thanks
VB


[GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-01 Thread Venkat Balaji
Hello,

I was testing the Postgres-9.1.1 synchronous streaming replication on our
UAT system.

Without synchronous replication, everything was working fine.

But, when I enabled synchronous_standby_names='*', CREATE TABLE started
hanging for a long time.

When I pressed Ctrl+C I got the following message -

Cancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have
been replicated to the standby.
CREATE TABLE

Can someone please help us ?

Thanks
VB


Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue

2012-02-01 Thread Venkat Balaji
synchronous_commit is on

Thanks
VB

On Thu, Feb 2, 2012 at 12:31 PM, Raghavendra 
raghavendra@enterprisedb.com wrote:

 What is the value of synchronous_commit ?

 ---
 Regards,
 Raghavendra
 EnterpriseDB Corporation
 Blog: http://raghavt.blogspot.com/



 On Thu, Feb 2, 2012 at 12:21 PM, Venkat Balaji venkat.bal...@verse.inwrote:

 Hello,

 I was testing the Postgres-9.1.1 synchronous streaming replication on our
 UAT system.

 Without synchronous replication, everything was working fine.

 But when I enabled synchronous_standby_names='*', CREATE TABLE started
 hanging for a long time.

 When I pressed Ctrl+C, I got the following message -

 Cancel request sent
 WARNING:  canceling wait for synchronous replication due to user request
 DETAIL:  The transaction has already committed locally, but might not
 have been replicated to the standby.
 CREATE TABLE

 Can someone please help us ?

 Thanks
 VB





[GENERAL]: streaming replication on PG-9.1.1

2011-12-13 Thread Venkat Balaji
Hello,

We have configured streaming replication (not synchronous) for our
production in PG-9.1.1.

Replication is working fine, we can see the transactions getting replicated
without any issues.

I see the below problem -

pg_stat_replication on the master shows no rows, at any time.

ps -Af | grep sender | grep -v grep   -- has never shown a sender process on the master
ps -Af | grep receiver | grep -v grep -- has never shown a receiver process on the slave

Note: we do not always have active transactions on the master.
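
Two quick checks, one on each side, might help narrow this down (both exist
in the 9.x series):

```sql
-- On the master: any connected walsender shows up here, even when
-- there are no active transactions.
SELECT * FROM pg_stat_replication;

-- On the slave: a non-null, advancing value means WAL is arriving
-- over the streaming connection.
SELECT pg_last_xlog_receive_location();
```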

Please help !

Thanks
VB


Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-01 Thread Venkat Balaji
2011/12/2 Oleg Serov sero...@gmail.com

 And, i'm an idiot.

 My DB version:
 PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit



 2011/12/2 Oleg Serov sero...@gmail.com

 Hello, I have a problem.

 I have a production server that was working fine. Then I got a strange error:
  ERROR:  right sibling's left-link doesn't match: block 147 links to 407
 instead of expected 146 in index order_status_key'
 So I decided to back up the whole server: I shut down the VPS and backed up
 all the data.
 Then, after I booted it again, I found data loss.


This looks like index corruption. Did you try reindexing? The index creation
might have failed; reindexing rebuilds the index from the table data. If you
suspect disk corruption, try recreating the index (or building it with
CREATE INDEX CONCURRENTLY) on a different disk.
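
As a sketch, the two options look like this (the index name is taken from
the error message; the table and column in option 2 are guesses, for
illustration only):

```sql
-- Option 1: rebuild in place; takes an exclusive lock on the table.
REINDEX INDEX order_status_key;

-- Option 2: build a replacement without blocking writes, then swap.
CREATE INDEX CONCURRENTLY order_status_key_new ON orders (status);
DROP INDEX order_status_key;
ALTER INDEX order_status_key_new RENAME TO order_status_key;
```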


 I've lost data, that have been written to DB around 10-100 hours
 (different tables, have different last updated value).

 Then i've analyzed log, and found this:
 7 days ago appears this errors:
 db= LOG:  could not rename temporary statistics file
 pg_stat_tmp/pgstat.tmp to pg_stat_tmp/pgstat.stat:
 db= WARNING:  pgstat wait timeout
  ERROR:  missing chunk number 0 for toast value 2550017 in pg_toast_17076


This could be a free-space issue. Do you have enough space on the disk
holding pg_stat_tmp?

5 days ago:
 a lot of: ERROR:  xlog flush request F/DC1A22D8 is not satisfied ---
 flushed only to F/526512E0
   83238 db= WARNING:  could not write block 54 of base/16384/2619
   83239 db= CONTEXT:  writing block 54 of relation base/16384/2619

 And today:
  18 db= LOG:  could not open file pg_xlog/0001000F0052
 (log file 15, segment 82):
  19 db= ERROR:  xlog flush request F/DC1A22D8 is not satisfied ---
 flushed only to F/52FDF0E0


 There is any ability to recover fresh data from database?


What kind of backups do you have available?


Thanks
VB


[GENERAL] : pg_compresslog (pglesslog)

2011-11-30 Thread Venkat Balaji
Hello Everyone,

Can someone please help me know if there exists a pglesslog version for
PG-9.0.

I only see beta version (1.4.2) for pg9.0 being released sometime ago.

Anyone using pg_lesslog_1.4.2_pg90_beta.tar for PG-9.0 production
successfully ?

Can we use the above said version on production ?

Please help !

Thanks
VB


Re: [GENERAL] How to restore the table space tar files created by pg_basebackup?

2011-11-30 Thread Venkat Balaji
Do you have the tablespace directories symlinked from the data directory
(under pg_tblspc)?

Thanks
VB

On Wed, Nov 30, 2011 at 7:42 PM, Samba saas...@gmail.com wrote:

 Hi all,

 I have taken a base backup of my master server using pg_basebackup command
 as below:
  pg_basebackup -D /tmp/PostgresBackup/ -Ft -Z 9 -l masterbackup -h
 localhost -U replication -w

 The above created 4 tar files, namely: 16394.tar.gz  16395.tar.gz
  16396.tar.gz  base.tar.gz

 I do know that my database contains 3 table spaces in addition to
 pg_default and pg_global ( I guess, that is why it created those three
 numbered tar.gz files, plus one base.tar.gz file ) and my master and
 standby servers are identical by all means.

 Now, I'm not sure how can I restore these files on the standby server. I
 could restore the base.tar.gz into the data directory on standby and the
 streaming replication has started working properly. But I'm not sure what
 to do with these additional numbered gz files which contains the same data
 that is already contained in the base.tar.gz file.

 Can some one explain me what to do with these files? The documentation for
 pg_basebackup does not mention this information, it just says that a
 different variant of the command will fail if there are multiple table
 spaces.

 Another related query is if  we can specify the name of the backup file
 instead of leaving it to be base.tar.gz file.

 Thanks and Regards,
 Samba







Re: [GENERAL] Incremental backup with RSYNC or something?

2011-11-17 Thread Venkat Balaji
On Fri, Nov 18, 2011 at 6:08 AM, Phoenix Kiula phoenix.ki...@gmail.comwrote:

 On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  Question: what can I do to rsync only the new additions in every table
  starting 00:00:01 until 23:59:59 for each day?
 
  A table level replication (like Slony) should help here.


 Slony needs more than one physical server, right?


Not necessarily; you can replicate within the same host as well. It depends
on which tables you want to replicate, not on the number of hosts.

Thanks
VB


[GENERAL] : Postgres installation error on CentOS

2011-11-15 Thread Venkat Balaji
Hello,

We are facing an issue while installing Postgres-9.0.1 on CentOS-5.

Below is the error we are encountering -

./configure -- output

checking for inflate in -lz... no
configure: error: zlib library not found
If you have zlib already installed, see config.log for details on the
failure.  It is possible the compiler isn't looking in the proper directory.
Use --without-zlib to disable zlib support.

Inside the config.log, below is what we see -

configure:8204: checking for inflate in -lz
configure:8239: gcc -o conftest -O2 -Wall -Wmissing-prototypes
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-fno-strict-aliasing -fwrapv  -D_GNU_SOURCE conftest.c -lz  -lreadline
-ltermcap -lcrypt -ldl -lm >&5
/usr/bin/ld: skipping incompatible /usr/lib/libz.a when searching for -lz
/usr/bin/ld: cannot find -lz

Zlib rpms are installed and below is the rpm -qa output -

[root@localhost postgresql-9.0.1]# rpm -qa | grep zlib
zlib-1.2.3-4.el5
jzlib-1.0.7-4jpp.1
zlib-devel-1.2.3-4.el5
zlib-1.2.3-3

Thanks
VB


Re: [GENERAL] : Postgres installation error on CentOS

2011-11-15 Thread Venkat Balaji
Hi Alban,

Thanks for the reply !

I was able to resolve this issue, though not by removing the older version of
zlib (I was unable to do so due to dependencies).

The zlib-devel package matching the older zlib was not installed; after
installing it, I was able to build.

Regards,
VB

On Tue, Nov 15, 2011 at 5:56 PM, Alban Hertroys haram...@gmail.com wrote:

 On 15 November 2011 12:58, Venkat Balaji venkat.bal...@verse.in wrote:
  Hello,
  We are facing an issue while installing Postgres-9.0.1 on CentOS-5.

 That name always makes me wonder when they're releasing PennyOS or
 DollarOS :P

  Below is the error we are encountering -
  ./configure -- output
  checking for inflate in -lz... no
  configure: error: zlib library not found
 ...
  /usr/bin/ld: skipping incompatible /usr/lib/libz.a when searching for -lz

 Apparently your installed libz doesn't provide a function that
 configure is checking for. Perhaps upgrading it helps, if possible?

 Another possibility is that configure gets pointed to an old version
 of zlib as the first result from LD. You could try removing that, but
 you probably have dependencies on it from other packages.

  Zlib rpms are installed and below is the rpm -qa output -
  [root@localhost postgresql-9.0.1]# rpm -qa | grep zlib
  zlib-1.2.3-4.el5
  jzlib-1.0.7-4jpp.1
  zlib-devel-1.2.3-4.el5
  zlib-1.2.3-3

 As a non-linux user this doesn't mean much to me.
 --
 If you can't see the forest for the trees,
 Cut the trees and you'll see there is no forest.



Re: [GENERAL] Incremental backup with RSYNC or something?

2011-11-13 Thread Venkat Balaji

 Question: what can I do to rsync only the new additions in every table
 starting 00:00:01 until 23:59:59 for each day?


A table level replication (like Slony) should help here.

Or

A trigger based approach with dblink would be an-other (but, a bit complex)
option.

Thanks
VB


[GENERAL] : postgres: archiver process failed on 0000000100000F72000000F0

2011-11-11 Thread Venkat Balaji
Hello,

WAL Archive process in our production is not working.

[postgres@hostname]$ ps -ef | grep archive
postgres 12077 16015  0 10:19 pts/4    00:00:00 grep archive
postgres 31126 27607  0 Nov10 ?        00:01:18 postgres: archiver process   failed on 0000000100000F72000000F0

I see WAL files accumulating in the pg_xlog location, and their status in
archive_status shows as .ready.
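
A small sketch for keeping an eye on this backlog, assuming the standard
pg_xlog/archive_status layout (the PGDATA default below is an assumption):

```shell
# Count WAL segments the archiver has not yet processed; a steadily
# growing number means archive_command keeps failing.
pending_wal() {
    ls "$1"/pg_xlog/archive_status/*.ready 2>/dev/null | wc -l
}

pending_wal "${PGDATA:-/var/lib/pgsql/data}"
```

Once the underlying archive_command failure is fixed, the archiver retries on
its own; no cluster restart is needed.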

Is there any way we can restart only the archiving process, without
disturbing the rest of the cluster?

Earlier, we had killed a process using kill -9; the DB went into recovery
mode and then came back up and running.

We have no issues with the application as well.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

Please help to resolve this !

Thanks
VB


Re: [GENERAL] : postgres: archiver process failed on 0000000100000F72000000F0

2011-11-11 Thread Venkat Balaji
This problem has been resolved !!

Thanks
VB

On Fri, Nov 11, 2011 at 9:58 PM, Venkat Balaji venkat.bal...@verse.inwrote:

 Hello,

 WAL Archive process in our production is not working.

 [postgres@hostname]$ ps -ef | grep archive
 postgres 12077 16015  0 10:19 pts/4    00:00:00 grep archive
 postgres 31126 27607  0 Nov10 ?        00:01:18 postgres: archiver process   failed on 0000000100000F72000000F0

 I see WAL files accumulating in the pg_xlog location, and their status in
 archive_status shows as .ready.

 Is there any way we can restart only the archiving process, without
 disturbing the rest of the cluster?

 Earlier, we had killed a process using kill -9; the DB went into recovery
 mode and then came back up and running.

 We have no issues with the application as well.

 postgres=# select pg_is_in_recovery();
  pg_is_in_recovery
 -------------------
  f
 (1 row)

 Please help to resolve this !

 Thanks
 VB



[GENERAL] : failed: ERROR: could not open file base/44620/972355: No such file or directory

2011-11-03 Thread Venkat Balaji
Hello Everyone,

We had recently taken an online backup of our production database cluster
(pg_start_backup() - rsync - pg_stop_backup()).

We had built the testing cluster with the backup.

When we try to  vacuum the database or vacuum full the testing database, we
are getting the following error.

 vacuuming of database failed: ERROR:  could not open file
base/44620/972355: No such file or directory

In an other situation-

postgres=# select * from table1 limit 10;
ERROR:  could not open file base/44620/1022275: No such file or directory

After going through the below thread, one situation which we believe could
be the problem is -

- Table1 is a partitioned table and we keep dropping and adding partitions
on daily basis. This might have happened at the time we were backing up.

Is this the problem? If so, what precautions do we need to take? Should we
perform the backup only when no DDL is running?

Neither of the base files 972355 and 1022275 exists in production (they
probably belonged to dropped partitions, but we are not sure).

We have a data center migration coming up, so, this is very critical for us
-

PG Version - 9.0.1
OS - RHEL 5.5 (prod) and RHEL 6.1 (test)

Please help

Thanks
VB


Re: [GENERAL] : failed: ERROR: could not open file base/44620/972355: No such file or directory

2011-11-03 Thread Venkat Balaji
Sorry forgot to mention the thread I referred to -

http://archives.postgresql.org/pgsql-general/2010-12/msg01000.php

Thanks
VB

On Thu, Nov 3, 2011 at 3:48 PM, Venkat Balaji venkat.bal...@verse.inwrote:

 Hello Everyone,

 We had recently taken an online backup of our production database cluster
 (pg_start_backup() - rsync - pg_stop_backup()).

 We had built the testing cluster with the backup.

 When we try to  vacuum the database or vacuum full the testing database,
 we are getting the following error.

  vacuuming of database failed: ERROR:  could not open file
 base/44620/972355: No such file or directory

 In an other situation-

 postgres=# select * from table1 limit 10;
 ERROR:  could not open file base/44620/1022275: No such file or directory

 After going through the below thread, one situation which we believe could
 be the problem is -

 - Table1 is a partitioned table and we keep dropping and adding partitions
 on daily basis. This might have happened at the time we were backing up.

 Is this the problem? If so, what precautions do we need to take? Should we
 perform the backup only when no DDL is running?

 Neither of the base files 972355 and 1022275 exists in production (they
 probably belonged to dropped partitions, but we are not sure).

 We have a data center migration coming up, so, this is very critical for
 us -

 PG Version - 9.0.1
 OS - RHEL 5.5 (prod) and RHEL 6.1 (test)

 Please help

 Thanks
 VB



Re: [GENERAL] Server move using rsync

2011-11-02 Thread Venkat Balaji

 We're not doing this long-term, in order to have a backup server we can
 fail-over to, but rather as a one-off low impact move of our database.
 Consequently, instead of using pg_start_backup and pg_stop_backup, and
 keeping all WAL, we're stopping the database, rsync of everything, and
 starting the database in the new server, with it appearing to the new
 server (if it was capable of noticing such things) that it had simply been
 shutdown and restarted.


This is fine. If the database was shut down, the backup is completely
consistent; you can bring the cluster up as of backup time without any issues.


 The initial and repeated rsyncs while the first server is running and in
 use, are solely in order to reduce the time that the rsync takes while the
 postgresql application is stopped.



 Do you still think we need to do anything special with pg_start_backup,
 pg_stop_backup, and WAL archives?


Yes. After the initial sync, if the subsequent rsyncs are performed while the
database cluster is up and running, then pg_start_backup() - rsync -
pg_stop_backup() (as said earlier) must be used. This lets Postgres know that
a backup is in progress: pg_start_backup() performs a checkpoint and writes a
backup label recording the checkpoint location and start time. The WAL
archives from that window are then needed at recovery time (to repair any
half-written pages).
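
As a sketch, the bracketing looks like this (the label string is arbitrary):

```sql
-- Forces a checkpoint and writes backup_label into the data directory.
SELECT pg_start_backup('nightly_rsync');

-- ... run the rsync of the data directory here ...

-- Removes backup_label and marks the WAL range needed for consistency.
SELECT pg_stop_backup();
```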

Without doing pg_start_backup, and with rsync not performing a snapshot
 backup, my assumption is that until we do an rsync with the service
 shutdown, whatever we've got at the location we're copying to, is not
 self-consistent.


Above explanation should answer this.


 If we start up postgresql on it, won't it think it is recovering from a
 sudden crash? I think it may either appear to recover ok, or complain about
 various things, and not start up ok, with neither option providing us with
 much insight, as all that could tell us is that either some disk blocks are
 consistent, or some are not, which is our starting assumption anyway.


Starting up postgresql would probably result in more disk block changes
 that will result in more work next time we rsync.


This is normal behavior for rsync; how much extra work it causes depends on
how volatile your system is and on the volume of changes.


 How badly can we screw things up, given we intend to perform a final rsync
 with no postgresql services running? What should we try and avoid doing,
 and why?


 We might simply compare some hashes between the two systems, of some files
 that haven't had their last-modified dates changed since the last rsync.


All of this is taken care of by Postgres, with the help of the WAL archive
files generated while the rsync ran with the postgres services up and running.

Thanks
VB


Re: [GENERAL] Server move using rsync

2011-10-28 Thread Venkat Balaji

 Another option is to use rsync to perform a file system backup. This is
 done by first running rsync while the database server is running, then
 shutting down the database server just long enough to do a second rsync. The
 second rsync will be much quicker than the first, because it has relatively
 little data to transfer, and the end result will be consistent because the
 server was down. This method allows a file system backup to be performed
 with minimal downtime.

 Except that we plan on an initial rsync which we think might take a couple
 of days, then subsequent daily rsyncs for up to a week to keep it up to date
 till we stop the old database, rsync again, and start the new database.


We back up our production server in exactly the same way, and we have been
through some problems while restoring and bringing up the database. If you
are planning an initial complete rsync with subsequent incremental rsyncs,
make sure you keep all the WAL archives starting from the initial rsync on
Day 1.

Also are you doing the following?

1. pg_start_backup() - rsync - pg_stop_backup() ?
2. Please let us know your WAL Archive backup strategy.

Is there any way during that week, that we can verify whether our partially
 completed database move process is going to result in a database that starts
 up ok?


In general, yes, the database can start up normally. But without the WAL
archives, recovering to a particular point in time would not be possible.

Thanks
VB


Re: [GENERAL] Are pg_xlog/* fiels necessary for PITR?

2011-10-27 Thread Venkat Balaji
On Thu, Oct 27, 2011 at 7:57 PM, rihad ri...@mail.ru wrote:

 Hi, I'm backing up the entire server directory from time to time. pg_xlog/
 directory containing WAL files is pretty heavy (wal_level=archive). Can I
 exclude it from the regular tar archive?


The best approach would be to run pg_switch_xlog() and then take the backup
excluding pg_xlog.

To recover the most recent transactions you might still need pg_xlog (it
depends on the point you are recovering to). pg_switch_xlog() reduces the
dependency on the pg_xlog files to a great extent.
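
For example, immediately before starting the tar (the function was renamed
pg_switch_wal() in PostgreSQL 10):

```sql
-- Close out the current WAL segment so it is archived now, instead of
-- sitting partially filled in pg_xlog when the backup is taken.
SELECT pg_switch_xlog();
```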


 #!/bin/sh

 renice 20 $$ 2>/dev/null
 pgsql -U pgsql -q -c "CHECKPOINT" postgres # speed up pg_start_backup()


pg_start_backup() performs a checkpoint and ensures that all the data up to
that checkpoint and transaction ID will be in the backup (or marked as needed
for data consistency while restoring and recovering).


 pgsql -U pgsql -q -c "select pg_start_backup('sol')" postgres
 tar -cjf - /db 2>/dev/null | ssh -q -i ~pgsql/.ssh/id_rsa -p 2022 -c blowfish dbarchive@10.0.0.1 'cat > db.tbz'
 pgsql -U pgsql -q -c "select pg_stop_backup()" postgres
 sleep 60 # wait for new WAL backups to appear
 echo 'ssh -q dbarchive@10.0.0.1 ./post-backup.sh' | su -m pgsql


 I want to change the tar invocation to: tar -cjf - --exclude 'db/pg_xlog/*'
 ...

 Will there be enough data in case of recovery? (May God forbid... )))


But all the WAL archives generated between backup start time and end time
must be backed up. They are needed for the database to be consistent and for
the recovery to be smooth.

Recovering to a particular point in time depends purely on your backup
strategy.

Thanks
VB


Re: [GENERAL] List Permissions

2011-10-25 Thread Venkat Balaji
My answers are inline below -

  How can I list a users permissions table by table?

   i.e.  User Joe
  has read/write on table1

 has read on table2
  no access on table 3


For a particular user you can use the function below. You can write a SQL
query or script that takes table names from pg_tables one by one:

has_table_privilege(user, table, privilege)

Example :

I am checking if user postgres has select privilege on table1.

postgres=# select has_table_privilege('postgres','public.table1','select');
 has_table_privilege
---------------------
 t
(1 row)
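
The per-table loop over pg_tables can also be done in a single query ('joe'
is a hypothetical user name):

```sql
-- One row per user table: Joe's read/write access at a glance.
SELECT schemaname, tablename,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'SELECT') AS can_read,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'INSERT') AS can_write
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
```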


For the current user (the user you are logged in as) you can use the
following function:

has_table_privilege(table, privilege)

I am checking if the current_user has select privilege on table1

Example:

postgres=# select current_user;
 current_user
--------------
 postgres
(1 row)

postgres=# select has_table_privilege('public.table1','select');
 has_table_privilege
---------------------
 t
(1 row)

The link below covers the other permission-checking functions:

http://www.postgresql.org/docs/9.0/static/functions-info.html

Hope this helps !

Thanks
VB


Re: [GENERAL] : PostgreSQL Online Backup

2011-10-24 Thread Venkat Balaji
Sorry for not responding to this email for so long.

Alan,

We had mentioned the following line in recovery.conf file (we had given
pg_xlog location since we did not have WAL archives) -

restore_command = 'cp data dir/pg_xlog/%f %p'

We found where the problem was -

Here is what i did -

1. We had taken a full backup using pg_start_backup() and pg_stop_backup()
on Day 1
2. On the rest of the days (Day 2 through Day 15), we backed up incrementally
(also using pg_start_backup() and pg_stop_backup())
3. On Day 16, when I started the recovery, PG asked for Day 1's WAL archive
file, which we did not have.

A fresh complete backup with change in our backup strategy resolved the
issue.

Thanks a lot for all your inputs and help on this !!

Regards,
VB

2011/10/4 Alan Hodgson ahodg...@simkin.ca

   rsync works fine. Why exactly can't the recovery find the backed up
 copy
   of 000105390076? Please post your archive_command settings,
   the contents of any script(s) called by that, and the recovery.conf
 file
   you're using that's having problems, as well as the complete process
 you
   followed to
   initiate recovery. I strongly suspect you're missing part of the
 process
   of actually saving the WAL files needed for recovery.

  The recovery is unable to find the WAL archive because, it was generated
 on
  26th September.
 
  Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the
 files.
 
  I do not have that WAL archive copy.
 
  The problem area -
 
  I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not
  256K).
 

 I'm going to need the rest of what I asked for to offer any further
 suggestions
 - especially the full and exact steps you took to initiate recovery and the
 contents of recovery.conf. Also, please don't top-post.

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] : PostgreSQL Online Backup

2011-10-03 Thread Venkat Balaji
Another problem in recovery (probably because of rsync) -

As said earlier, we take an incremental backup of production every day using
rsync.

But somehow a few files fail to get synced in between, and recovery keeps
asking for back-dated archive files (from more than a week ago).

I restored the October 2nd backup, and PG asked for a September 26th archive
file, with the last known up time shown as 26th Sep, 2011.

2011-10-03 07:17:12 CDT [12705]: [1-1] LOG:  database system was
interrupted; last known up at 2011-09-26 09:01:36 CDT
2011-10-03 07:17:12 CDT [12705]: [2-1] LOG:  starting archive recovery
cp: cannot stat
`/usr/local/pgsql9.0.1/obtdata/data/pg_xlog/000105390076': No
such file or directory
2011-10-03 07:17:12 CDT [12705]: [3-1] LOG:  could not open file
pg_xlog/000105390076 (log file 1337, segment 118): No such
file or directory
2011-10-03 07:17:12 CDT [12705]: [4-1] LOG:  invalid checkpoint record
2011-10-03 07:17:12 CDT [12705]: [5-1] PANIC:  could not locate required
checkpoint record
2011-10-03 07:17:12 CDT [12705]: [6-1] HINT:  If you are not restoring from
a backup, try removing the file
/usr/local/pgsql9.0.1/obtdata/data/backup_label.
2011-10-03 07:17:12 CDT [12702]: [1-1] LOG:  startup process (PID 12705) was
terminated by signal 6: Aborted
2011-10-03 07:17:12 CDT [12702]: [2-1] LOG:  aborting startup due to startup
process failure


I always see pg_clog files and some base files not getting synced.

Below is what we are doing -

pg_start_backup()
rsync the data directory
pg_stop_backup()

The first rsync is fine, but the subsequent runs produce an inconsistent
copy.

We do the same every day to backup the data directory incrementally.

What I observed is that PG records the transaction ID whenever a backup
starts and stops, plus the backup label. When PG records the next day's
backup-start time and transaction ID, I think some of the transaction IDs and
pg_clog files generated between the previous stop time and the next start
time get missed.

Did anyone observe this behavior? Please help!

This is critical for us. I want to recommend not using rsync for production
backups (use cp or scp instead).

Thanks
VB

On Tue, Sep 27, 2011 at 2:36 PM, Albe Laurenz laurenz.a...@wien.gv.atwrote:

 Venkat Balaji wrote:
  Our problem is -
 
  We had mistakenly executed  rsync on the running PostgreSQL data
 directory (production) and we did
  not run pg_start_backup().
 
  Will this harm production ? can this lead to corruption ?

 I assume that you used rsync to copy *from* the data directory.

 This cannot lead to data corruption.
 Only performance might suffer temporarily due to the additional I/O.

 The backup made with rsync will be unusable without pg_start_backup().

 Yours,
 Laurenz Albe



Re: [GENERAL] : PostgreSQL Online Backup

2011-10-03 Thread Venkat Balaji
The recovery is unable to find that WAL archive because it was generated on
26th September, whereas the backup was taken on Oct 2nd, 2011, and we had
deleted all the older files.

I do not have a copy of that WAL archive.

The problem area -

I found that a pg_clog file dated 26th Sep, 2011 was not fully synced (it is
not 256K).

Thanks
VB

2011/10/3 Alan Hodgson ahodg...@simkin.ca

 On October 3, 2011 05:33:35 AM Venkat Balaji wrote:
  Did anyone observe this behavior ?? Please help !
 
  This is critical for us. I want to recommend not to use rsync (use cp
 or
  scp instead) for production backup.
 

 rsync works fine. Why exactly can't the recovery find the backed up copy of
 000105390076? Please post your archive_command settings, the
 contents of any script(s) called by that, and the recovery.conf file you're
 using that's having problems, as well as the complete process you followed
 to
 initiate recovery. I strongly suspect you're missing part of the process of
 actually saving the WAL files needed for recovery.

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] could not access file $libdir/pg_buffercache: No such file or directory

2011-09-30 Thread Venkat Balaji
One way could be -

If the restore you are performing is not very big, take the pg_dump with
--inserts, remove the CREATE FUNCTION pg_buffercache... lines from the output
file, and then restore.
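
Since the dump being restored is a custom-format archive, another route is to
filter its table of contents instead of editing SQL: list the TOC with
pg_restore -l, drop the unwanted entries, and restore with -L. A sketch (the
filter itself is plain grep; file names follow the example above):

```shell
# Keep every TOC entry except the pg_buffercache ones.
filter_toc() {
    grep -vi 'pg_buffercache' "$1"
}

# Usage (needs the dump file and a live server, so not shown running):
#   pg_restore -l test_RTR.pgc > full.toc
#   filter_toc full.toc > keep.toc
#   pg_restore -L keep.toc -d test --no-owner test_RTR.pgc
```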

Thanks
VB

On Fri, Sep 30, 2011 at 10:59 AM, Venkat Balaji venkat.bal...@verse.inwrote:

 I had faced the same problem two days earlier, in that case with the
 pg_freespacemap contrib module.

 I did not know of a way to ignore these functions, so I installed the
 contrib modules and restored. It worked!

 I am also looking for a way to ignore these functions.

 Thanks
 VB


 On Fri, Sep 30, 2011 at 8:15 AM, Royce Ausburn royce...@inomial.comwrote:

 I'm in the process of testing out Postgres 9.0 for production use.  I've
 been using it for development on my mac, a build from EnterpriseDB.  We've
 just installed a 9.0.5 on an Ubuntu (Ubuntu 10.04.3 LTS) machine from a
 backport from lucid.  There's an existing 8.4.8 postgres install also on
 that machine, but as far as I'm aware the debs are set up so you can happily
 have two installs side by side.

 I've dumped a test DB from my laptop and attempted to restore it on to the
 ubuntu machine, but I see errors:

 royce@fishy:~$ createdb test
 royce@fishy:~$ pg_restore -d test --no-owner test_RTR.pgc
 pg_restore: [archiver (db)] Error while PROCESSING TOC:
 pg_restore: [archiver (db)] Error from TOC entry 65; 1255 2194102 FUNCTION
 pg_buffercache_pages() persona
 pg_restore: [archiver (db)] could not execute query: ERROR:  could not
 access file $libdir/pg_buffercache: No such file or directory
Command was: CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF
 record
LANGUAGE c
AS '$libdir/pg_buffercache', 'pg_buffercache_...


 After looking around a bit, my theory on these errors is that my mac has
 some contrib module installed that adds views to my test database, and those
 views have been included in the dump.  The contrib module is apparently not
 installed on the ubuntu machine and hence the restore can't create the
 views.  Is this theory correct?  If so:

 - Can these errors be safely ignored? (not ideal, because I'll miss other,
 real errors)
 - Is there some simple way I can prevent these views from being emitted as
 part of the dump?
 - Is there some simple way I can prevent these views from being restored
 from a dump that contains them?
 - Otherwise, how can I purge these contrib modules from my laptop?

 Otherwise, if my theory is incorrect, any hints as to what it might be?

 Cheers!

 --Royce

 Mac:

  version

 
  PostgreSQL 9.0.4 on x86_64-apple-darwin, compiled by GCC
 i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit
 (1 row)


 Ubuntu box:

   version

 -
  PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
 (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
 (1 row)


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general





Re: [GENERAL] could not access file $libdir/pg_buffercache: No such file or directory

2011-09-29 Thread Venkat Balaji
I had faced the same problem two days earlier, in that case with the
pg_freespacemap contrib module.

I did not know of a way to ignore these functions, so I installed the contrib
modules and restored. It worked!

I am also looking for a way to ignore these functions.

Thanks
VB

On Fri, Sep 30, 2011 at 8:15 AM, Royce Ausburn royce...@inomial.com wrote:

 I'm in the process of testing out Postgres 9.0 for production use.  I've
 been using it for development on my mac, a build from EnterpriseDB.  We've
 just installed a 9.0.5 on an Ubuntu (Ubuntu 10.04.3 LTS) machine from a
 backport from lucid.  There's an existing 8.4.8 postgres install also on
 that machine, but as far as I'm aware the debs are set up so you can happily
 have two installs side by side.

 I've dumped a test DB from my laptop and attempted to restore it on to the
 ubuntu machine, but I see errors:

 royce@fishy:~$ createdb test
 royce@fishy:~$ pg_restore -d test --no-owner test_RTR.pgc
 pg_restore: [archiver (db)] Error while PROCESSING TOC:
 pg_restore: [archiver (db)] Error from TOC entry 65; 1255 2194102 FUNCTION
 pg_buffercache_pages() persona
 pg_restore: [archiver (db)] could not execute query: ERROR:  could not
 access file $libdir/pg_buffercache: No such file or directory
Command was: CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF record
LANGUAGE c
AS '$libdir/pg_buffercache', 'pg_buffercache_...


 After looking around a bit, my theory on these errors is that my mac has
 some contrib module installed that adds views to my test database, and those
 views have been included in the dump.  The contrib module is apparently not
 installed on the ubuntu machine and hence the restore can't create the
 views.  Is this theory correct?  If so:

 - Can these errors be safely ignored? (not ideal, because I'll miss other,
 real errors)
 - Is there some simple way I can prevent these views from being emitted as
 part of the dump?
 - Is there some simple way I can prevent these views from being restored
 from a dump that contains them?
 - Otherwise, how can I purge these contrib modules from my laptop?

 Otherwise, if my theory is incorrect, any hints as to what it might be?

 Cheers!

 --Royce

 Mac:

  version
 ------------------------------------------------------------------------
  PostgreSQL 9.0.4 on x86_64-apple-darwin, compiled by GCC
 i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit
 (1 row)


 Ubuntu box:

   version
 -------------------------------------------------------------------------
  PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
 (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
 (1 row)





[GENERAL] : Looking for a PostgreSQL book

2011-09-28 Thread Venkat Balaji
Hello Everyone,

I have been working on PostgreSQL for quite a while (2 yrs) now.

I have got PostgreSQL 9.0 High Performance book and quite excited to go
through it.

Please let me know any sources where I can get more books on PG. I am
especially looking for books on PG internals, architecture, Backup &
Recovery, and HA.

Looking forward to the information.

Regards,
VB


Re: [GENERAL] : Looking for a PostgreSQL book

2011-09-28 Thread Venkat Balaji
Thanks Adam !

Regards,
VB

On Thu, Sep 29, 2011 at 12:03 AM, Adam Cornett adam.corn...@gmail.com wrote:

 The same publisher (Packt) has a book *PostgreSQL
 9 Administration Cookbook* by Simon Riggs and Hannu Krosing that is
 equally useful as Greg's *High Performance* book


 On Wed, Sep 28, 2011 at 1:14 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Hello Everyone,

 I have been working on PostgreSQL for quite a while (2 yrs) now.

 I have got PostgreSQL 9.0 High Performance book and quite excited to go
 through it.

 Please let me know any sources where I can get more books on PG. I am
 especially looking for books on PG internals, architecture, Backup &
 Recovery, and HA.

 Looking forward to the information.

 Regards,
 VB




 --
 Adam Cornett
 adam.corn...@gmail.com
 (678) 296-1150



Re: [GENERAL] PostgreSQL recovery when lost some file in data\global

2011-09-27 Thread Venkat Balaji
Hi Tuan Hoang Anh,

Are you able to bring up the cluster?

Please let us know what problem you are facing.

Thanks
Venkat

On Tue, Sep 27, 2011 at 12:08 PM, tuanhoanganh hatua...@gmail.com wrote:

 I am running PostgreSQL 9.0.1 32bit on windows 2003. Last night my disk had
 some problem and i lost some file in data\global.
 Is there anyway to recovery postgresql.

 Thanks in advance. Sorry for my English.

 Tuan Hoang Anh



[GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Venkat Balaji
Hello Everyone,

We have had situations where rsync was executed against the production data
directory without first executing pg_start_backup(); on subsequent runs,
pg_start_backup() was executed along with rsync. This was done to avoid high
I/O load on production. We ended up with mismatched files (especially in
pg_clog) and are not sure about the state of the base directory.

Postgres is asking for WAL archive files dated around 15 days ago. We are
absolutely not sure what's going on.

Is this dangerous for production (like corruption) ? or just the backup will
be invalid ? Please help us know if we have to perform any precautionary
checks on the production cluster.

Apart from forcing a checkpoint, does pg_start_backup() update any
dictionary tables or views, or write anything to pg_xlog?

Looking forward for your help !

Thanks
VB


Re: [GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Venkat Balaji
I tried restoring the backup after taking the full backup.

Below is what i see in the archive destination.

Postgres was asking for 00010193006F; I tried to find it, and
below is what I found...

-rw--- 1 postgres postgres 3.3M Sep 26 02:06 00010193006F.gz
-rw--- 1 postgres postgres  219 Sep 26 02:53
00010193006F.00328508.backup.gz

Why is PG (9.0) giving this WAL archive file a .backup.gz extension?

Please help !

Thanks
VB

On Mon, Sep 26, 2011 at 5:11 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Hello Everyone,

 We have had situations where-in rsync was executed without executing
 pg_start_backup() on the production data directory and on the next runs,
 pg_start_backup() has been executed with rsync. This was to avoid high
 IO load on production. We ended up getting unmatched files (especially in
 pg_clog) and not sure about base directory.

 Postgres is asking for WAL Archive files dated sometime around 15 days ago.
 We are absolutely not sure whats going on.

 Is this dangerous for production (like corruption) ? or just the backup
 will be invalid ? Please help us know if we have to perform any
 precautionary checks on the production cluster.

  Apart from forcing a checkpoint, does pg_start_backup() update any
  dictionary tables or views, or write anything to pg_xlog?

 Looking forward for your help !

 Thanks
 VB





Re: [GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Venkat Balaji
Thanks for all your inputs !

Our problem is -

We had mistakenly executed  rsync on the running PostgreSQL data directory
(production) and we did not run pg_start_backup().

Will this harm production? Can this lead to corruption?

Thanks -

On Mon, Sep 26, 2011 at 10:29 PM, Alan Hodgson ahodg...@simkin.ca wrote:

 On September 26, 2011 05:49:50 AM Venkat Balaji wrote:
  I tried restoring the backup, after taking the full backup.
 
  Below is what i see in the archive destination.
 
  Postgres was asking for 00010193006F and i tried to find
 the
  same and below is what i find...
 
  -rw--- 1 postgres postgres 3.3M Sep 26 02:06
  00010193006F.gz -rw--- 1 postgres postgres  219 Sep 26
  02:53
  00010193006F.00328508.backup.gz
 
  Why is PG (9.0) putting an extension for the WAL Archive file as
   backup.gz ??
 

 The archive files are created by your archive_command, as specified in
 postgresql.conf. My guess would be that your archive command runs the files
 through gzip as part of archiving (which is fine).

 However, the restore_command you specify in recovery.conf must undo this
 compression. So instead of (for example) 'cp -f %f %p', it might instead
 need to look like 'zcat %f > %p'.

 Hope this helps.
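
Concretely, the pairing Alan describes might look like this (the archive path
is hypothetical, not taken from the thread):

```
# postgresql.conf on the primary: compress each WAL segment while archiving
archive_mode = on
archive_command = 'gzip < %p > /wal_archive/%f.gz'

# recovery.conf on the restore target: undo the compression on the way back
restore_command = 'zcat /wal_archive/%f.gz > %p'
```

The two commands must be exact inverses of each other; whatever suffix or
compression step archive_command adds, restore_command has to strip.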






[GENERAL] : Checksum ERROR when restoring Online Backup

2011-09-23 Thread Venkat Balaji
Hello Everyone,

I am testing the Online Backups of our production databases ( this is part
of our disaster recovery plan ).

After restoring the Online Backup, we tried to bring up the cluster and
ended up with the following error -

2011-09-23 07:29:04 CDT [24092]: [1-1] FATAL:  incorrect checksum in control
file.

Does this message mean, the Online Backup is corrupted or invalid ?

Normally, we get recovery related error messages. This is the first time we
are facing a problem like this.

Please help as this is critical for us.

Thanks
Venkat


Re: [GENERAL] : Checksum ERROR when restoring Online Backup

2011-09-23 Thread Venkat Balaji
Apologies -

I did not mention complete details of PG and OS -

Postgres 9.0.1
Production Operating System version where Postgres is running is as follows
-
Linux *prod-server* 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux

The server version where I restored the production backup is as follows -
Linux *backup-server* 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 EDT
2010 i686 i686 i386 GNU/Linux

I read somewhere that Postgres data files are not architecture-independent.

Please help !

Thanks
Venkat


On Fri, Sep 23, 2011 at 6:11 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Hello Everyone,

 I am testing the Online Backups of our production databases ( this is part
 of our disaster recovery plan ).

 After restoring the Online Backup, we tried to bring up the cluster and
 ended up with the following error -

 2011-09-23 07:29:04 CDT [24092]: [1-1] FATAL:  incorrect checksum in
 control file.

 Does this message mean, the Online Backup is corrupted or invalid ?

 Normally, we get recovery related error messages. This is the first time we
 are facing a problem like this.

 Please help as this is critical for us.

 Thanks
 Venkat



Re: [GENERAL] : Checksum ERROR when restoring Online Backup

2011-09-23 Thread Venkat Balaji
Thanks Richard !

I realized that I was restoring on a 32-bit server.

Regards,
Venkat

On Fri, Sep 23, 2011 at 6:59 PM, Richard Huxton d...@archonet.com wrote:

 On 23/09/11 13:53, Venkat Balaji wrote:

 Linux *prod-server* 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT
 2010 x86_64 x86_64 x86_64 GNU/Linux

 The server version where I restored the production backup is as follows -
 Linux *backup-server* 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25
 EDT 2010 i686 i686 i386 GNU/Linux

  I read somewhere that Postgres data files are not architecture-independent.


 They (the WAL files) are not, and it looks like you're trying to restore a
 64-bit version onto a 32-bit server. That's not going to work. A
 pg_dump/restore works of course, and if you need replication then Slony can
 handle this.

 --
  Richard Huxton
  Archonet Ltd



[GENERAL] Calculate Vacuum Metrics

2011-09-19 Thread Venkat Balaji
Hello Everyone,

I am in the process of scheduling a VACUUM FULL for our production databases,
where downtime is extremely critical.

Can someone please help me calculate the amount of free space (or free
pages) in the Table and Index (even after regular autovacuum or vacuum
analyze is performed).

This will help me calculate the amount of space that will be claimed after
the vacuum full is performed.

I am using the query below (which I got from Google) to calculate wasted
space and bloat in tables and indexes.

SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint,
relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS
tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END
AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric
END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS
wastedibytes
FROM (
  SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
  (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma
END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples,
COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta --
very rough approximation, assumes all cols
  FROM (
SELECT
  ma,bs,schemaname,tablename,
  (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma
END)))::numeric AS datahdr,
  (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE
nullhdr%ma END))) AS nullhdr2
FROM (
  SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
  SELECT 1+count(*)/8
  FROM pg_stats s2
  WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND
s2.tablename = s.tablename
) AS nullhdr
  FROM pg_stats s, (
SELECT
  (SELECT current_setting('block_size')::numeric) AS bs,
  CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23
END AS hdr,
  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5
) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname =
rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC ;

Can I continue to use it, or is there a better way to get this information?

Please help !

Thanks
Venkat


Re: [GENERAL] warm standby - apply wal archives

2011-09-18 Thread Venkat Balaji
Syncing just the WAL archive directory every minute should not be a problem
at all (running rsync every minute against the whole data directory is not
recommended).

As said earlier, we had configured a warm standby for a database of about
2 TB, and WAL archive generation ran into the hundreds.

We did not encounter any issues running the rsync job at a frequency of even
less than a minute. We made sure the rsync job ran on the standby server (we
were pulling the WAL archives to the standby):

1. compress the wal archives
2. rsync on standby site
3. uncompress on standby site
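
A minimal local sketch of those three steps (all paths and the segment name
are made up; in production step 2 would be an rsync pull from the primary,
e.g. rsync -a primary:/wal_archive/ /tmp/standby_stage/):

```shell
SRC=/tmp/wal_archive; STAGE=/tmp/standby_stage; DEST=/tmp/standby_wals
mkdir -p "$SRC" "$STAGE" "$DEST"

printf 'dummy segment' > "$SRC/000000010000000000000001"
gzip -f "$SRC/000000010000000000000001"    # 1. compress on the primary

cp "$SRC"/*.gz "$STAGE"/                   # 2. transfer (rsync in practice)

gunzip -f "$STAGE"/*.gz                    # 3. uncompress on the standby
mv "$STAGE"/* "$DEST"/
ls "$DEST"
```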

Thanks
Venkat

On Sat, Sep 17, 2011 at 6:06 PM, MirrorX mirr...@gmail.com wrote:

 just another update since the system is up and running and one more
 question
 :p

 the secondary server is able to restore the wal archives practically
 immediately after they arrive. i have set a rsync cron job to send the new
 wals every 5 minutes. the procedure to transfer the files and to restore
 them takes about 30 seconds (the number of archives is about 20-30). i ve
 tried to set it to 2 minutes, and then the procedure takes about 20 seconds
 (both transfer and restoration) while i didnt notice any impact on the
 primary server (the procedure is initiated on the secondary server). What is
 your opinion on the interval at which the cron job should run? I've
 read many articles online indicating that rsync should not run every 1
 minute, but in my case isn't it different since it just syncs two folder
 containing only wals and not the whole disks? plus both folders on the
 servers are in different partitions.
 thx in advance for your insight

 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4813659.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] Has Pg 9.1.0 been released today?

2011-09-13 Thread Venkat Balaji
Yes, I would be excited to know if there is any possibility of a
multi-master replication system for Postgres.

We will be soon using 9.1 Streaming replication.

Thanks
Venkat

On Tue, Sep 13, 2011 at 1:31 AM, Aleksey Tsalolikhin 
atsaloli.t...@gmail.com wrote:

 Congratulations on the release of 9.1.0!

 Lots of great features, I for one can't wait to try out unlogged
 tables, that should help a lot in our environment.

 Now that you have streaming replication both async and sync, are you
 working on multi-master replication?  *excited*  Or what's the
 roadmap?

 Thanks again and keep up the great work!

 Aleksey




Re: [GENERAL] warm standby - apply wal archives

2011-09-06 Thread Venkat Balaji
Considering a WAL archive volume of 200 GB per day -

Compressing the archives using gzip (you can invoke it from a shell script
used as the archive_command) would possibly reduce the size to as low as
10-20 GB.

Please let us know the results.

Thanks
Venkat

On Tue, Sep 6, 2011 at 1:03 PM, MirrorX mirr...@gmail.com wrote:

 The network bandwidth between the servers is definitely not an issue. What
 is
 bothering me is the big size of the wal archives, which goes up to 200GB
 per
 day and if the standby server will be able to replay all these files. The
 argument that; since the master can do it and also do various other tasks
 at
 the same time, and since the secondary is identical to the first, so he
 should be able to do that seems valid, so i will give it a try and let you
 know about the results. In the meantime if there are any other
 ideas/suggestions etc please let me know. thx to all

 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4773498.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] warm standby - apply wal archives

2011-09-05 Thread Venkat Balaji
In my experience, I had configured a warm standby for a 2 TB Postgres
cluster (PostgreSQL 8.4).

Note : I do not know your database size and WAL archive generation rate.

Important considerations i made were as follows -

1. WAL archives transfer from production to standy depends on the network
bandwidth (i think you said there is no issue there) and the size of the
WAL archives.
2. The transfer rate can be optimized by compressing the WAL files. Each
16 MB WAL file would shrink to 2-3 MB (only in the case of a warm standby;
with streaming replication the size would only decrease to 6 or 7 MB),
which makes a huge difference to network bandwidth.

Compress the WAL archives on production, then transfer & uncompress the
WALs on the standby.

I did this successfully.

Hope this helps !

Thanks
Venkat

On Tue, Sep 6, 2011 at 2:57 AM, MirrorX mirr...@gmail.com wrote:

 the nodes communicate through 4Gbps ethernet so i dont think there is an
 issue there. probably some kind of misconfiguration of DRBD has occured. i
 will check on that tommorow. thx a lot :)

 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4772126.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] Postgresql-9.0.1 Recovery

2011-08-31 Thread Venkat Balaji
Thanks Craig !

Below is what I did -

1. pg_start_backup()
2. rsync the data dir
3. pg_stop_backup()

I believe the backup is valid because I was able to bring up the cluster
without any issues (though with data loss).

+ve signs -

I am able to bring up the cluster with the online backup, but only with some
loss of data.

-ve signs, and things to watch for carefully while testing backups -

   - pg_clog files were not synced. I suspect they were being written to at
   the time of the backup; I might have synced the data directory while the
   pg_clog files were half-filled.
   - Though the WAL archives are there, Postgres is not trying to recover
   beyond the timestamp at which pg_clog was missing.
   - Even when I replaced the missing pg_clog files (which I did), Postgres
   asked for the corresponding WAL archive files.

What I learnt is that we need to ensure all the pg_clog files are fully
copied as of the backup time. We cannot afford to miss any of them.

Thanks
Venkat
On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer ring...@ringerc.id.au wrote:

 On 30/08/2011 6:59 PM, Venkat Balaji wrote:

 Hello Everyone,

 I have a situation here -

 I am trying to restore the production online backup and recover the same.

  - I had initially rsynced the data directory (excluding pg_log) and then
 tarred and zipped the same


 Did you do that after pg_start_backup() or on a stopped database server?

 If you did it on a running database server without first running
 pg_start_backup(), your backup is invalid.

 Personally I like to take my base backups from an LVM snapshot of the
 datadir just to be extra safe. That isn't necessary, though, and a regular
 rsync or tar or whatever of a datadir after pg_start_backup() is fine.

 Remember to run pg_stop_backup() afterwards.


   - I got an error unable to read filename from pg_clog location
 (file size is around 160K)


 ... from PostgreSQL, when you tried to start it?

 What emitted that error message?


 What I understood is that rsync somehow missed syncing the files in
 pg_clog, so I manually copied the missing pg_clog file
 from production and tried recovery.


 That won't work. You need a consistent snapshot of all the files in the
 data dir. You cannot just mix and match copies taken at different times.

 For efficiency reasons PostgreSQL will recycle used clog files. You can't
 just copy a file over and hope that because it has the same name, it still
 contains the data you want.

 Your backup *failed* at the point where you got an incomplete copy of the
 data directory.


  Do i need to get that particular wal archive which is before online
 backup time ?


 No, you need to get the missing clog files. If you cannot do that, try
 using pg_resetxlog, but be aware that that may lose transactions and can
 potentially cause corruption of tables and indexes.


 By this experience, what I understand is that PostgreSQL stores committed
 and uncommitted transactions in pg_xlog / WAL archive files, and
 information (not the transaction data) about transaction commit status
 is stored in pg_clog. Am I correct?


 That sounds right to me, but I don't know as much about how Pg stores
 things as I should.


  I am in the process of designing a disaster recovery planner for our
 productions systems.


 Congratulations!

 Be extremely glad this didn't happen in a real recovery scenario. This is a
 marvellous example of why you should always test your backups - you actually
 did, and found a problem that would've been a critical issue if the backup
 were actually needed.

 --
 Craig Ringer



Re: [GENERAL] heavy swapping, not sure why

2011-08-30 Thread Venkat Balaji
It is recommended to identify the sessions that genuinely need a high
work_mem and to raise work_mem only at the session level for those.

In this case, every connection being allowed the maximum work_mem is the
potential threat. As Zoltan said, work_mem is very high, and so is
shared_buffers.

Other considerations would be as following -

 - Allocated kernel memory settings (like shmmax, shmall, etc.)
 - How much memory is used by system-level processes (root and other non-PG
users)
 - The database size and the amount of data being accessed across CPUs and
memory
 - Whether unnecessary data is being read into memory (queries hitting
non-vacuumed tables, slow-performing queries, unnecessary full table scans,
etc.)

Regards,
Venkat

On Tue, Aug 30, 2011 at 3:30 PM, Boszormenyi Zoltan z...@cybertec.at wrote:

 Hi,

  On 2011-08-29 22:36, Lonni J Friedman wrote:
  ...  I read that
  (max_connections * work_mem) should never exceed physical RAM, and if
  that's accurate, then I suspect that's the root of my problem on
  systemA (below).

 work_mem is process-local memory so

 (max_connections * work_mem) < (physical RAM - shared_buffers)

 Some queries may allocate multiples of work_mem, too.

 Also, the kernel uses some memory for internal accounting, caching
 and you need to account for the process binary in memory.

However, I'd like confirmation before I start
  tweaking things, as one of these servers is in production, and I can't
  easily tweak settings to experiment (plus this problem takes a few
  weeks before swapping gets bad enough to impact performance).
 
  A few examples:
 
  0) system A: 56GB RAM, running postgresql-8.4.8 with the following
 parameters:
  maintenance_work_mem = 96MB
  effective_cache_size = 40GB
  work_mem = 256MB
  wal_buffers = 16MB
  shared_buffers = 13GB
  max_connections = 300

 RAM (56GB) - shared_buffers (13GB) = 43GB

 which is less than

 work_mem * max_connections = 300 * 0.25GB = 75GB

 The system would start swapping before 43GB/0.25GB = 172 clients.

  1) system B: 120GB RAM, running postgresql-9.0.4 with the following
 parameters:
  maintenance_work_mem = 1GB
  effective_cache_size = 88GB
  work_mem = 576MB
  wal_buffers = 4MB
  shared_buffers = 28GB
  max_connections = 200

 Similarly:

 120GB - 28GB = 92GB

 is less than

 work_mem * max_connections = 200 * 576MB = 112.5GB

 Also, if you run anything else on the machine then the system would start
 swapping much sooner than hitting max_connections number of clients.

 I would never set work_mem that high by default. 8 - 16MB is usually
 enough for the common case and you can set work_mem for special
 queries from the client and then reset it.
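
Zoltan's arithmetic above can be spot-checked in a few lines (figures taken
from the thread; the helper name is mine):

```python
def swap_headroom(ram_gb, shared_buffers_gb, work_mem_mb, max_connections):
    """Return (free RAM in GB, worst-case work_mem demand in GB,
    clients before swapping starts), per the thread's simple model."""
    free_gb = ram_gb - shared_buffers_gb
    worst_case_gb = max_connections * work_mem_mb / 1024
    clients_before_swap = int(free_gb * 1024 // work_mem_mb)
    return free_gb, worst_case_gb, clients_before_swap

print(swap_headroom(56, 13, 256, 300))   # system A -> (43, 75.0, 172)
print(swap_headroom(120, 28, 576, 200))  # system B -> (92, 112.5, 163)
```

Note the model ignores the multiples of work_mem a single complex query can
allocate, so real systems swap even sooner.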

 Best regards,
 Zoltán Böszörményi

 --
 --
 Zoltán Böszörményi
 Cybertec Schönig & Schönig GmbH
 Gröhrmühlgasse 26
 A-2700 Wiener Neustadt, Austria
 Web: http://www.postgresql-support.de
 http://www.postgresql.at/





[GENERAL] Postgresql-9.0.1 Recovery

2011-08-30 Thread Venkat Balaji
Hello Everyone,

I have a situation here -

I am trying to restore the production online backup and recover the same.

 - I had initially rsynced the data directory (excluding pg_log) and then
tarred and zipped the same
 - SCP'd the tar to a different server and untarred and unzipped the same
 - I got an error unable to read filename from pg_clog location (file
size is around 160K)

I have an backup as on Aug 24th and the pg_clog file which pg is unable to
read is as on Aug 5th.

I am aware that the size of the files in pg_clog is supposed to be 256K.

What I understood is that rsync somehow missed syncing the files in
pg_clog, so I manually copied the missing pg_clog file from production
and tried recovery.

To my surprise, PG recovery asked for the corresponding pg_xlog (WAL
archive) file as of Aug 5th.

Is there a way to recover this?
Do I need to get that particular WAL archive from before the online backup
time?

I do have all the other files till Aug24th.

From this experience, what I understand is that PostgreSQL stores committed
and uncommitted transactions in pg_xlog / WAL archive files, and information
about the transaction commit status (not the transaction data itself) is
stored in pg_clog. Am I correct?

I am in the process of designing a disaster recovery planner for our
productions systems.

Version  - PG-9.0.1
OS- RHEL 5

Please advise!

Regards,
Venkat