Re: [ADMIN] [GENERAL] Streaming Replication Server Crash

2012-10-22 Thread Craig Ringer
On 10/23/2012 01:20 PM, Tom Lane wrote:
>
> This isn't the first time I've wondered exactly which signal was meant
> in a postmaster child-crash report.  Seems like it might be worth
> expending some code on a symbolic translation, instead of just printing
> the number.  That'd be easy enough (for common signal names) on Unix,
> but has anyone got a suggestion how we might do something useful on
> Windows?

Here's a typical Windows exception:


2012-10-04 14:29:08 CEST LOG:  server process (PID 1416) was terminated
by exception 0xC005

2012-10-04 14:29:08 CEST HINT:  See C include file "ntstatus.h" for a
description of the hexadecimal value.


These codes can be translated with FormatMessage:

 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms679351(v=vs.85).aspx
http://support.microsoft.com/kb/259693

FormatMessage may not be safe to perform in the context of a munged heap
or some other failure conditions, so you probably don't want to do it
from a crash handler. It is safe for the postmaster to do it based on
the exception code it gets from the dying backend, though.

I'd say the best option is for the postmaster to print the
FormatMessage(
FORMAT_MESSAGE_ALLOCATE_BUFFER|FORMAT_MESSAGE_FROM_SYSTEM|FORMAT_MESSAGE_FROM_HMODULE,
...) output when it sees the exception code from the dying backend.

RtlNtStatusToDosError may also be of interest:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680600(v=vs.85).aspx

... but it's in Winternl.h so it's not guaranteed to exist / be
compatible between versions and can only be accessed via runtime dynamic
linking. Not ideal.
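
For quick reference, the handful of NTSTATUS codes that actually show up in backend crash reports could also be mapped symbolically by hand. A minimal sketch (Python for brevity; the four ntstatus.h constants below are real, the rest is illustrative):

```python
# A few well-known NTSTATUS exception codes from ntstatus.h (a small,
# assumed subset; on Windows, FormatMessage against ntdll.dll would
# produce the full message text at runtime):
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

def describe_exception(code: int) -> str:
    """Format an exception code with a symbolic name when one is known."""
    name = NTSTATUS_NAMES.get(code)
    if name:
        return f"exception 0x{code:08X} ({name})"
    return f"exception 0x{code:08X}"

print(describe_exception(0xC0000005))
```

Unknown codes simply fall back to the bare hex value, which is what the server log shows today.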

--
Craig Ringer


Re: [ADMIN] [GENERAL] Streaming Replication Server Crash

2012-10-22 Thread Tom Lane
Craig Ringer  writes:
> On 10/22/2012 08:52 PM, Tom Lane wrote:
>> But having said that, are we sure 10 is SIGUSR1 on the OP's platform?
>> AFAIK, that signal number is not at all compatible across different
>> flavors of Unix.  (I see SIGUSR1 is 30 on OS X for instance.)

> Gah. I incorrectly thought that POSIX specified signal *numbers*, not
> just names. That does not appear to actually be the case. Thanks.

This isn't the first time I've wondered exactly which signal was meant
in a postmaster child-crash report.  Seems like it might be worth
expending some code on a symbolic translation, instead of just printing
the number.  That'd be easy enough (for common signal names) on Unix,
but has anyone got a suggestion how we might do something useful on
Windows?
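
A quick illustration of the ambiguity (a Python sketch; the name printed is whatever *this* platform defines, which is exactly the point):

```python
import signal

def describe_signal(num: int) -> str:
    """Translate a signal number to its symbolic name on this platform,
    falling back to the bare number for values the platform doesn't define."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"

# On Linux this prints SIGUSR1; on Solaris/SPARC the same number is SIGBUS,
# and on OS X SIGUSR1 is 30 -- the ambiguity discussed above.
print(describe_signal(10))
```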

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [ADMIN] [GENERAL] Streaming Replication Server Crash

2012-10-22 Thread Craig Ringer
On 10/22/2012 08:52 PM, Tom Lane wrote:
> Craig Ringer  writes:
>> On 10/19/2012 04:40 PM, raghu ram wrote:
>>> 2012-10-19 12:26:46 IST [1338]: [18-1] user=,db= LOG:  server process
>>> (PID 15565) was terminated by signal 10
> 
>> That's odd. SIGUSR1 (signal 10) shouldn't terminate PostgreSQL.
> 
>> Was the server intentionally sent SIGUSR1 by an admin? Do you know what
>> triggered the signal?
> 
> SIGUSR1 is used for all sorts of internal cross-process signaling
> purposes.  There's no need to hypothesize any external force sending
> it; if somebody had broken a PG process's signal handling setup for
> SIGUSR1, a crash of this sort could be expected in short order.
> 
> But having said that, are we sure 10 is SIGUSR1 on the OP's platform?
> AFAIK, that signal number is not at all compatible across different
> flavors of Unix.  (I see SIGUSR1 is 30 on OS X for instance.)

Gah. I incorrectly thought that POSIX specified signal *numbers*, not
just names. That does not appear to actually be the case. Thanks.

A bit of searching suggests that on Solaris/SunOS, signal 10 is SIGBUS:

http://www.s-gms.ms.edus.si/cgi-bin/man-cgi?signal+3HEAD
http://docs.oracle.com/cd/E23824_01/html/821-1464/signal-3head.html

... which tends to suggest an entirely different interpretation than
"someone broke a signal handler":

https://blogs.oracle.com/peteh/entry/sigbus_versus_sigsegv_according_to

such as:

- Bad mmap()ed read
- alignment error
- hardware fault

so it's not immensely different from a segfault in that it can be caused
by errors in hardware, OS, or applications.

Raghu, did PostgreSQL dump a core file? If it didn't, you might want to
enable core dumps in future. If it did dump a core, attaching a debugger
to the core file might tell you where it crashed, possibly offering some
more information to diagnose the issue. I'm not familiar enough with
Solaris to offer detailed advice on that, especially as you haven't
mentioned your Solaris version, how you installed Pg, etc. This may be
of some use:


http://stackoverflow.com/questions/6403803/how-to-get-backtrace-function-line-number-on-solaris

--
Craig Ringer




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Scott Marlowe
On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico  wrote:
> After reading the comments last week about SSDs, I did some testing of
> the ones we have at work - each of my test-boxes (three with SSDs, one
> with HDD) subjected to multiple stand-alone plug-pull tests, using
> pgbench to provide load. So far, there've been no instances of
> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
> of errors.

Try starting pgbench, then halfway through the checkpoint timeout
interval issue a CHECKPOINT, and WHILE the checkpoint is still
running, pull the plug.

Then after bringing the server up (assuming pg starts up) see if
pg_dump generates any errors.




Re: [GENERAL] 9.1 to 9.2 requires a dump/reload?

2012-10-22 Thread Alan Hodgson
On Monday, October 22, 2012 05:55:07 PM Nikolas Everett wrote:
> I was just looking at
> http://www.postgresql.org/docs/devel/static/release-9-2.html and it
> mentioned that a dump/reload cycle was required to upgrade from a previous
> release.  I just got done telling some of my coworkers that PG had been
> bitten by this enough times that they were done with it.  Am I wrong?  Is
> this normal?
> 
> I see that pg_upgrade is an option.  Having never used it, how long should I
> expect pg_upgrade to take?  Obviously we'll measure it in our environment,
> but it'd be nice to have a ballpark figure.

pg_upgrade using hard links should only take a minute or two. You'll also
need to shuffle around packages, services, and config files. The slowest
part for any decent-sized database will be the ANALYZE after bringing it
up under 9.2, though. So: however long that takes for your db, plus maybe
10-15 minutes or so, if you've practiced.






Re: [GENERAL] 9.1 to 9.2 requires a dump/reload?

2012-10-22 Thread Lonni J Friedman
pg_upgrade has worked fine for several releases.  I believe that the
only time when pg_upgrade isn't a viable option is for some types of
GIST indices.

On Mon, Oct 22, 2012 at 2:55 PM, Nikolas Everett  wrote:
> I was just looking at
> http://www.postgresql.org/docs/devel/static/release-9-2.html and it
> mentioned that a dump/reload cycle was required to upgrade from a previous
> release.  I just got done telling some of my coworkers that PG had been
> bitten by this enough times that they were done with it.  Am I wrong?  Is
> this normal?
>
> I see that pg_upgrade is an option.  Having never used it, how long should I
> expect pg_upgrade to take?  Obviously we'll measure it in our environment,
> but it'd be nice to have a ballpark figure.
>
> Nik



-- 
~
L. Friedman   netll...@gmail.com
LlamaLand   https://netllama.linux-sxs.org




[GENERAL] 9.1 to 9.2 requires a dump/reload?

2012-10-22 Thread Nikolas Everett
I was just looking at
http://www.postgresql.org/docs/devel/static/release-9-2.html and it
mentioned that a dump/reload cycle was required to upgrade from a previous
release.  I just got done telling some of my coworkers that PG had been
bitten by this enough times that they were done with it.  Am I wrong?  Is
this normal?

I see that pg_upgrade is an option.  Having never used it, how long should I
expect pg_upgrade to take?  Obviously we'll measure it in our environment,
but it'd be nice to have a ballpark figure.

Nik


Re: [GENERAL] Somewhat automated method of cleaning table of corrupt records for pg_dump

2012-10-22 Thread Martijn van Oosterhout
On Mon, Oct 22, 2012 at 11:54:47AM +0200, Heiko Wundram wrote:
> If there's any other possibility of "out of the box" recovery -
> except writing myself a small script to walk all rows - I'd still be
> grateful for a hint.

Something that has worked for me in the past is:

$ SELECT ctid FROM table WHERE length(field) < 0;

This gives you a list of ctids (if it works) which you can delete. You
can also look for very large lengths. This works because length()
doesn't actually unpack the string, it just pulls the length.

It doesn't always work, it depends on the kind of corruption. You also
need to start at the leftmost text field and work forwards, because it
blows up while unpacking the tuples.

Otherwise you're back to doing things like:

$ SELECT sum(length(field || '')) FROM table OFFSET x LIMIT y;

And doing a bisect type algorithm to narrow down where it is. The
sum(length()) is so you throw away the output after checking field can
be extracted properly.  Once you get close you start printing the ctids
and take a stab at the ctid of the broken row and delete it.

If your table is large the OFFSET/LIMIT get slow. Unfortunately Postgres
doesn't understand clauses like 'ctid > (page,tuple)' to start scanning
at a particular spot in the table.

It's not automated, though it might not be hard to do.
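
That bisection is straightforward to script. A sketch (Python; the range_ok callback is a hypothetical stand-in for the sum(length(field || '')) OFFSET/LIMIT probe over a row range succeeding or erroring):

```python
def find_bad_row(range_ok, n_rows):
    """Bisect for the first unreadable row in a table of n_rows rows.

    range_ok(lo, hi) stands in for running the probe query over rows
    [lo, hi) and reporting whether it completed without error.  Assumes
    the full range fails, i.e. at least one corrupt row exists.
    """
    lo, hi = 0, n_rows          # invariant: first bad row lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if range_ok(lo, mid):
            lo = mid            # lower half is clean: look in the upper half
        else:
            hi = mid            # probe already fails in the lower half
    return lo                   # offset of the broken row

# Toy run: pretend row 37 of 100 is corrupt.
print(find_bad_row(lambda lo, hi: not (lo <= 37 < hi), 100))
```

In practice each range_ok(lo, hi) call is one query, so the broken row is found in O(log n) probes instead of a linear walk.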

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Jeff Janes
On Mon, Oct 22, 2012 at 12:31 PM, Chris Angelico  wrote:
> On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes  wrote:
>> What did you do to look for corruption?  That PostgreSQL succeeds at
>> going through crash-recovery and then starting up is not a good
>> indicator that there is no corruption.
>
> I fired up Postgres and looked at the logs for any signs of failure.
>
>> Did you do something like compute the aggregates on pgbench_history
>> and compare those aggregates to the balances in the other 3 tables?
>
> No, didn't do that. My next check will be done over the network
> (similar to diskchecker), with a script that fires off requests, waits
> for them to be confirmed committed, and then records a local copy, and
> will check that local copy once the server's back up again. That'll
> tell me if transactions are being lost.

If you like Perl, the count.pl from this message might be a useful
starting point:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

It was designed to check consistency after postmaster crashes, not OS
crashes, so the checker runs on the same host as postgres does.
Obviously, for a pull-the-plug test you need to run it on a different
host, so all the
DBI->connect()
calls need to be changed to do that.

> I'm kinda feeling my way in the dark here. Will check out the
> aggregates on pgbench_history when I get to work today; thanks for the
> tip!

Here's an example with pgbench_accounts, the other 2 should look analogous.

select aid, abalance, count(*)
from (select aid, abalance from pgbench_accounts
      union all
      select aid, sum(delta) from pgbench_history group by aid) as foo
group by aid, abalance
having abalance != 0 and count(*) != 2;

This should return zero rows.  Any other result indicates corruption.
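
The invariant behind that query can be restated in a few lines of Python (an in-memory sketch over made-up data, not the real pgbench tables): every account's balance must equal the sum of the deltas recorded for it in the history.

```python
def inconsistent_accounts(balances, history):
    """Return account ids whose balance does not match the sum of their
    history deltas.  balances: {aid: abalance}; history: [(aid, delta)]."""
    sums = {}
    for aid, delta in history:
        sums[aid] = sums.get(aid, 0) + delta
    return [aid for aid, bal in balances.items() if bal != sums.get(aid, 0)]

balances = {1: 5, 2: -3, 3: 0}               # made-up data
history = [(1, 2), (1, 3), (2, -3)]          # account 3: no history rows
print(inconsistent_accounts(balances, history))   # [] means consistent
```

A lost-but-reported-committed transaction shows up as a balance with no matching history row, which is exactly what the SQL check catches.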

pgbench truncates pgbench_history, but does not reset the balances to
zero on the other tables.  So if you want to run the test repeatedly,
you have to do pgbench -i between runs, or manually reset the balance
columns.

Cheers,

Jeff




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Chris Angelico
On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes  wrote:
> What did you do to look for corruption?  That PostgreSQL succeeds at
> going through crash-recovery and then starting up is not a good
> indicator that there is no corruption.

I fired up Postgres and looked at the logs for any signs of failure.

> Did you do something like compute the aggregates on pgbench_history
> and compare those aggregates to the balances in the other 3 tables?

No, didn't do that. My next check will be done over the network
(similar to diskchecker), with a script that fires off requests, waits
for them to be confirmed committed, and then records a local copy, and
will check that local copy once the server's back up again. That'll
tell me if transactions are being lost.

I'm kinda feeling my way in the dark here. Will check out the
aggregates on pgbench_history when I get to work today; thanks for the
tip!

ChrisA




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Jeff Janes
On Mon, Oct 22, 2012 at 6:17 AM, Chris Angelico  wrote:
> After reading the comments last week about SSDs, I did some testing of
> the ones we have at work - each of my test-boxes (three with SSDs, one
> with HDD) subjected to multiple stand-alone plug-pull tests, using
> pgbench to provide load. So far, there've been no instances of
> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
> of errors.

What did you do to look for corruption?  That PostgreSQL succeeds at
going through crash-recovery and then starting up is not a good
indicator that there is no corruption.

Did you do something like compute the aggregates on pgbench_history
and compare those aggregates to the balances in the other 3 tables?

Cheers,

Jeff




Re: [GENERAL] Revert TRUNCATE CASCADE?

2012-10-22 Thread Merlin Moncure
On Mon, Oct 22, 2012 at 7:52 AM, Albe Laurenz  wrote:
> Hannes Erven wrote:
>> today I ran into an issue I believed to be a FAQ, but unfortunately it
>> doesn't seem so, as I couldn't find any resources related to this... :-/
>>
>> A misguided click in PGADMIN executed a "TRUNCATE CASCADE" on a rather
>> central table of my schema, which resulted in most important tables
>> being emptied. Postgresql (9.0) was cleanly shut down immediately.
>>
>>
>> Is there any chance to get the data back?
>
> A dirty shutdown (-m immediate) would have been better.
>
> Basically it is work for an expert to try and salvage data like this.
>
> If (auto-)VACUUM has not run yet, maybe pg_resetxlog can do something
> for you.  But keep a copy of the original cluster before you start
> messing around.

nah -- that would only apply to records that were deleted (as in, with
SQL delete).  TRUNCATE has different mechanics -- it roughly
translates to: make new table like the old one, do file level delete
on the old one, and swap the new one in.   That's why it's so much
faster -- only the operation itself has to be logged, not what it's
doing.
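
As a rough filesystem analogy of that create-new-then-swap shape (illustrative Python only, not PostgreSQL's actual implementation):

```python
import os
import tempfile

def truncate_like(path):
    """Create an empty replacement file and swap it in, unlinking the old
    contents -- which is why file-level 'undelete' is the only hope left."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    os.close(fd)              # the new, empty "table file"
    os.replace(tmp, path)     # swap it in; the old data file is unlinked
```

Once the old file is unlinked, its blocks are back in the filesystem's free pool, so recovery depends entirely on nothing having reused them.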

The only recourse is to restore from backup, assuming you are not
interested in pursuing 'undelete' strategies in the file system, which
are unlikely to work anyway.

A filesystem level backup + stream of archived xlogs would have been
ideal -- then you could have brought the system back to life precisely
as it was before the accident.

merlin




[GENERAL] streaming replication and data file consistency

2012-10-22 Thread Matt Savona
Hi all,

I am currently running Postgresql 9.2.1 with streaming replication: one
primary, one standby.  Once an hour I have a job which compares
pg_current_xlog_location on the primary against
pg_last_xlog_replay_location on the standby to ensure the standby is not
lagging too far behind the primary. So far everything is working great.

I noticed, however, that despite the fact that the cluster is consistently
in sync, the md5sums and modified timestamps on many of my data files
differ. For example:

PRIMARY

# stat pgsql/data/base/16385/17600
  File: `pgsql/data/base/16385/17600'
  Size: 3112960 Blocks: 6080   IO Block: 4096   regular file
Device: fd02h/64770dInode: 39167976Links: 1
Access: (0600/-rw---)  Uid: (   26/postgres)   Gid: (   26/postgres)
Access: 2012-10-22 10:05:29.314607927 -0400
Modify: 2012-10-22 09:48:03.770209170 -0400
Change: 2012-10-22 09:48:03.770209170 -0400

# md5sum pgsql/data/base/16385/17600
5fb7909ea14ab7aa9636b31df5679bd4  pgsql/data/base/16385/17600

STANDBY

# stat pgsql/data/base/16385/17600
  File: `pgsql/data/base/16385/17600'
  Size: 3112960 Blocks: 6080   IO Block: 4096   regular file
Device: fd02h/64770dInode: 134229639   Links: 1
Access: (0600/-rw---)  Uid: (   26/postgres)   Gid: (   26/postgres)
Access: 2012-10-22 10:05:25.361235742 -0400
Modify: 2012-10-22 09:50:29.674567827 -0400
Change: 2012-10-22 09:50:29.674567827 -0400

# md5sum pgsql/data/base/16385/17600
9deeb7b446c12fbb5745d4d282113d3c  pgsql/data/base/16385/17600

The reason I am curious about this is because when both systems are healthy
and I wish to swap primaries, I will bring the primary and the standby down
and do a full rsync of the data/ directory from old primary to new primary.
However, because the data files are different, the rsync run takes a very
long time.

My questions are:
  1) While the xlog location between primary and standby remains
consistent, are the data files, internally, structured differently between
primary and standby?
  2) Is this expected, and if so, what causes them to diverge?

Thanks in advance for helping me understand this behavior!

- Matt


Re: [GENERAL] Streaming replication failed to start scenarios

2012-10-22 Thread chinnaobi
Well..

There should be a standby server ready to serve as the primary if the
current primary goes down, right??

But the dead primary is still being configured, and has probably
failed to become a standby server.

In my high availability cluster there are only two servers.

--Reddy.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Streaming-replication-failed-to-start-scenarios-tp5728519p5729310.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] Postgres Login Users Details

2012-10-22 Thread Chris Angelico
On Mon, Oct 22, 2012 at 7:47 PM, Vishalakshi Navaneethakrishnan
 wrote:
> Hi all,
>
> I need to know who is accessing the database from different remote hosts.
>
> Example :
>
> User1@host1 logged / access db  dbuser@dbname in Dbserver
>
> How can I get this information?

As suggested, you can configure logging to record this. For
up-to-the-moment "who's currently logged in" information, look at the
pg_stat_activity view.

ChrisA




Re: [GENERAL] Streaming replication failed to start scenarios

2012-10-22 Thread Albe Laurenz
chinnaobi wrote:
> I have tested using cygwin rsync in windows 2008 R2, just after
> restarting the server.
> 
> for 10 GB it took nearly 5 minutes to sync,
> for 50 GB it took nearly 30 minutes -- too long, though there were no
> big changes.
> 
> My requirement is something less than 5 minutes. I am doing high
> availability. In my setup there is WAL archiving enabled as well.
> 
> Is there any way I can speed up configuring the dead primary server as
> a standby??

Faster network, faster disks :^(

Why is it time critical to build a new standby?

Yours,
Laurenz Albe




Re: [GENERAL] Streaming replication failed to start scenarios

2012-10-22 Thread chinnaobi
Hi Laurenz Albe,

I have tested using cygwin rsync in windows 2008 R2, just after restart the
server.

for 10 GB it took nearly 5 minutes to sync,
for 50 GB it took nearly 30 minutes -- too long, though there were no big
changes.

My requirement is something less than 5 minutes. I am doing high
availability. In my setup there is WAL archiving enabled as well.

Is there any way I can speed up configuring the dead primary server as a
standby??


Regards,
Reddy



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Streaming-replication-failed-to-start-scenarios-tp5728519p5729302.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




[GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Chris Angelico
After reading the comments last week about SSDs, I did some testing of
the ones we have at work - each of my test-boxes (three with SSDs, one
with HDD) subjected to multiple stand-alone plug-pull tests, using
pgbench to provide load. So far, there've been no instances of
PostgreSQL data corruption, but diskchecker.pl reported huge numbers
of errors.

What exactly does this mean? Is Postgres doing something that
diskchecker isn't, and is thus safe? Could data corruption occur but
I've just never pulled the power out at the precise microsecond when
it would cause problems? Or is it that we would lose entire
transactions, but never experience corruption that the postmaster
can't repair?

Interestingly, disabling write-caching with 'hdparm -W 0 /dev/sda' (as
per the livejournal blog[1]) reduced the SSD's error rates without
eliminating failures entirely, while on the HDD, there were no
problems at all with write caching off.

ChrisA




Re: [GENERAL] Revert TRUNCATE CASCADE?

2012-10-22 Thread Albe Laurenz
Hannes Erven wrote:
> today I ran into an issue I believed to be a FAQ, but unfortunately it
> doesn't seem so, as I couldn't find any resources related to this... :-/
> 
> A misguided click in PGADMIN executed a "TRUNCATE CASCADE" on a rather
> central table of my schema, which resulted in most important tables
> being emptied. Postgresql (9.0) was cleanly shut down immediately.
> 
> 
> Is there any chance to get the data back?

A dirty shutdown (-m immediate) would have been better.

Basically it is work for an expert to try and salvage data like this.

If (auto-)VACUUM has not run yet, maybe pg_resetxlog can do something
for you.  But keep a copy of the original cluster before you start
messing around.

> There is a "pg_dumpall" backup from yesterday, and also pg_xlogs from
> well before the dumpall-file until the TRUNCATE command. Unfortunately,
> there is no file system backup from the xlog timeframe and as far as I
> understood the documentation, a DUMP is no valid base for PITR. Time to
> rework backup practices I guess...

I agree.

Yours,
Laurenz Albe




Re: [ADMIN] [GENERAL] Streaming Replication Server Crash

2012-10-22 Thread Tom Lane
Craig Ringer  writes:
> On 10/19/2012 04:40 PM, raghu ram wrote:
>> 2012-10-19 12:26:46 IST [1338]: [18-1] user=,db= LOG:  server process
>> (PID 15565) was terminated by signal 10

> That's odd. SIGUSR1 (signal 10) shouldn't terminate PostgreSQL.

> Was the server intentionally sent SIGUSR1 by an admin? Do you know what
> triggered the signal?

SIGUSR1 is used for all sorts of internal cross-process signaling
purposes.  There's no need to hypothesize any external force sending
it; if somebody had broken a PG process's signal handling setup for
SIGUSR1, a crash of this sort could be expected in short order.

But having said that, are we sure 10 is SIGUSR1 on the OP's platform?
AFAIK, that signal number is not at all compatible across different
flavors of Unix.  (I see SIGUSR1 is 30 on OS X for instance.)

> Are you running any procedural languages other than PL/PgSQL, or any
> custom C extensions? Anything that might have unwittingly cleared the
> signal handler for SIGUSR1?

libperl has a bad habit of thinking it can mess with the process's
signal setup ...

regards, tom lane




[GENERAL] Revert TRUNCATE CASCADE?

2012-10-22 Thread Hannes Erven

Hi all,


today I ran into an issue I believed to be a FAQ, but unfortunately it
doesn't seem so, as I couldn't find any resources related to this... :-/


A misguided click in PGADMIN executed a "TRUNCATE CASCADE" on a rather 
central table of my schema, which resulted in most important tables 
being emptied. Postgresql (9.0) was cleanly shut down immediately.



Is there any chance to get the data back?


There is a "pg_dumpall" backup from yesterday, and also pg_xlogs from 
well before the dumpall-file until the TRUNCATE command. Unfortunately, 
there is no file system backup from the xlog timeframe and as far as I 
understood the documentation, a DUMP is no valid base for PITR. Time to 
rework backup practices I guess...



Thank you very much for any advice,
best regards

-hannes




Re: [GENERAL] Somewhat automated method of cleaning table of corrupt records for pg_dump

2012-10-22 Thread Heiko Wundram

Am 22.10.2012 09:05, schrieb Craig Ringer:

Working strictly with a *copy*, does REINDEXing then CLUSTERing the
tables help? VACCUM FULL on 8.3 won't rebuild indexes, so if index
damage is the culprit a reindex may help. Then, if CLUSTER is able to
rewrite the tables in index order you might be able to recover.


REINDEXing all indexes works, CLUSTERing all tables on the primary key 
(index) after rebuilding them also works, but the dump still displays 
the same error as before (i.e., out of memory while dumping a specific 
table). I guess that the broken record(s) in that table are either not 
moved, or their data portion isn't adapted to the respective index. 
Which means that besides walking all table rows (and deleting those by 
OID that contain error[s]), there's not much that I can do, I guess.


If there's any other possibility of "out of the box" recovery - except 
writing myself a small script to walk all rows - I'd still be grateful 
for a hint.


Thanks for the hint at CLUSTER and REINDEX, anyway, which I've never 
properly used before - this gave me an incentive to dig in the 
documentation.


--
--- Heiko.




Re: [GENERAL] oracle_fdw

2012-10-22 Thread Albe Laurenz
Rob wrote:
> Some more info
> Oracle Server:Oracle 11g R2 (11.2.0.2.0)
> Client: 11.2
> Was installed using Oracle Universal Installer

Ok.

> I don't really want to post the full environment of the postmaster but
> basically I could see no entry in there for ORACLE_HOME or TNS_ADMIN,
> should I?

Yes, you should see entries for those variables there.

If this is a "regular" client (installed with Universal Installer"),
then a missing ORACLE_HOME environment variable will trigger exactly
the error message you observe.

How do you start the PostgreSQL server?  Perhaps .bash_profile
is not read by the startup script's shell.  Try .bashrc or
try to define and export it in the startup script itself.

> LD_LIBRARY_PATH=mypostgreshomedirectory/lib
> 
> are there any others in particular of interest?

Everything that starts with NLS or ORA, for example.

> Here is my fdw, server and foreign table specs. I have 'myinstancename'
> defined in tnsnames.ora, which is in $ORACLE_HOME/NETWORK/ADMIN

[...]

> CREATE FOREIGN TABLE public.wild_lek
>("WL_ID" integer ,
> "WL_ALIAS" character varying(50) ,
> "WL_AHM_FL" character varying(1) ,
> "WL_INACTIVE_FL" character varying(1) ,
> "WL_SATELLITE_FL" character varying(20) ,
> "WL_LESPPSG_FL" character varying(1) )
>SERVER myinstancename
>OPTIONS (table 'MYUSER.MYTABLE');
> ALTER FOREIGN TABLE 'MYUSER.MYTABLE'  OWNER TO postgres;

You mean ALTER FOREIGN TABLE "public"."wild_lek", right?

Are there any other typos in what you sent?

You don't get to that point yet, but there's a mistake in the table
definition.  It should be "OPTIONS (schema 'MYUSER', table 'MYTABLE')".

Yours,
Laurenz Albe




Re: [GENERAL] Postgres Login Users Details

2012-10-22 Thread Raghavendra
On Mon, Oct 22, 2012 at 2:17 PM, Vishalakshi Navaneethakrishnan <
nvishalak...@sirahu.com> wrote:

> Hi all,
>
> I need to know who is accessing the database from different remote hosts.
>
> Example :
>
> User1@host1 logged / access db  dbuser@dbname in Dbserver
>
> How can I get this information?
>
> Thanks in advance..
>
>
Try the log_line_prefix parameter, which has many options for writing
information into the database server logs (pg_log). You can get the IP
address of the user who is accessing the db, as well as the application
name. Try the %r option.

http://www.postgresql.org/docs/9.2/static/runtime-config-logging.html
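
For example, a postgresql.conf fragment along these lines (one possible combination; %r expands to the remote host and port, and log_connections records each successful login):

```
log_line_prefix = '%m [%p] user=%u,db=%d,host=%r '
log_connections = on
```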

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/


[GENERAL] Postgres Login Users Details

2012-10-22 Thread Vishalakshi Navaneethakrishnan
Hi all,

I need to know who is accessing the database from different remote hosts.

Example :

User1@host1 logged / access db  dbuser@dbname in Dbserver

How can I get this information?

Thanks in advance..

-- 
Best Regards,
Vishalakshi.N


Re: [GENERAL] Somewhat automated method of cleaning table of corrupt records for pg_dump

2012-10-22 Thread Craig Ringer
On 10/19/2012 10:31 PM, Heiko Wundram wrote:
> Hey!
> 
> I'm currently in the situation that due to (probably) broken memory in a
> server, I have a corrupted PostgreSQL database. Getting at the data
> that's in the DB is not time-critical (because backups have restored the
> largest part of it), but I'd still like to restore what can be restored
> from the existing database to fill in the remaining data. VACUUM FULL
> runs successfully (i.e., I've fixed the blocks with broken block
> headers, removed rows that have invalid OIDs as recorded by the VACUUM,
> etc.), but dumping the DB from the rescue system (which is PostgreSQL
> 8.3.21) to transfer it to another still fails with "invalid memory alloc
> request size 18446744073709551613", i.e., most probably one of the TEXT
> columns in the respective tables contains invalid sizings.

Working strictly with a *copy*, does REINDEXing then CLUSTERing the
tables help? VACCUM FULL on 8.3 won't rebuild indexes, so if index
damage is the culprit a reindex may help. Then, if CLUSTER is able to
rewrite the tables in index order you might be able to recover.

--
Craig Ringer





Re: [GENERAL] Streaming Replication Server Crash

2012-10-22 Thread Craig Ringer
On 10/19/2012 04:40 PM, raghu ram wrote:
> Hi All,
> 
> We have configured Streaming Replication b/w Primary and Standby server
> and Pgpool-II load balancing module diverting
> SELECT statements to  Standby server. As per our observations, Standby
> server crashed during peak hours on today and error message as follows

> 2012-10-19 12:26:46 IST [1338]: [18-1] user=,db= LOG:  server process
> (PID 15565) was terminated by signal 10
> 
> 2012-10-19 12:26:46 IST [1338]: [19-1] user=,db= LOG:  terminating any
> other active server processes

That's odd. SIGUSR1 (signal 10) shouldn't terminate PostgreSQL.

Was the server intentionally sent SIGUSR1 by an admin? Do you know what
triggered the signal?

Are you running any procedural languages other than PL/PgSQL, or any
custom C extensions? Anything that might have unwittingly cleared the
signal handler for SIGUSR1?

--
Craig Ringer

