On Fri, Jun 17, 2011 at 3:29 AM, Robert Haas <robertmh...@gmail.com> wrote:
> Even if that were not an issue, I'm still more or less of the opinion
> that trying to solve the time synchronization problem is a rathole
> anyway.  To really solve this problem well, you're going to need the
> standby to send a message containing a timestamp, get a reply back
> from the master that contains that timestamp and a master timestamp,
> and then compute based on those two timestamps plus the reply
> timestamp the maximum and minimum possible lag between the two
> machines.  Then you're going to need to guess, based on several cycles
> of this activity, what the actual lag is, and adjust it over time (but
> not too quickly, unless of course a large manual step has occurred) as
> the clocks potentially drift apart from each other.  This is basically
> what ntpd does, except that it can be virtually guaranteed that our
> implementation will suck by comparison.  Time synchronization is
> neither easy nor our core competency, and I think trying to include it
> in this feature is going to result in a net loss of reliability.

Agreed. You've already added the note about time synchronization into
the document. That's enough, I think.
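For reference, the bound computation described in the quoted paragraph can be sketched roughly as follows. This is a minimal illustration, not a proposal for the patch. The names (Sample, offset_bounds, combine) are hypothetical, all timestamps are assumed to be integer milliseconds, and combining several exchanges by intersecting their intervals assumes the clocks did not drift between exchanges. Handling drift and steps is exactly the part ntpd does and this sketch does not.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One standby -> master -> standby exchange (hypothetical, ms units)."""
    standby_send_ms: int  # standby clock when the request was sent
    master_stamp_ms: int  # master clock when it built its reply
    standby_recv_ms: int  # standby clock when the reply arrived


def offset_bounds(s):
    """Return (min, max) of (master clock - standby clock) for one exchange.

    The master stamped its reply at some instant between standby_send_ms and
    standby_recv_ms as measured on the standby's clock, so the offset lies in
    [master_stamp - standby_recv, master_stamp - standby_send].
    """
    if s.standby_recv_ms < s.standby_send_ms:
        raise ValueError("reply received before request was sent")
    return (s.master_stamp_ms - s.standby_recv_ms,
            s.master_stamp_ms - s.standby_send_ms)


def combine(samples):
    """Intersect per-exchange bounds; valid only if the offset did not drift.

    Returns (lower bound, upper bound, midpoint estimate).
    """
    if not samples:
        raise ValueError("no samples")
    lo, hi = float("-inf"), float("inf")
    for s in samples:
        s_lo, s_hi = offset_bounds(s)
        lo, hi = max(lo, s_lo), min(hi, s_hi)
    if lo > hi:
        # Empty intersection: a clock drifted or was stepped between exchanges.
        raise ValueError("inconsistent samples")
    return lo, hi, (lo + hi) / 2
```

Each extra exchange can only narrow the interval, and an empty intersection is the signal that the no-drift assumption broke, which is where a real implementation would need ntpd-style filtering and slewing.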

>>> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),
>>
>> Should we s/"a temporal"/"an integer"/?
>
> It seems strange to ask for an integer when what we want is an amount
> of time in seconds or minutes...

OK.
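To illustrate why "temporal value" fits better than "integer": the setting is an amount of time that a user may write with or without a unit. A minimal sketch of such parsing follows, assuming a unit table modeled on the time units PostgreSQL's GUC system accepts (ms, s, min, h, d) and assuming a bare number means seconds. The helper parse_delay is hypothetical and is not the patch's code; it returns milliseconds to avoid floating-point rounding.

```python
import re

# Hypothetical unit table, modeled on PostgreSQL's GUC time units.
_UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_delay(value, default_unit="s"):
    """Parse e.g. '5min', '30' or '1500ms' into integer milliseconds."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if m is None:
        raise ValueError('parameter "recovery_time_delay" requires a temporal value')
    number, unit = m.groups()
    unit = unit or default_unit  # a bare number is taken in the default unit
    if unit not in _UNIT_MS:
        raise ValueError(f'invalid time unit "{unit}" for "recovery_time_delay"')
    return int(number) * _UNIT_MS[unit]
```

With this sketch, parse_delay("5min") and parse_delay("300") describe the same delay, which an error message asking for "an integer" would not convey.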

>> http://forge.mysql.com/worklog/task.php?id=344
>> According to the above page, one purpose of time-delayed replication is to
>> protect against user mistakes on the master. But when a user notices a wrong
>> operation on the master, what should he do next? The WAL records of the wrong
>> operation might have already arrived at the standby, so neither "promote" nor
>> "restart" cancels that operation. Instead, he should probably shut down
>> the standby, look up the timestamp of the XID of the operation he'd like
>> to cancel, set recovery_target_time, and restart the standby.
>> Should a procedure like this be documented? Or should we
>> implement a new "promote" mode that finishes recovery as soon as
>> "promote" is requested (i.e., without replaying all the available WAL records)?
>
> I like the idea of a new promote mode;

Are you going to implement that mode in this CF, or the next one?

> and documenting the other
> approach you mention doesn't sound bad either.  Either one sounds like
> a job for a separate patch, though.
>
> The other option is to pause recovery and run pg_dump...

Yes, please.
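For the documented-procedure approach quoted earlier (shut down the standby, set recovery_target_time, restart), the recovery.conf lines involved might look like the output of the hypothetical helper below. It is only a sketch. It assumes the commit timestamp of the mistaken transaction is already known exactly, and it sets recovery_target_inclusive to false so that replay stops just before a transaction committing at exactly that time. How this interacts with standby_mode is not covered here.

```python
from datetime import datetime, timezone


def recovery_conf_before(commit_time):
    """Hypothetical helper: recovery.conf lines that stop replay just before
    the commit of the mistaken transaction."""
    if commit_time.tzinfo is None:
        # A naive timestamp would be interpreted in the server's time zone,
        # which is easy to get wrong; require an explicit offset.
        raise ValueError("commit_time must be timezone-aware")
    stamp = commit_time.isoformat(sep=" ")
    return "\n".join([
        f"recovery_target_time = '{stamp}'",
        # false: stop *before* a transaction committing exactly at that time.
        "recovery_target_inclusive = 'false'",
    ]) + "\n"
```

If the exact commit time is not known, a target slightly earlier than the mistake is the safer choice, at the cost of also discarding a few legitimate transactions.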

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
