On 10/18/2012 07:33 PM, Josh Berkus wrote:
Simon,


It's hard to work out how to reply to this because it's just so off
base. I don't agree with the restrictions you think you see at all,
saying it politely rather than giving a one word answer.
You have inside knowledge of Hannu's design.
Actually, Simon currently has no more knowledge of this specific
design than you do - I posted this on this list as soon as I had figured
it out as a possible solution to the specific problem of supporting
full pgQ/Londiste functionality in WAL-based logical replication
with minimal overhead.

(Well, actually I let it settle for a few weeks, but I did not discuss
this off-list before.)

Simon may have a better grasp of it thanks to having done work
on the BDR/Logical Replication design and thus having a better or
at least more recent understanding of the issues involved in Logical
Replication.

When mapping londiste/Slony message capture to Logical WAL,
the WAL already _is_ the event queue for replication.
NOT LOGGED tables make it usable for non-replication things as
well, using the same mechanisms. (The equivalent in a trigger-based
system would be a log trigger which captures the insert event and then
cancels the insert.)

I am merely going from his
description *on this list*, because that's all I have to go on.

He requested comments, so here I am, commenting.  I'm *hoping* that it's
merely the description which is poor and not the conception of the
feature.  *As Hannu described the feature* it sounds useless and
obscure, and miles away from powering any kind of general queueing
mechanism.
If we describe a queue as something you put stuff in at one end and
get out in the same or some other specific order at the other end, then
WAL _is_ a queue when you use it for replication. (If you just write to it,
it is a "Log"; if you write and read, it is a "Queue".)

That is, the WAL already is a persistent and ordered (that is how WAL works)
stream of messages ("WAL records") that are generated on the "master"
and replayed on one or more consumers (called "slaves" in the case of simple
replication).

All it takes to make this scenario work is keeping track of LSN or simply
log position on the slave side.
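
To make the point about position tracking concrete, here is a minimal
sketch in Python. It is a toy model, not PostgreSQL code: the log is a
plain list of (lsn, payload) pairs, and the Consumer class and its poll()
method are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Consumer:
    """Hypothetical consumer that remembers the last log position it applied."""
    last_lsn: int = 0

    def poll(self, log: List[Tuple[int, str]]) -> List[str]:
        # Pick up only the records written after the remembered position,
        # preserving the order in which they appear in the log.
        fresh = [(lsn, payload) for lsn, payload in log if lsn > self.last_lsn]
        if fresh:
            self.last_lsn = fresh[-1][0]
        return [payload for _, payload in fresh]


# Toy log standing in for the WAL: ordered (lsn, payload) pairs.
log = [(1, "a"), (2, "b"), (3, "c")]
consumer = Consumer()
first = consumer.poll(log)    # everything written so far
log.append((4, "d"))
second = consumer.poll(log)   # only the record added since the last poll
```

The only state the consumer needs is last_lsn; the producer side keeps
no per-consumer bookkeeping in this model.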

What you seem to want is support for cooperative consumers,
that is, multiple consumers on the same queue working together and
sharing the work of processing the incoming events.

This can be easily achieved using a single ordered event stream and
extra bookkeeping structures on the consumer side (look at cooperative
consumer samples in skytools).
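
As a rough illustration of that consumer-side bookkeeping, the Python
sketch below hands each event of a single ordered stream to exactly one
worker. It is an assumption-laden toy and not how skytools implements
cooperative consumers; CooperativeGroup, claim() and the worker names are
invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (lsn, payload)


class CooperativeGroup:
    """Hypothetical shared bookkeeping for workers reading one stream."""

    def __init__(self) -> None:
        self.next_index = 0                # position in the shared stream
        self.claimed: Dict[int, str] = {}  # lsn -> worker that took the event

    def claim(self, stream: List[Event], worker: str) -> Optional[Event]:
        # Hand out the next unclaimed event, or None if the stream is drained.
        if self.next_index >= len(stream):
            return None
        event = stream[self.next_index]
        self.next_index += 1
        self.claimed[event[0]] = worker
        return event


stream = [(10, "e1"), (11, "e2"), (12, "e3")]
group = CooperativeGroup()
got_a = group.claim(stream, "worker-a")
got_b = group.claim(stream, "worker-b")
got_c = group.claim(stream, "worker-a")
drained = group.claim(stream, "worker-b")
```

The stream itself stays a plain ordered sequence; all of the sharing logic
lives in the structure kept on the receiving end. A real implementation
would also need locking and retry handling, which this sketch leaves out.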

What I suggested was an optimisation for the case where you know that you
will never need the data on the master side and are only interested in it
on the slave side.

By writing rows/events/messages only to the log (or stream or queue), you
avoid the need to later clean it up on the master by either DELETE or
TRUNCATE or rotating tables.

For both physical and logical streaming the WAL _is_ the queue of events
that were recorded on the master and need to be replayed on the slave.

Thanks to introducing logical replication, it now makes sense to have
actions recorded _only_ in this queue, and this is what the whole RFC was about.

I recommend that you introduce yourself a bit to skytools/pgQ to get a
better feel for the things I am talking about. Londiste is just one application
built on pgQ, a general event logging, transport and transform/replay (that is
what I'd call queueing :) ) system.

pgQ does have its roots in Slony (and earlier) replication systems, but it
is by no means _only_ a replication system.

The LOG ONLY tables are _not_ needed for pure replication (like Slony) but
they make replication + queueing type solutions like skytools/pgQ much more
efficient, as they do away with the need to maintain the queued data on the
master side where it will never be needed (just to repeat this once more).

Or anything we discussed at the clustering meetings.

And, again, if you didn't want comments, you shouldn't have posted an RFC.
I did want comments and as far as I know I do not see you as hostile :)

I do understand that what you mean by QUEUE (and especially as a
MESSAGE QUEUE) is different from what I described.
You seem to want specifically an implementation of cooperative
consumers for a generic queue.

The answer is yes, it is possible to build this on WAL, or on the table-based
event logs/queues of londiste / slony. It just takes a little extra
management on the receiving side to do the record locking and
distribution between cooperating consumers.
All we're discussing is moving a successful piece of software into
core, which has been discussed for years at the international
technical meetings we've both been present at. I think an open
viewpoint on the feasibility of that would be reasonable, especially
when it comes from one of the original designers.
When I ask you for technical clarification or bring up potential
problems with a 2Q feature, you consistently treat it as a personal
attack and are emotionally defensive instead of answering my technical
questions.  This, in turn, frustrates the heck out of me (and others)
because we can't get the technical questions answered.  I don't want you
to justify yourself, I want a clear technical spec.
Currently the "clear tech spec" is just this:

* works as a table on INSERTs up to the point of writing a logical WAL record
  describing the insert, but no data is inserted locally.

with all the things that follow from the local table having no data:
  - unique constraints don't make sense
  - indexes make no sense
  - updates and deletes hit no data
  - etc.
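
The semantics listed above can be modelled in a few lines of Python. This
is only a toy model of the behaviour as described here, under the
assumption that the WAL can be represented as a list of records;
LogOnlyTable and its methods are hypothetical names, not proposed syntax
or an actual implementation.

```python
from typing import Any, Dict, List, Tuple

WalRecord = Tuple[str, str, Dict[str, Any]]  # (operation, table name, row)


class LogOnlyTable:
    """Toy model: inserts produce a log record but store nothing locally."""

    def __init__(self, name: str, wal: List[WalRecord]) -> None:
        self.name = name
        self.wal = wal

    def insert(self, row: Dict[str, Any]) -> None:
        # The insert is described in the log; no local heap data is kept.
        self.wal.append(("INSERT", self.name, dict(row)))

    def select(self) -> List[Dict[str, Any]]:
        # There is never any local data to return.
        return []

    def delete(self) -> int:
        # Updates and deletes hit no data, so zero rows are affected.
        return 0


wal: List[WalRecord] = []
events = LogOnlyTable("events", wal)
events.insert({"id": 1, "msg": "hello"})
# Same id again: with no local rows there is nothing to check uniqueness against.
events.insert({"id": 1, "msg": "again"})
```

After the two inserts the log holds two records while the table itself
still reads as empty, which is why unique constraints and indexes have
nothing to work on.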

I'm asking these questions because I'm excited about ReplicationII, and
I want it to be the best feature it can possibly be.

Or, as we tell many new contributors, "We wouldn't bring up potential
problems and ask lots of questions if we weren't interested in the feature."

Now, on to the technical questions:

QUEUE emphasizes the aspect of a log-only table that it accepts
"records" in a certain order, persists them and then guarantees
that they can be read out in exactly the same order - all this being
guaranteed by existing WAL mechanisms.

It is not meant to be a full implementation of an application level queuing
system, though, but just the capture, persisting and distribution parts.

Using this as an "application level queue" needs a set of interface
functions to extract the events and also to keep track of the processed
events. As there is no general consensus on what these should be (e.g. whether
processing the same event twice is allowed), this part is left for specific
queue consumer implementations.
While implementations vary, I think you'll find that the set of
operations required for a full-featured application queue are remarkably
similar across projects.  Personally, I've worked with celery, Redis,
AMQ, and RabbitMQ, as well as a custom solution on top of pgQ.  The
design, as you've described it, makes several of these requirements
unreasonably convoluted to implement.
As Simon explained, the initial RFC was just about not keeping the
data in the local table if we know it will never be accessed (at least not
for anything except vacuum and delete/truncate).

This is something that made no sense for physical replication.

It sounds to me like the needs of internal queueing and application
queueing may be hopelessly divergent.  That was always possible, and
maybe the answer is to forget about application queueing and focus on
making this mechanism work for replication and for matviews, the two
features we *know* we want it for.  Which don't need the application
queueing features I described AFAIK.

The two halves of the queue are the TAIL/entry point and the HEAD/exit
point. As you point out these could be on the different servers,
wherever the logical changes flow to, but could also be on the same
server. When the head and tail are on the same server, the MESSAGE
QUEUE syntax seems appropriate, but I agree that calling it that when
it's just a head or just a tail seems slightly misleading.
Yeah, that's why I was asking for clarification; the way Hannu described
it, it sounded like it *couldn't* be read on the insert node, but only
on a replica.
Well, the reading is done the same way any WAL reading is done -
you subscribe to the stream and from that point on get the records
in LSN order.

It is very hard for me to tell for sure whether the walsender->walreceiver combo
"reads the events" on the master or the slave side.

We do, I think, want a full queue implementation in core. We also want
to allow other queue implementations to interface with Postgres, so we
probably want to allow "first half" only as well. Meaning we want both
head and tail separately in core code. The question is whether we
require both head and tail in core before we allow commit, to which I
would say I think adding the tail first is OK, and adding the head
later when we know exactly the design.
I'm just pointing out that some of the requirements of the design for
the replication queue may conflict with a design for a full-featured
application queue.

I don't quite follow you on what you mean by "head" vs. "tail".  Explain?
HEAD is the queue producer, where the events go in (any insert on master)

TAIL (to avoid another word) is where they come out
(walsender -> walreceiver moving the events to slave)

Think of an analogy with a snake feeding on berries used by
an ant colony to get the nutrients in the berries to its nest :)

And there is no processing inside the snake - the work of
distributing said nutrients once they have arrived at the nest has
to be organised by the cooperative colony of ants on that end; the
snake just guarantees that the berries arrive in the same order they
went in.

I guess this organisation of work after the events are delivered is
what you were after when asking about "an application level queue".

Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I guess WRITE ONLY tables would get us more publicity but would not be
entirely correct, as the data is readable from the log.


Hannu





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
