Re: [HACKERS] PG-MQ?

2007-06-20 Thread Chris Browne
[EMAIL PROTECTED] (Steve Atkins) writes:
 Is there any existing work out there on this?  Or should I maybe be
 looking at prototyping something?

 The Skype tools have a decent-looking publish/subscribe thing, PgQ,
 which they then layer their replication on top of. It's multi-consumer
 and multi-producer, with delivered-at-least-once semantics.

 Looks nice.

I had not really noticed that - I need to take a look at their
connection pooler too, so I guess that puts more skype items on my
ToDo list ;-).  Thanks for pointing it out...
-- 
let name=cbbrowne and tld=linuxdatabases.info in String.concat @ 
[name;tld];;
http://cbbrowne.com/info/advocacy.html
Signs of a Klingon Programmer #1: Our users will  know fear and cower
before our software. Ship it! Ship it and let  them flee like the dogs
they are!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 04:45, Chris Browne wrote:
 I'm seeing some applications where it appears that there would be
 value in introducing asynchronous messaging, ala message queueing.
 http://en.wikipedia.org/wiki/Message_queue

 The granddaddy of message queuing systems is IBM's MQ-Series, and I
 don't see particular value in replicating its functionality.

I'm quite interested in this.  Maybe I'm thinking of something too
complex, but I do think there are some "oh, it'll need to do that too"
pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in
two-phase commits (and optionally a transaction manager as well).  That
means that you get atomic processing steps: application takes message off
a queue, processes it, commits its changes to the database, replies to
message.  The queue manager then does a second-phase commit for all of
those steps, and that's when the reply really goes out.  If the
application fails, none of this will have happened so you get ACID over
the complete cycle.  That's something we should have free software for.

Perhaps the time is right for something new.  A lot of the complexity
inside MQ comes from data representation issues like encodings and
fixed-length strings, as I recall, and things have changed since MQ was
designed.  I agree it could be useful (and probably not hard either) to
have a transactional messaging system inside the database.  It saves you
from having to do two-phase commits.

But it does tie everything to postgres to some extent, and you lose the
interesting features (atomicity and assured, single delivery) as soon as
anything in the chain does anything persistent that does not participate
in the postgres transaction.  Perhaps what we really need is more mature
components, with a unified control layer on top.  That's how a lot of
successful free software grows.  See below.


 On the other side, the big names these days are:

 a) The Java Messaging Service, which seems to implement *way* more
options than I'm even vaguely interested in having (notably, lots
that involve data stores or lack thereof that I do not care to use);

Far as I know, JMS is an API, not a product.  You'd still slot some
messaging middleware underneath, such as MQ.  That is why MQSeries was
renamed: it fits into the WebSphere suite as the implementing engine
underneath the JMS API.  From what I understand MQ is one of the
best-of-breed products that JMS was designed around.  (Sun's term, bit
hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet
another marshaling standard.  There are plenty of standards to choose
from already, each married to one particular communications mechanism:
RPC, EDI, CORBA, D-Bus, XMLRPC, what have you.  Even postgres has its own.
I'd say the most successful mechanism is TCP itself, because it isolates
itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often
a drag to do it in the application, but the way things stand now *any*
choice limits the usefulness of what you're building.  That's something
I'd like to see change.

Personally I'd love to see marshaling or low-level data representation
isolated into a mature component that speaks multiple programming
languages on the one hand and multiple data representation formats on the
other.  Something the implementers of some of these messaging standards
would want to use to compose their messages, isolating their format
definitions into plugins.  Something that would make application writers
stop composing messages in finicky ad-hoc code that fails with unexpected
locales or has trouble with different line breaks.

If we had a component like that, combining it with existing transactional
variants of TCP and [S]HTTP might even be enough to build a usable
messaging system.  I haven't looked at them enough to know.  Of course
we'd need implementations of those protocols; see
http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/
for example.

Another box of important tools, and I have no idea where we stand with
this one, is transaction management.  We have 2-phase commit in postgres
now.  But do we have interoperability with existing transaction managers?
Is there a decent free, portable, everything-agnostic transaction manager?
With those, the sphere of reliability of a database-driven messaging
package could extend much further.

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.


 There tend to be varying semantics out there:

 - Some queues may represent subscriptions where a whole bunch of
   listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n
reliable publish-subscribe without persistence, and 1:1 request-reply with
persistent storage.  D-Bus does them both; IIRC MQ does 1:1 and has
add-ons on top for publish-subscribe.

I could imagine 

Re: [HACKERS] PG-MQ?

2007-06-20 Thread Markus Schiltknecht

Hi Chris,

Chris Browne wrote:

I'm seeing some applications where it appears that there would be
value in introducing asynchronous messaging, ala message queueing.
http://en.wikipedia.org/wiki/Message_queue


ISTM that 'message queue' is a way too general term. There are hundreds 
of different queues at different levels on a standard server. So I'm 
somewhat unsure about what problem you want to solve.



c) There are lesser names, like isectd http://isectd.sf.net and the
(infamous?) Spread Toolkit which both implement memory-based messaging
systems.


If a GCS is about what you're looking for, then you also might want to 
consider these: ensemble, appia or jGroups. There's a Java layer called 
jGCS, which supports even more, similar systems.


Another commonly used term is 'reliable multicast', which guarantees 
that messages are delivered to a group of recipients. These algorithms 
often are the basis for a GCS.



My bias would be to have something that can basically run as a thin
set of stored procedures atop PostgreSQL :-).  It would be trivial to
extend that to support SOAP/XML-RPC, if desired.


Hm.. in Postgres-R I currently have (partial) support for ensemble and 
spread. Exporting that interface via stored procedures could be done, 
but you would probably need a manager process, as you certainly want 
your connections to persist across transactions (or not?).


Together with that process, we already have half of what Postgres-R is: 
an additional process which connects to the GCS. So I'm questioning 
whether there's value in exporting the interface. Can you think of use 
cases other than database replication? And why would you want to do that 
via the database rather than directly with the GCS?



It would be nice to achieve 'higher availability' by having queues
where you might replicate the contents (probably using the MQ system
itself ;-)) to other servers.


Uhm.. sorry, but I fail to see the big news here. Which replication 
solution does *not* work that way?


Regards

Markus




Re: [HACKERS] PG-MQ?

2007-06-20 Thread Marko Kreen

On 6/20/07, Jeroen T. Vermeulen [EMAIL PROTECTED] wrote:

On Wed, June 20, 2007 04:45, Chris Browne wrote:
 - Sometimes you have the semantics where:
   - messages need to be delivered at least once
   - messages need to be delivered no more than once
   - messages need to be delivered exactly once

IMHO, if you're not doing exactly once, or something very close to it,
you might as well stay with ad-hoc code.  You can ensure single delivery
by having the sender re-send when in doubt, and keeping track of
duplications in the recipient.
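
A minimal sketch of the re-send-and-deduplicate idea described above, as an in-memory simulation. The class and function names, and the acks_lost parameter that stands in for an unreliable link, are invented for illustration and do not come from any of the products discussed.

```python
class Recipient:
    """Remembers which message ids it has already handled."""

    def __init__(self):
        self.seen = set()
        self.processed = []
        self.deliveries = 0

    def receive(self, msg_id, payload):
        self.deliveries += 1
        if msg_id not in self.seen:      # first time: do the real work
            self.seen.add(msg_id)
            self.processed.append(payload)
        return True                      # duplicates are acked, not redone


def send_reliably(recipient, msg_id, payload, acks_lost):
    """Re-send until an ack gets back; the first `acks_lost` acks vanish."""
    attempts = 0
    while True:
        attempts += 1
        recipient.receive(msg_id, payload)
        if attempts > acks_lost:         # this ack reached the sender
            return attempts


recipient = Recipient()
for msg_id, (payload, acks_lost) in enumerate(
        [("debit", 0), ("credit", 2), ("audit", 1)]):
    send_reliably(recipient, msg_id, payload, acks_lost)

print(recipient.deliveries, recipient.processed)
```

Six deliveries arrive but each payload is processed once. A real recipient would need to keep the set of seen ids somewhere persistent, and prune it eventually.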


In the case of PGQ, the at-least-once semantics comes from the batch-based
processing it does: in case of failure, the full batch is delivered again,
so if the consumer had already managed to process some of the items, it gets
them twice.

As it is responsible only for delivering events from the database,
it has no way of guaranteeing exactly-once behaviour; that needs
to be built on top of PGQ.

The simplest case is when the events are processed in the same database
where the queue resides.  Then you can just fetch, process, and close the
batch in one transaction, and you immediately get exactly-once behaviour.

To achieve exactly-once behaviour across different databases, look
at the pgq_ext module for a sample.  Basically it just requires
storing the batch_id/event_id on the remote db and committing there first.
Later it can be checked whether the batch/event has already been processed.
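
A rough sketch of the batch_id tracking described above, using an in-memory SQLite database to stand in for the remote db. The table names (completed_batch, account) and the apply_batch helper are hypothetical and are not the pgq_ext schema or API; the only point is that the batch id is recorded in the same transaction as the changes it caused.

```python
import sqlite3

remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE completed_batch (batch_id INTEGER PRIMARY KEY)")
remote.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
remote.execute("INSERT INTO account VALUES ('alice', 0)")
remote.commit()


def apply_batch(conn, batch_id, events):
    """Apply a batch on the remote db; return False if it was already applied."""
    with conn:  # one transaction: commits on success, rolls back on exception
        done = conn.execute(
            "SELECT 1 FROM completed_batch WHERE batch_id = ?", (batch_id,)
        ).fetchone()
        if done:
            return False
        for name, amount in events:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, name),
            )
        # Recorded together with the updates, so both commit or neither does.
        conn.execute("INSERT INTO completed_batch VALUES (?)", (batch_id,))
    return True


batch = [("alice", 10), ("alice", 5)]
first = apply_batch(remote, 1, batch)
# Pretend the consumer died before closing the batch on the queue side,
# so the queue hands out the same batch again.
second = apply_batch(remote, 1, batch)
balance = remote.execute("SELECT balance FROM account").fetchone()[0]
print(first, second, balance)
```

The redelivered batch is recognised and skipped, so the balance ends at 15 rather than 30.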

It's tricky only if you want to achieve full transactionality for
event processing.  As I understand, JMS does not have a concept
of transactions, probably also other solutions mentioned before,
so to use PgQ as backend for them should be much simpler...

To Chris: you should like PgQ, it's just stored procs in database,
plus it's basically just generalized Slony-I, with some optimizations,
so should be familiar territory ;)

--
marko



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Heikki Linnakangas

Marko Kreen wrote:

As I understand, JMS does not have a concept
of transactions, probably also other solutions mentioned before,
so to use PgQ as backend for them should be much simpler...


JMS certainly does have the concept of transactions. Both distributed 
ones through XA and two-phase commit, and local involving just one JMS 
provider. I don't know about others, but would be surprised if they didn't.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Marko Kreen

On 6/20/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

Marko Kreen wrote:
 As I understand, JMS does not have a concept
 of transactions, probably also other solutions mentioned before,
 so to use PgQ as backend for them should be much simpler...

JMS certainly does have the concept of transactions. Both distributed
ones through XA and two-phase commit, and local involving just one JMS
provider. I don't know about others, but would be surprised if they didn't.


Ah, sorry, my mistake then.  Shouldn't trust hearsay :)

--
marko



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:
 Marko Kreen wrote:
 As I understand, JMS does not have a concept
 of transactions, probably also other solutions mentioned before,
 so to use PgQ as backend for them should be much simpler...

 JMS certainly does have the concept of transactions. Both distributed
 ones through XA and two-phase commit, and local involving just one JMS
 provider. I don't know about others, but would be surprised if they
 didn't.

Wait...  I thought XA did two-phase commit, and then there was XA+ for
*distributed* two-phase commit, which is much harder?


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Heikki Linnakangas

Jeroen T. Vermeulen wrote:

On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:

Marko Kreen wrote:

As I understand, JMS does not have a concept
of transactions, probably also other solutions mentioned before,
so to use PgQ as backend for them should be much simpler...

JMS certainly does have the concept of transactions. Both distributed
ones through XA and two-phase commit, and local involving just one JMS
provider. I don't know about others, but would be surprised if they
didn't.


Wait...  I thought XA did two-phase commit, and then there was XA+ for
*distributed* two-phase commit, which is much harder?


Well, I meant distributed as in one transaction manager, multiple 
resource managers, all participating in a single atomic transaction. I 
don't know what XA+ adds on top of that.


To be precise, being a Java-thing, JMS actually supports two-phase 
commit through JTA (Java Transaction API), not XA. It's the same design 
and interface, just defined as Java interfaces instead of at native 
library level.
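
To make "one transaction manager, multiple resource managers" concrete, here is a toy sketch of the two-phase commit protocol. It is not the XA or JTA interface: the class and method names are made up for illustration, and it leaves out the durable logging and crash recovery that a real transaction manager needs.

```python
class ResourceManager:
    """A toy participant, e.g. a message queue or a database."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: either promise to commit when asked, or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Transaction manager: commit every participant or none of them."""
    for rm in participants:
        if not rm.prepare():
            # One "no" vote is enough to roll everybody back.
            for other in participants:
                other.rollback()
            return False
    # Phase 2: everyone voted yes, so the outcome is commit.
    for rm in participants:
        rm.commit()
    return True


queue, db = ResourceManager("queue"), ResourceManager("database")
ok = two_phase_commit([queue, db])
print(ok, queue.state, db.state)

queue2 = ResourceManager("queue")
db2 = ResourceManager("database", can_commit=False)
ok2 = two_phase_commit([queue2, db2])
print(ok2, queue2.state, db2.state)
```

In the second run the queue had already prepared, but it still ends up aborted because the database voted no.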


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Rob Butler
Do you guys need something PG specific or built into PG?

ActiveMQ is very nice, speaks multiple languages, protocols and supports a ton 
of features.  Could you simply use that?

http://activemq.apache.org/

Rob



   



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 19:42, Rob Butler wrote:
 Do you guys need something PG specific or built into PG?

 ActiveMQ is very nice, speaks multiple languages, protocols and supports a
 ton of features.  Could you simply use that?

 http://activemq.apache.org/

Looks very nice indeed!


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Marko Kreen

On 6/20/07, Rob Butler [EMAIL PROTECTED] wrote:

Do you guys need something PG specific or built into PG?


Yes, we need it usable from inside the DB, hence PgQ.

That means the events are also transactional with other
things happening in the DB.


ActiveMQ is very nice, speaks multiple languages, protocols and supports a ton 
of features.  Could you simply use that?


I guess that if you need a standalone message broker,
ActiveMQ may be a good choice.  At least, any solution that
avoids the database when passing messages should outperform
solutions that pipe stuff through a (general-purpose) database.

OTOH, if you _do_ need to transport the events via the database,
it should be very hard to outperform PgQ. :)  It uses the
user-level xid/snapshot trick introduced by rserv/erserver/slony,
which is not possible with databases other than PostgreSQL.
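
For readers unfamiliar with the xid/snapshot trick, the following is a rough simulation of the general idea as I understand it, not PgQ's or Slony's actual code. Each event carries the id of the transaction that inserted it, a snapshot is recorded at every tick, and a batch is the set of events whose transaction is visible in the current tick's snapshot but was not visible in the previous one. The Snapshot tuple and function names are made up; the visibility rule mirrors how PostgreSQL snapshots (xmin, xmax, list of in-progress xids) are interpreted.

```python
from collections import namedtuple

# Every txid below xmin is finished, every txid at or above xmax had not
# started yet, and `running` holds the txids in between still in progress.
Snapshot = namedtuple("Snapshot", "xmin xmax running")


def visible(txid, snap):
    """True if transaction `txid` was no longer in progress at `snap`."""
    if txid < snap.xmin:
        return True
    if txid >= snap.xmax:
        return False
    return txid not in snap.running


def batch_between(events, prev_tick, cur_tick):
    """Events that became visible after prev_tick and by cur_tick."""
    return [ev for txid, ev in events
            if visible(txid, cur_tick) and not visible(txid, prev_tick)]


# (txid, event) pairs; txid 101 starts before 102 but commits after it.
events = [(100, "a"), (101, "b"), (102, "c"), (103, "d")]
tick0 = Snapshot(xmin=100, xmax=100, running=set())
tick1 = Snapshot(xmin=101, xmax=103, running={101})
tick2 = Snapshot(xmin=103, xmax=104, running={103})
tick3 = Snapshot(xmin=104, xmax=104, running=set())

print(batch_between(events, tick0, tick1))
print(batch_between(events, tick1, tick2))
print(batch_between(events, tick2, tick3))
```

The slow transaction 101 lands in the second batch instead of being skipped, which is what a plain "id greater than the last one seen" scan would get wrong.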

--
marko



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Chris Browne
[EMAIL PROTECTED] (Marko Kreen) writes:
 To Chris: you should like PgQ, it's just stored procs in database,
 plus it's basically just generalized Slony-I, with some optimizations,
 so should be familiar territory ;)

Looks interesting...

Random ideas

- insert_event in C (way to get rid of plpython)

Yeah, I'm with that...  Ever tried building [foo] on AIX, where foo in
('perl', 'python', ...)???  :-(

It seems rather excessive to add in a whole stored procedure language
simply for one function...
-- 
(format nil [EMAIL PROTECTED] cbbrowne linuxdatabases.info)
http://www3.sympatico.ca/cbbrowne/sgml.html
I always try to do things in chronological order. 



Re: [HACKERS] PG-MQ?

2007-06-20 Thread Marko Kreen

On 6/20/07, Chris Browne [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] (Marko Kreen) writes:
 To Chris: you should like PgQ, it's just stored procs in database,
 plus it's basically just generalized Slony-I, with some optimizations,
 so should be familiar territory ;)

Looks interesting...


Thanks :)


Random ideas

- insert_event in C (way to get rid of plpython)

Yeah, I'm with that...  Ever tried building [foo] on AIX, where foo in
('perl', 'python', ...)???  :-(

It seems rather excessive to add in a whole stored procedure language
simply for one function...


Well, it's standard in our installations, as we use it for
other stuff too.  It's much easier to prototype in PL/Python
than in C...

As it has not been a performance problem, I have not bothered
to rewrite it.  But now that the interface has been stable for
some time, it could be done.

--
marko



Re: [HACKERS] PG-MQ?

2007-06-19 Thread Steve Atkins


On Jun 19, 2007, at 2:45 PM, Chris Browne wrote:


I'm seeing some applications where it appears that there would be
value in introducing asynchronous messaging, ala message queueing.
http://en.wikipedia.org/wiki/Message_queue


Me too.


My bias would be to have something that can basically run as a thin
set of stored procedures atop PostgreSQL :-).  It would be trivial to
extend that to support SOAP/XML-RPC, if desired.

It would be nice to achieve 'higher availability' by having queues
where you might replicate the contents (probably using the MQ system
itself ;-)) to other servers.

There tend to be varying semantics out there:

- Some queues may represent subscriptions where a whole bunch of
  listeners want to get all the messages;

- Sometimes you have the semantics where:
  - messages need to be delivered at least once
  - messages need to be delivered no more than once
  - messages need to be delivered exactly once

Is there any existing work out there on this?  Or should I maybe be
looking at prototyping something?


The Skype tools have a decent-looking publish/subscribe thing, PgQ,
which they then layer their replication on top of. It's multi-consumer
and multi-producer, with delivered-at-least-once semantics.

Looks nice.

Cheers,
  Steve

