Re: RabbitMQ and CheckpointMark feasibility

2020-01-07 Thread Kenneth Knowles
I just took a look at the PR - it is, indeed, huge. But it is probably not too hard to review as it is mostly fresh code. It is true that there hasn't been a ton of work on RabbitMQ so maybe the reviewer isn't obvious. There's 3 committers on this thread who seem to have the expertise and

Re: RabbitMQ and CheckpointMark feasibility

2020-01-06 Thread Daniel Robert
Alright, a bit late but this took me a while. Thanks for all the input so far. I have rewritten much of the RabbitMq IO connector and have it ready to go in a draft pr: https://github.com/apache/beam/pull/10509 This should incorporate a lot of what's been discussed here, in terms of

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
On 11/14/19 9:50 PM, Daniel Robert wrote: Alright, thanks everybody. I'm really appreciative of the conversation here. I think I see where my disconnect is and how this might all work together for me. There are some bugs in the current rabbit implementation that I think have confused my

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Reuven Lax
Immediately after a source, the window is the Global window, which means you will get global deduplication. On Thu, Nov 14, 2019 at 12:50 PM Daniel Robert wrote: > Alright, thanks everybody. I'm really appreciative of the conversation > here. I think I see where my disconnect is and how this

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Daniel Robert
Alright, thanks everybody. I'm really appreciative of the conversation here. I think I see where my disconnect is and how this might all work together for me. There are some bugs in the current rabbit implementation that I think have confused my understanding of the intended semantics. I'm

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
Just as a matter of curiosity, I wonder why it would be needed to assign a (local) UUIDs to RabbitMQ streams. There seem to be only two options:  a) RabbitMQ does not support restore of client connection (this is valid, many sources work like that, e.g. plain websocket, or UDP stream)  b) it

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Eugene Kirpichov
Hi Daniel, On Wed, Nov 13, 2019 at 8:26 PM Daniel Robert wrote: > I believe I've nailed down a situation that happens in practice that > causes Beam and Rabbit to be incompatible. It seems that runners can and do > make assumptions about the serializability (via Coder) of a CheckpointMark. > >

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Reuven Lax
Just a thought: instead of embedding the RabbitMQ streams inside the checkpoint mark, could you keep a global static map of RabbitMQ streams keyed by a unique UUID. Then all you have to serialize inside the CheckpointMark is the UUID; you can look up the actual stream in the constructor of the

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
Hi, answers inline. On 11/14/19 4:15 PM, Daniel Robert wrote: We may be talking past each other a bit, though I do appreciate the responses. Rabbit behaves a lot like a relational database in terms of state required. A connection is analogous to a database connection, and a channel (poor

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Daniel Robert
We may be talking past each other a bit, though I do appreciate the responses. Rabbit behaves a lot like a relational database in terms of state required. A connection is analogous to a database connection, and a channel (poor analogy here) is similar to an open transaction. If the

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
Hi, as I said, I didn't dig too deep into that, but what I saw was [1].Generally, if RabbitMQ would have no way to recover subscription (which I don't think is the case), then it would not be incompatible with beam, but actually with would be incompatible any fault tolerant semantics.[1] 

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Daniel Robert
On 11/14/19 2:32 AM, Jan Lukavský wrote: Hi Danny, as Eugene pointed out, there are essentially two "modes of operation" of CheckpointMark. It can:  a) be used to somehow restore state of a reader (in call to UnboundedSource#createReader)  b) confirm processed elements in

Re: RabbitMQ and CheckpointMark feasibility

2019-11-13 Thread Jan Lukavský
Hi Danny, as Eugene pointed out, there are essentially two "modes of operation" of CheckpointMark. It can:  a) be used to somehow restore state of a reader (in call to UnboundedSource#createReader)  b) confirm processed elements in CheckpointMark#finalizeCheckpoint If your source doesn't

Re: RabbitMQ and CheckpointMark feasibility

2019-11-13 Thread Daniel Robert
I believe I've nailed down a situation that happens in practice that causes Beam and Rabbit to be incompatible. It seems that runners can and do make assumptions about the serializability (via Coder) of a CheckpointMark. To start, these are the semantics of RabbitMQ: - the client establishes

Re: RabbitMQ and CheckpointMark feasibility

2019-11-08 Thread Eugene Kirpichov
On Fri, Nov 8, 2019 at 5:57 AM Daniel Robert wrote: > Thanks Euguene and Reuven. > > In response to Eugene, I'd like to confirm I have this correct: In the > rabbit-style use case of "stream-system-side checkpointing", it is safe > (and arguably the correct behavior) to ignore the supplied

Re: RabbitMQ and CheckpointMark feasibility

2019-11-08 Thread Daniel Robert
Thanks Euguene and Reuven. In response to Eugene, I'd like to confirm I have this correct: In the rabbit-style use case of "stream-system-side checkpointing", it is safe (and arguably the correct behavior) to ignore the supplied CheckpointMark argument in `createReader(options,

Re: RabbitMQ and CheckpointMark feasibility

2019-11-07 Thread Reuven Lax
Just to clarify one thing: CheckpointMark does not need to be Java Seralizable. All that's needed is do return a Coder for the CheckpointMark in getCheckpointMarkCoder. On Thu, Nov 7, 2019 at 7:29 PM Eugene Kirpichov wrote: > Hi Daniel, > > This is probably insufficiently well documented. The

Re: RabbitMQ and CheckpointMark feasibility

2019-11-07 Thread Eugene Kirpichov
Hi Daniel, This is probably insufficiently well documented. The CheckpointMark is used for two purposes: 1) To persistently store some notion of how much of the stream has been consumed, so that if something fails we can tell the underlying streaming system where to start reading when we

RabbitMQ and CheckpointMark feasibility

2019-11-07 Thread Daniel Robert
(Background: I recently upgraded RabbitMqIO from the 4.x to 5.x library. As part of this I switched to a pull-based API rather than the previously-used push-based. This has caused some nebulous problems so put up a correction PR that I think needs some eyes fairly quickly as I'd consider