I just took a look at the PR - it is, indeed, huge. But it is probably not
too hard to review as it is mostly fresh code. It is true that there hasn't
been a ton of work on RabbitMQ so maybe the reviewer isn't obvious. There's
3 committers on this thread who seem to have the expertise and
Alright, a bit late but this took me a while.
Thanks for all the input so far. I have rewritten much of the RabbitMq
IO connector and have it ready to go in a draft pr:
https://github.com/apache/beam/pull/10509
This should incorporate a lot of what's been discussed here, in terms of
On 11/14/19 9:50 PM, Daniel Robert wrote:
Alright, thanks everybody. I'm really appreciative of the conversation
here. I think I see where my disconnect is and how this might all work
together for me. There are some bugs in the current rabbit
implementation that I think have confused my
Immediately after a source, the window is the Global window, which means
you will get global deduplication.
On Thu, Nov 14, 2019 at 12:50 PM Daniel Robert
wrote:
> Alright, thanks everybody. I'm really appreciative of the conversation
> here. I think I see where my disconnect is and how this
Alright, thanks everybody. I'm really appreciative of the conversation
here. I think I see where my disconnect is and how this might all work
together for me. There are some bugs in the current rabbit
implementation that I think have confused my understanding of the
intended semantics. I'm
Just as a matter of curiosity, I wonder why it would be needed to assign
a (local) UUIDs to RabbitMQ streams. There seem to be only two options:
a) RabbitMQ does not support restore of client connection (this is
valid, many sources work like that, e.g. plain websocket, or UDP stream)
b) it
Hi Daniel,
On Wed, Nov 13, 2019 at 8:26 PM Daniel Robert wrote:
> I believe I've nailed down a situation that happens in practice that
> causes Beam and Rabbit to be incompatible. It seems that runners can and do
> make assumptions about the serializability (via Coder) of a CheckpointMark.
>
>
Just a thought: instead of embedding the RabbitMQ streams inside the
checkpoint mark, could you keep a global static map of RabbitMQ streams
keyed by a unique UUID. Then all you have to serialize inside the
CheckpointMark is the UUID; you can look up the actual stream in the
constructor of the
Hi,
answers inline.
On 11/14/19 4:15 PM, Daniel Robert wrote:
We may be talking past each other a bit, though I do appreciate the
responses.
Rabbit behaves a lot like a relational database in terms of state
required. A connection is analogous to a database connection, and a
channel (poor
We may be talking past each other a bit, though I do appreciate the
responses.
Rabbit behaves a lot like a relational database in terms of state
required. A connection is analogous to a database connection, and a
channel (poor analogy here) is similar to an open transaction. If the
Hi, as I said, I didn't dig too deep into that, but what I saw was [1].Generally, if RabbitMQ would have no way to recover subscription (which I don't think is the case), then it would not be incompatible with beam, but actually with would be incompatible any fault tolerant semantics.[1]
On 11/14/19 2:32 AM, Jan Lukavský wrote:
Hi Danny,
as Eugene pointed out, there are essentially two "modes of operation"
of CheckpointMark. It can:
a) be used to somehow restore state of a reader (in call to
UnboundedSource#createReader)
b) confirm processed elements in
Hi Danny,
as Eugene pointed out, there are essentially two "modes of operation" of
CheckpointMark. It can:
a) be used to somehow restore state of a reader (in call to
UnboundedSource#createReader)
b) confirm processed elements in CheckpointMark#finalizeCheckpoint
If your source doesn't
I believe I've nailed down a situation that happens in practice that
causes Beam and Rabbit to be incompatible. It seems that runners can and
do make assumptions about the serializability (via Coder) of a
CheckpointMark.
To start, these are the semantics of RabbitMQ:
- the client establishes
On Fri, Nov 8, 2019 at 5:57 AM Daniel Robert wrote:
> Thanks Euguene and Reuven.
>
> In response to Eugene, I'd like to confirm I have this correct: In the
> rabbit-style use case of "stream-system-side checkpointing", it is safe
> (and arguably the correct behavior) to ignore the supplied
Thanks Euguene and Reuven.
In response to Eugene, I'd like to confirm I have this correct: In the
rabbit-style use case of "stream-system-side checkpointing", it is safe
(and arguably the correct behavior) to ignore the supplied
CheckpointMark argument in `createReader(options,
Just to clarify one thing: CheckpointMark does not need to be Java
Seralizable. All that's needed is do return a Coder for the CheckpointMark
in getCheckpointMarkCoder.
On Thu, Nov 7, 2019 at 7:29 PM Eugene Kirpichov wrote:
> Hi Daniel,
>
> This is probably insufficiently well documented. The
Hi Daniel,
This is probably insufficiently well documented. The CheckpointMark is used
for two purposes:
1) To persistently store some notion of how much of the stream has been
consumed, so that if something fails we can tell the underlying streaming
system where to start reading when we
(Background: I recently upgraded RabbitMqIO from the 4.x to 5.x library.
As part of this I switched to a pull-based API rather than the
previously-used push-based. This has caused some nebulous problems so
put up a correction PR that I think needs some eyes fairly quickly as
I'd consider
19 matches
Mail list logo