2019-08-12 13:46:48 UTC - jah: @jah has joined the channel
----
2019-08-12 13:49:49 UTC - Kim Christian Gaarder: I get an unexpected BookKeeper 
error when the broker attempts to remove a consumer. Is this related to a known 
issue, or should I report this as a bug?
----
2019-08-12 13:50:52 UTC - Alexandre DUVAL: @Sijie Guo I can't find what the 
“already recycle” error means, do you know? Maybe @Matteo Merli?
----
2019-08-12 14:44:21 UTC - Sijie Guo: can you file a bug for it?
----
2019-08-12 14:45:52 UTC - jah: For multi-topic subscriptions, the docs mention 
that there are no ordering guarantees. Is it really the case that the messages 
received are not ordered relative to their topic? My expectation was that I 
might receive messages from the topics in a variety of orders, but that, within 
a given topic, the messages would still be in order.
----
2019-08-12 14:50:10 UTC - Sijie Guo: in failover or exclusive subscription, it 
is in partition-based order.
+1 : jah
----
2019-08-12 14:50:18 UTC - Sijie Guo: in key-shared subscription, it is 
key-based order
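e.g. a rough Java client sketch (topic and subscription names here are just 
placeholders):
```
import java.util.Arrays;
import org.apache.pulsar.client.api.*;

// Sketch: a multi-topic consumer. With Key_Shared the ordering guarantee is
// per key; with Exclusive/Failover it is per partition. Names are placeholders.
public class MultiTopicOrderingExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topics(Arrays.asList("persistent://public/default/topic-a",
                                      "persistent://public/default/topic-b"))
                .subscriptionName("my-sub")
                // swap in SubscriptionType.Exclusive or Failover for
                // partition-based ordering instead of key-based ordering
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);
        client.close();
    }
}
```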
----
2019-08-12 14:52:34 UTC - jah: Another question: I've read the docs on tiered 
storage and the offloading mechanism. Awesome. 

Is there any support for re-onloading data from the cloud? For example, if I 
know I need to reread some very old data repeatedly for a period of time, there 
are scenarios where I would prefer to download from the cloud using my own 
tooling and then make those segments known to the system so that access is 
local. Eventually, I may offload again.
----
2019-08-12 14:54:36 UTC - jah: The idea is that the cloud is used for cold 
storage but there are situations when we need repeated access to that cold 
storage, and reading it cold each time is too slow. So we would want to 
download it in bulk, process it as many times as needed, and then at some point 
in the future, restore the offloaded state.
----
2019-08-12 15:04:56 UTC - Kim Christian Gaarder: sure, I’m working on 
reproducing this consistently. I’m able to reproduce it, but currently it’s 
hard to know what is causing it. I’ll submit a bug with code to reproduce as 
soon as I have something.
----
2019-08-12 16:15:26 UTC - David Kjerrumgaard: @jah There isn't any such 
capability now AFAIK, but that would be an interesting use case. Perhaps you 
can open a PIP request for this feature?
----
2019-08-12 17:02:22 UTC - Jacob Fugal: at my employer, we prefer declarative 
config as much as possible, e.g. terraform. I'm writing a terraform provider 
for pulsar resources (e.g. namespaces and topics) within a tenant. the intent 
is for it to eventually become an official terraform provider (there doesn't 
appear to be one already, as far as I can determine). would there be interest 
in having this be part of the pulsar project I'm contributing to, rather than 
a separate project that I (or my employer, but still open source) front?
----
2019-08-12 17:02:43 UTC - Kim Christian Gaarder: @Sijie Guo 
<https://github.com/apache/pulsar/issues/4941>
----
2019-08-12 17:02:55 UTC - Jacob Fugal: I'm trying to decide where to put my 
initial commit :smile:
----
2019-08-12 17:14:47 UTC - Luke Lu: Hey guys, trying to figure out offloading vs 
retention policies. It appears (according to 
<https://github.com/apache/pulsar/blob/be7b24f9f8aa67b2235e523485249aef8d2a611a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2132>)
that retention policies also apply to offloaded ledgers. I’d like to have 
some confirmation from maintainers.
----
2019-08-12 17:25:25 UTC - Luke Lu: On the topic of offloading. Is it possible 
to bootstrap a new pulsar cluster with an existing offloaded s3 bucket?
----
2019-08-12 17:30:26 UTC - Jon Bock: I believe you would need the ZooKeeper 
metadata as well for the cluster to know what the segments are.
----
2019-08-12 17:31:59 UTC - Luke Lu: Sure, I wonder if people have done something 
like this, hence the “possible” question. I’d like to have this feature 
officially supported as this offers a much cheaper DR solution.
----
2019-08-12 17:34:49 UTC - Luke Lu: Basically an offload “snapshot”-like feature 
to offload all necessary metadata (including the metadata stored in zk).
----
2019-08-12 17:38:40 UTC - Sam Leung: Found 
<https://pulsar.apache.org/docs/en/cookbooks-deduplication/#message-deduplication-and-pulsar-clients-clients>
stating that sendTimeout should be 0 so it’s “infinity”. Doesn’t 0 just mean 
fail immediately, so it shouldn’t retry (and the retry would get dropped by 
deduplication)?
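For reference, the producer setup the cookbook seems to call for would be 
roughly this (just a sketch; the topic and producer names are placeholders):
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

// Sketch of the producer config the dedup cookbook seems to describe.
// Topic and producer names are placeholders.
public class DedupProducerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/deduped-topic")
                .producerName("dedup-producer")    // stable name so sequence ids carry over
                .sendTimeout(0, TimeUnit.SECONDS)  // 0 = disable the send timeout, per the cookbook
                .create();

        producer.send("hello".getBytes());
        producer.close();
        client.close();
    }
}
```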
----
2019-08-12 17:50:44 UTC - Addison Higham: @Luke Lu just so I understand... in 
the event of a total failure of bookkeeper, you would be okay with messages 
lost from BK and instead be able to rebuild from what is retained in s3 and 
just would need the ability to snapshot the relevant state out of ZK?
----
2019-08-12 17:52:00 UTC - Addison Higham: I plan on using geo replication to do 
DR, but I still want to figure out some plans to have resiliency against an 
"oops I deleted a topic" to recover to some snapshot, so that *might* be a 
workable solution for me as well...
----
2019-08-12 17:55:31 UTC - Jon Bock: I’m not aware that anyone using Pulsar has 
implemented something like that yet.  There is one company that requested a 
snapshot recovery feature, and the devs at Streamlio have been thinking about 
how that could be provided.  You may want to file a feature request with the 
project.
----
2019-08-12 18:47:35 UTC - Luke Lu: 
<https://github.com/apache/pulsar/issues/4942>
----
2019-08-12 18:51:57 UTC - Sam Leung: I want to rephrase a question I had before 
regarding message deduplication. I’m finding that because deduplication drops 
messages based on the largest sequence id recorded pre-persist, if there’s 
an error persisting in BK, a retry attempt will just be “deduplicated” with no 
message ever getting persisted.
Is there some configuration or some concept I’m missing?
----
2019-08-12 18:53:15 UTC - Addison Higham: sounds like a bug to me...
----
2019-08-12 19:27:50 UTC - Poule: ```
pulsar-admin functions trigger --fqfn test/app/func1 --trigger-value yoshi

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException:
 Retrieve schema instance from schema info for type 'NONE' is not supported yet

Reason: HTTP 500 Internal Server Error
```
----
2019-08-12 19:28:23 UTC - Poule: what did I do wrong?
----
2019-08-12 19:59:57 UTC - David Kjerrumgaard: What command did you use to 
create the function? It looks like it is missing a schema
----
2019-08-12 20:01:41 UTC - Poule: I did not specify a schema in the Yaml
----
2019-08-12 20:02:58 UTC - Poule: In the Yaml I have only: `name, className, py, 
tenant, namespace, inputs, output`
----
2019-08-12 21:15:56 UTC - Addison Higham: hrm, wondering what the best way 
would be to build a per message exponential back-off consumer with pulsar. 
Example:
Let's say I am using Pulsar as a queue (multiple consumers in a shared 
subscription) and those consumers publish those messages to a webhook. If the 
webhook fails, I want to try up to 5 times with an exponential backoff that 
gets pretty long for the last retry (let's say 4 hours).

Options:
- I could sort of use nacks, but the re-delivery time is all static and can't 
be set per message. If I did my own tracking and used the 
`redeliverUnacknowledgedMessages` API (which is what nacks appear to do), I 
could control that with some granularity, but that forces me to have ackTimeout 
be longer than my max backoff, which leads to some weird behavior in the case 
of a consumer failing
- I could ack the message and then re-publish it to the topic after the timeout 
in the client, but then I lose pulsar message tracking and would have to 
implement my own metadata and retry tracking, likely not able to make use of 
pulsar dead-letter functionality

New functionality that could help with this:
if pulsar had the option to do a "visibility timeout" like AWS SQS, that would 
be pretty ideal for these cases: I'd immediately respond per message with a 
timeout that is respected before redelivery, and all the state tracking is 
offloaded to the server. However, this may not fit well with the pulsar model, 
especially a failover/exclusive subscription.
----
2019-08-12 21:45:29 UTC - Ali Ahmed: @Addison Higham there is no simple answer 
but pulsar dead letter queue functionality will probably be useful here.
----
2019-08-12 21:49:51 UTC - Addison Higham: yeah, I am sort of thinking of doing 
an ackTimeout on my main queue of like 5-10 minutes so I can get 5 retries to 
happen over about 20 minutes; then the deadletter would have another consumer 
with a much longer ackTimeout, and therefore a much longer time I could wait 
before sending a nack
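Rough sketch of that plan (all names and timings here are made up):
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

// Sketch: shared subscription with an ackTimeout for redelivery and a
// dead-letter policy after 5 redeliveries. Names and timings are placeholders.
public class WebhookRetryConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/webhook-events")
                .subscriptionName("webhook-sub")
                .subscriptionType(SubscriptionType.Shared)
                .ackTimeout(10, TimeUnit.MINUTES)          // redeliver if not acked in time
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(5)              // then route to the DLQ topic
                        .deadLetterTopic("persistent://public/default/webhook-events-dlq")
                        .build())
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        // call the webhook here: ack on success, otherwise let the ackTimeout
        // expire (or negativeAcknowledge(msg)) so the message is redelivered
        consumer.acknowledge(msg);
        client.close();
    }
}
```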
----
2019-08-12 21:50:03 UTC - Ali Ahmed: makes sense
----
2019-08-12 23:33:32 UTC - Ali Ahmed: @Jacob Fugal You can create a PR for 
pulsar in the open source repo; just put it in a folder, say terraform-scripts
----
2019-08-12 23:34:10 UTC - Ali Ahmed: @krishna you can take a look at this 
tutorial
<https://debezium.io/blog/2019/05/23/tutorial-using-debezium-connectors-with-apache-pulsar/>
----
2019-08-13 01:43:05 UTC - VDDCSS: @VDDCSS has joined the channel
----
2019-08-13 01:52:42 UTC - g891052195: @g891052195 has joined the channel
----
2019-08-13 03:15:47 UTC - Chitra Babu: @Chitra Babu has joined the channel
----
2019-08-13 08:00:48 UTC - Kim Christian Gaarder: Question about Pulsar SQL:
Given that a query like (SELECT __message_id__ FROM …) returns the string 
(167,2,0), what are the different parts of that string?
I know that 167 is the ledger-id, and I’m guessing that 2 is entry-id, but what 
is 0? is it batch-id? or is it partition-index? … and how can I construct a 
MessageId instance in java from these values?
----
2019-08-13 08:03:53 UTC - Sijie Guo: batch-slot-id
----
2019-08-13 08:05:25 UTC - Kim Christian Gaarder: Is the best way to get a 
MessageId from that then: new BatchMessageIdImpl(ledgerId, entryId, -1, 
batchIndex) ?
----
2019-08-13 08:05:49 UTC - Kim Christian Gaarder: is it correct to do 
partitionIndex = -1 when it’s a non-partitioned topic, or is that unrelated to 
this?
----
2019-08-13 08:08:29 UTC - Kim Christian Gaarder: Ok, so next question. When I 
do Consumer.seek(messageId) and that messageId was the one from the pulsar-sql, 
the behavior I see is that the next receive() call gets that message and not 
the message after, is this the intended behavior?
----
2019-08-13 08:10:03 UTC - Sijie Guo: > is it correct to do partitionIndex = -1 
when it’s a non-partitioned topic, or is that unrelated to this?
-1 is for a non-partitioned topic.

> is this the intended behavior?

yes, it is inclusive.
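A rough Java sketch putting that together (BatchMessageIdImpl lives in the 
client impl package; topic and subscription names are placeholders):
```
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.impl.BatchMessageIdImpl;

// Sketch: rebuild a MessageId from the (ledger-id, entry-id, batch-slot-id)
// triple returned by Pulsar SQL, then seek to it. seek() is inclusive, so the
// next receive() returns that message itself. Names are placeholders.
public class SeekFromSqlMessageId {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("sql-seek-sub")
                .subscribe();

        // (167, 2, 0) from the SQL row; partitionIndex = -1 for a non-partitioned topic
        MessageId id = new BatchMessageIdImpl(167L, 2L, -1, 0);
        consumer.seek(id);

        Message<byte[]> msg = consumer.receive();  // the sought message, since seek is inclusive
        consumer.acknowledge(msg);
        client.close();
    }
}
```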
----
2019-08-13 08:10:41 UTC - Kim Christian Gaarder: ok, so all is good then, and 
Pulsar behaves as it should, great thanks :slightly_smiling_face:
+1 : Sijie Guo
----
2019-08-13 08:51:38 UTC - Kim Christian Gaarder: I have a bug related to Pulsar 
SQL:
It appears that all messages except the very last published message are 
available for query in Pulsar SQL. Is this a known bug?
----
2019-08-13 08:54:21 UTC - Yuvaraj Loganathan: Yes, this is a known one
----
