2020-01-03 09:16:42 UTC - Jonas Lindholm: @Jonas Lindholm has joined the channel
----
2020-01-03 09:19:44 UTC - vikash: Hello All,
I have a requirement where I'm stuck on batch producer and consumer:
1) I want to use batching only in the consumer, because our source producer
is an actor-based model where messages may belong to other actors, in which
case it's hard to consume them
2) Is there any solution where I can get a List of messages when starting to
consume? I want to use it in .NET
Something like this:
<https://github.com/apache/ignite/blob/master/modules/rocketmq/src/main/java/org/apache/ignite/stream/rocketmq/RocketMQStreamer.java#L115>
----
2020-01-03 11:20:33 UTC - Vladimir Shchur: 1. In Pulsar, a batch is a separate
entity that is created by the producer and delivered directly to the consumer.
Batching is fully supported by the Pulsar.Client .NET library.
2. Once the consumer connects to the topic, it immediately starts ingesting
messages and saving them to an internal queue, without waiting for Receive
invocations. You can consume messages one by one and put them in either your
own list or your own queue, or consume using multiple threads to increase
throughput (however, losing ordering guarantees). You can also leverage
AcknowledgeCumulative to acknowledge a batch of messages at once.
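A minimal sketch of that receive-then-cumulative-ack pattern, shown with the Java client for illustration (the Pulsar.Client .NET API is analogous); the service URL, topic, and subscription names are hypothetical:
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class CumulativeAckExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // hypothetical service URL
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                       // hypothetical topic
                .subscriptionName("my-sub")              // hypothetical subscription
                .subscribe();

        // The client pre-fetches messages into its internal queue, so receive()
        // just pops the next one; producer-side batches are unpacked transparently.
        Message<byte[]> last = null;
        for (int i = 0; i < 100; i++) {
            last = consumer.receive();
            // ... process last.getData() ...
        }
        // A single call acknowledges everything up to and including `last`.
        consumer.acknowledgeCumulative(last);

        consumer.close();
        client.close();
    }
}
```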
----
2020-01-03 12:11:18 UTC - Sugata: @Sugata has joined the channel
----
2020-01-03 14:01:50 UTC - Miguel Martins: @Miguel Martins has joined the channel
----
2020-01-03 14:26:48 UTC - Sukumar: @Sukumar has joined the channel
----
2020-01-03 16:19:06 UTC - David Kjerrumgaard: @Arun Kumar Pulsar does not
currently have any reactive-streams integration, although it wouldn't be much
work to implement a `PulsarReceiver` class that behaves in a similar fashion.
+1 : Arun Kumar
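A rough sketch of what such a bridge could look like with Project Reactor; the `PulsarReceiver` class below is an assumption for illustration, not an existing API:
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClientException;
import reactor.core.publisher.Flux;

// Hypothetical bridge: exposes a Pulsar consumer as a Reactor Flux.
public class PulsarReceiver {
    public static <T> Flux<Message<T>> receive(Consumer<T> consumer) {
        return Flux.<Message<T>>generate(sink -> {
            try {
                // Blocks until the next message arrives, then emits it downstream;
                // acknowledging is left to the subscriber.
                sink.next(consumer.receive());
            } catch (PulsarClientException e) {
                sink.error(e);
            }
        });
    }
}
```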
----
2020-01-03 16:22:30 UTC - Saumitra: @Saumitra has joined the channel
----
2020-01-03 16:24:08 UTC - David Kjerrumgaard: @Rajitha You should be fine
having multiple topics being consumed by different Pulsar functions. The topics
are used to segregate the data by type, and the Pulsar functions all run
independently of one another to provide the processing. For now you can start
out using the _process_ runtime to host your functions and later change to K8s
as the number of functions increases.
+1 : Arun Kumar
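For illustration, a Pulsar function can be as small as a plain `java.util.function.Function`; the example below is hypothetical, and each such function would be deployed against its own input topic:
```
import java.util.function.Function;

// A minimal Pulsar function: Pulsar can run plain java.util.function.Function
// implementations. One function per topic keeps each data type's processing
// independent of the others.
public class ToUpperFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input.toUpperCase();
    }
}
```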
----
2020-01-03 16:24:57 UTC - juraj: i'm trying to determine why my
(non-partitioned) topic has a massive message *backlog* (resulting in the
bookie running out of disk space), while *unacked* says `0`.
i'm acknowledging cumulatively from a consumer on a failover subscription on
that topic.
data seems to be flowing OK in the app; the messages just stay in the topic
backlog.
pulsar 2.4.2
----
2020-01-03 16:27:17 UTC - David Kjerrumgaard: @juraj What is your message
retention policy on that topic?
----
2020-01-03 16:29:46 UTC - juraj: @David Kjerrumgaard i previously had a
`set-backlog-quota` with `--policy consumer_backlog_eviction` that was too
big: 2 GB, which i thought applied to the whole namespace, but it turns out
it's per topic, so i've decreased it to 600M now (which, given the number of
topics, should fit on the bookie drives).
But i hadn't run that fixed setup yet.
My assumption is that even without any quotas set, if the topic messages are
ACKed (on all subscriptions, of which i have only 1), they should simply be
removed, right?
----
2020-01-03 16:33:45 UTC - David Kjerrumgaard: @juraj Acked messages are
retained based on the retention policy you set for the topic. This allows you
to access these messages with the Reader interface, etc. So, if you have a 2GB
retention policy, then you are keeping 2GB worth of acked messages on each
topic.
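For reference, a sketch of setting a retention policy with the Java admin client; the admin URL, namespace, and limits are hypothetical:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // Retain acked messages for up to 60 minutes or 512 MB per topic,
        // whichever limit is hit first.
        admin.namespaces().setRetention("my-tenant/my-namespace",
                new RetentionPolicies(60, 512));
        admin.close();
    }
}
```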
----
2020-01-03 16:34:05 UTC - David Kjerrumgaard:
<https://pulsar.apache.org/docs/en/concepts-messaging/#message-retention-and-expiry>
----
2020-01-03 16:35:10 UTC - David Kjerrumgaard:
<https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/>
----
2020-01-03 16:35:10 UTC - bfav44: @bfav44 has joined the channel
----
2020-01-03 16:35:35 UTC - juraj: @David Kjerrumgaard got it, so by setting a
quota (any quota), i'm responsible for managing that quota so the disks don't
overflow if there are lots of topics
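For reference, a sketch of setting that per-topic backlog quota with the Java admin client (names and sizes are hypothetical):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;

public class BacklogQuotaExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // 600 MB of backlog per topic; once exceeded, the oldest backlog
        // is evicted (consumer_backlog_eviction policy).
        admin.namespaces().setBacklogQuota("my-tenant/my-namespace",
                new BacklogQuota(600L * 1024 * 1024,
                        BacklogQuota.RetentionPolicy.consumer_backlog_eviction));
        admin.close();
    }
}
```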
----
2020-01-03 16:36:05 UTC - juraj: but without setting one, the acked messages
should be cleaned up pretty fast (although not immediately, as i had noticed)
----
2020-01-03 16:37:37 UTC - juraj: ^this, and the fact that the quota is
per-topic, could be stated in the docs more clearly
----
2020-01-03 16:45:05 UTC - David Kjerrumgaard: @juraj True, there is a default
retention policy of 10GB, but it is not clearly stated.
heavy_check_mark : juraj
----
2020-01-03 16:49:15 UTC - Danish Javed: @Danish Javed has joined the channel
----
2020-01-03 17:19:12 UTC - juraj: @David Kjerrumgaard i just checked: the
retention policy for my namespace was zero time/zero size (no retention).
how does the *backlog-quota* policy relate to the *retention* policy?
doesn't the *retention* policy control the _acknowledged_ messages, and the
*backlog-quota* the _unacknowledged_ messages on the backlog?
----
2020-01-03 17:19:49 UTC - Matteo Merli: Yes
----
2020-01-03 17:20:06 UTC - juraj: because i've basically run out of space while
having no unacknowledged messages, which doesn't seem right
(i had set the backlog-quota to a high number, which shouldn't have had an
effect anyway)
----
2020-01-03 17:20:31 UTC - Matteo Merli: Though the concept of
"_unacknowledged_" is: messages pushed to a consumer but not yet acked
----
2020-01-03 17:21:00 UTC - Matteo Merli: the "backlog" (count and storage size)
is what counts toward the backlog quota
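A sketch of pulling those two numbers per subscription with the Java admin client (the topic name is hypothetical; in the 2.4.x API the stats objects expose public fields):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class BacklogVsUnacked {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        TopicStats stats = admin.topics()
                .getStats("persistent://my-tenant/my-namespace/my-topic");
        System.out.println("storage size (bytes): " + stats.storageSize);
        // msgBacklog counts every message not yet acked on the subscription;
        // unackedMessages counts only in-flight messages pushed to consumers.
        stats.subscriptions.forEach((name, sub) ->
                System.out.println(name + ": backlog=" + sub.msgBacklog
                        + " unacked=" + sub.unackedMessages));
        admin.close();
    }
}
```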
----
2020-01-03 17:22:45 UTC - juraj: this is my situation when i run out of disk
space:
retention: 0 space, 0 time (no retention)
backlog-quota: set to a high size
unacknowledged messages: 0
acknowledged but kept messages on topics: a lot
----
2020-01-03 17:23:11 UTC - Matteo Merli: what is the backlog number for the
topic?
----
2020-01-03 17:23:32 UTC - Matteo Merli: (for the subscription) -- and the
storage size for the topic
----
2020-01-03 17:24:48 UTC - juraj: i had the backlog quota per topic at 2 GB;
with 300 topics this ran out of disk space. *But* given that this backlog
quota should apply to *unacknowledged* messages, of which i had none, this
puzzles me
----
2020-01-03 17:26:55 UTC - juraj: (i run 4 bookies, each with a 50GB journal
drive and a 50GB ledger drive)
----
2020-01-03 17:29:16 UTC - Matteo Merli: can you post the stats of one of the
topics?
----
2020-01-03 17:30:09 UTC - Matteo Merli: and "*unacknowledged* messages" only
refers to the in-flight messages, not the entirety of messages stored in the
backlog
----
2020-01-03 17:30:38 UTC - juraj:
----
2020-01-03 17:31:29 UTC - Matteo Merli: backlog: 30 K
Storage size: 500MB
----
2020-01-03 17:31:37 UTC - Matteo Merli: the problem is the consumer is not
consuming
----
2020-01-03 17:32:03 UTC - juraj: but the consumer says UNACKED: 0
----
2020-01-03 17:32:47 UTC - Matteo Merli: again: these are the in-flight
----
2020-01-03 17:32:53 UTC - Matteo Merli: the queue backlog is 30K
----
2020-01-03 17:33:12 UTC - juraj: so if the consumer reads a message but
doesn't ack it, that counts as 1.
but if the consumer doesn't even read a message, that won't show anywhere?
----
2020-01-03 17:33:39 UTC - juraj: the thing is, the data seemed to be flowing
and the system was computing, so i think the consumers were reading and acking
okay
----
2020-01-03 17:35:26 UTC - juraj: is it possible to pull more detailed stats on
the topic, e.g. via the bin/pulsar-admin?
----
2020-01-03 17:35:36 UTC - juraj: to see how many msgs were actually processed
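(The internal stats posted below look like the output of `bin/pulsar-admin topics stats-internal <topic>`; a rough Java admin client equivalent, with a hypothetical topic name:)
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class InternalStatsExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        PersistentTopicInternalStats internal = admin.topics()
                .getInternalStats("persistent://my-tenant/my-namespace/my-topic");
        // Compare entries written to the topic against what each cursor
        // (subscription) has actually consumed.
        System.out.println("entries added: " + internal.entriesAddedCounter);
        internal.cursors.forEach((name, cursor) ->
                System.out.println(name + " consumed=" + cursor.messagesConsumedCounter
                        + " markDelete=" + cursor.markDeletePosition));
        admin.close();
    }
}
```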
----
2020-01-03 17:38:09 UTC - juraj: internal stats:
```{
"entriesAddedCounter" : 34735,
"numberOfEntries" : 34735,
"totalSize" : 652084104,
"currentLedgerEntries" : 8056,
"currentLedgerSize" : 150928344,
"lastLedgerCreatedTimestamp" : "2019-12-20T14:53:27.232Z",
"lastLedgerCreationFailureTimestamp" : "2019-12-20T17:00:07.126Z",
"waitingCursorsCount" : 2,
"pendingAddEntriesCount" : 0,
"lastConfirmedEntry" : "2372:8054",
"state" : "ClosedLedger",
"ledgers" : [ {
"ledgerId" : 28,
"entries" : 26680,
"size" : 501174694,
"offloaded" : false
}, {
"ledgerId" : 2372,
"entries" : 8055,
"size" : 150909410,
"offloaded" : false
} ],
"cursors" : {
"computer" : {
"markDeletePosition" : "28:4572",
"readPosition" : "28:4573",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 4573,
"cursorLedger" : 41,
"cursorLedgerLastEntry" : 0,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-12-20T10:43:43.489Z",
"state" : "NoLedger",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
},
"pulsar.dedup" : {
"markDeletePosition" : "2372:7319",
"readPosition" : "2372:7320",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 34000,
"cursorLedger" : 2207,
"cursorLedgerLastEntry" : 8,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-12-20T14:47:13.22Z",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : {
"collector-5c8b495cb8-8bc8c-orderBook-itbit-XBTUSD" : 1576839505000,
"collector-5c8b495cb8-lnv4v-orderBook-itbit-XBTUSD" : 1576839426000,
"collector-65fbf4669-8xx87-orderBook-itbit-XBTUSD" : 1576840938000,
"collector-65fbf4669-pt9f5-orderBook-itbit-XBTUSD" : 1576840862000,
"collector-84d5df8474-2v4l4-orderBook-itbit-XBTUSD" : 1576857773000,
"collector-84d5df8474-qxnk4-orderBook-itbit-XBTUSD" : 1576857774000
}
}
}
}```
----
2020-01-03 17:42:55 UTC - juraj: "messagesConsumedCounter" : 4573
^
so it seems i only consumed 4.5k msgs out of the ~30k,
but unacked shows 0, perhaps because there's also the deduplication reading
going on
----
2020-01-03 17:47:39 UTC - Matteo Merli: the unacked is per subscription
----
2020-01-03 17:48:16 UTC - Matteo Merli: the dedup cursor is fine and it's close
to the end of the topic (e.g. it only gets moved every 1K entries by default)
----
2020-01-03 17:49:17 UTC - juraj: so i'm guessing my consumer may be too slow
consuming from the topic? i'm going to try to check that out
----
2020-01-03 17:52:37 UTC - juraj: is there a fast way to delete a non-empty
namespace? e.g. do i need to delete each topic manually first, or is there a
faster way?
----
2020-01-03 17:53:15 UTC - Matteo Merli: you can delete a subscription from all
the topics in a namespace
----
2020-01-03 17:53:30 UTC - Matteo Merli: the topics will then get automatically
deleted (if not being used)
----
2020-01-03 17:54:26 UTC - Matteo Merli: `pulsar-admin namespaces unsubscribe
TENANT/NAMESPACE -s MY_SUB`
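The Java admin client exposes the same operation; a sketch, with placeholder names:
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class UnsubscribeAll {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // Deletes the given subscription on every topic in the namespace.
        admin.namespaces().unsubscribeNamespace("TENANT/NAMESPACE", "MY_SUB");
        admin.close();
    }
}
```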
----
2020-01-03 17:55:23 UTC - juraj: thanks, but i have a lot of subscriptions.
also, it tells me that the subscription has active consumers, even though the
system has been w/o clients for a week now.
that may also be related to being out of disk space
----
2020-01-03 18:59:15 UTC - juraj: this worked for me to clean all data in the
namespace:
```for topic in $( bin/pulsar-admin topics list mytenant/mynamespace ); do \
bin/pulsar-admin topics terminate $topic; \
bin/pulsar-admin topics delete $topic --force; \
done```
* the --force param is undocumented
----
2020-01-03 19:04:03 UTC - Matteo Merli: ```bin/pulsar-admin topics terminate
$topic; \```
You don't need to do that
----
2020-01-03 19:04:27 UTC - Matteo Merli: the `--force` lets you delete a topic
even though producers and consumers are connected
----
2020-01-03 19:06:52 UTC - juraj: ^i had too many differently named
subscriptions to go thru all of them manually
----
2020-01-03 19:07:25 UTC - juraj: i guess the terminate would help if the topic
was currently active, to block the clients
----
2020-01-03 19:08:46 UTC - Matteo Merli: if you do `delete --force` the
terminate shouldn't be necessary. You only do "terminate" when you want to make
sure a topic is "sealed" (as in no more messages published) without deleting it
heavy_check_mark : juraj
----
2020-01-03 19:23:16 UTC - juraj: ok, this should probably be added to the docs
(are there any devops docs for pulsar?): how to clean all data if there are a
lot of subscriptions:
```for topic in $( bin/pulsar-admin topics list mytenant/mynamespace ); do \
bin/pulsar-admin topics delete $topic --force; \
done```
----
2020-01-03 19:25:03 UTC - juraj: i was able to delete all topics except 4,
which are stuck on "Not enough non-faulty bookies available - ledger=-1"
any ideas how to reset the bookies or something similar?
*all the bookies seem to be healthy btw, as successfully deleting all the other
250+ topics would suggest, plus there are no errors in the bookie logs
+1 : David Kjerrumgaard
----
2020-01-03 19:28:09 UTC - juraj: this is kinda important as it happens after
running out of space, so it can happen to anyone
----
2020-01-03 21:05:30 UTC - Guilherme Perinazzo: any idea when we can expect
the 2.5.0 release?
----
2020-01-03 21:53:40 UTC - Endre Karlson: @Sijie Guo ^?
----
2020-01-03 22:21:47 UTC - Eugen: @Eugen has joined the channel
----
2020-01-03 22:32:57 UTC - Laran Evans: How can I serialize and send POJOs to
Pulsar with a schema, when the POJOs are from a 3rd-party library? When I try
to do it I get this exception:
org.apache.pulsar.shade.org.apache.avro.AvroRuntimeException:
avro.shaded.com.google.common.util.concurrent.UncheckedExecutionException:
org.apache.pulsar.shade.org.apache.avro.AvroTypeException:
Unknown type: T
----
2020-01-03 22:33:40 UTC - Laran Evans: I'm simply doing this:
```Schema<Champion> championSchema = JSONSchema.of(Champion.class);```
... where Champion is a class defined in a 3rd party library.
----
2020-01-03 22:52:33 UTC - juraj: any idea why initial proxy->broker lookups
fail, and then succeed?
----
2020-01-03 23:37:24 UTC - David Kjerrumgaard: @juraj It depends on how you
launch your brokers and proxies. Are they started at the same time?
----
2020-01-04 00:08:44 UTC - juraj: this doesn't sound like a startup-sequence
issue, because it happens/is reproducible at any time while the cluster is up.
is there any way to debug the broker discovery done by the proxy?
thinking_face : David Kjerrumgaard
----
2020-01-04 00:26:39 UTC - David Kjerrumgaard: The proxy uses data in ZK to get
the brokers' hostnames/IP addresses.
----
2020-01-04 00:27:03 UTC - David Kjerrumgaard: I'd start by making sure that
metadata is correct.
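One way to sanity-check it: list the brokers currently registered in the cluster metadata (a sketch with the Java admin client; the cluster name is hypothetical):
```
import java.util.List;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class BrokerCheck {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // Brokers advertise themselves in ZK; this lists the addresses the
        // proxy should be able to route lookups to.
        List<String> brokers = admin.brokers().getActiveBrokers("my-cluster");
        brokers.forEach(System.out::println);
        admin.close();
    }
}
```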
----
2020-01-04 05:04:21 UTC - Loghi: @Loghi has joined the channel
----
2020-01-04 08:48:18 UTC - juraj:
----
2020-01-04 08:48:47 UTC - juraj:
----
2020-01-04 08:49:20 UTC - juraj: ^ could you please check whether this is
normal, or whether it looks like an issue with the cluster?
they do eventually work, but the connection process is weird because of that
error
----
2020-01-04 08:55:51 UTC - Sean Carroll: @Sean Carroll has joined the channel
----