2020-01-03 09:16:42 UTC - Jonas Lindholm: @Jonas Lindholm has joined the channel
----
2020-01-03 09:19:44 UTC - vikash: Hello All,
I have a requirement where I'm stuck on batch producer and consumer:
1) I want to use batching only in the consumer, because our source producer
is an actor-based model where messages may belong to other actors, in which
case it's hard to consume them
2) Is there any solution where I can get a List of messages when starting to
consume? I want to use it in .NET
Something like this:
<https://github.com/apache/ignite/blob/master/modules/rocketmq/src/main/java/org/apache/ignite/stream/rocketmq/RocketMQStreamer.java#L115>
----
2020-01-03 11:20:33 UTC - Vladimir Shchur: 1. In Pulsar, a batch is a separate
entity that is created by the producer and delivered directly to the consumer.
Batching is fully supported by the Pulsar.Client .NET library.
2. Once the consumer connects to the topic, it immediately starts ingesting
messages and saving them to an internal queue, without waiting for Receive
invocations. You can consume messages one by one and put them in either your
own list or your own queue, or consume using multiple threads to increase
throughput (however, losing ordering guarantees). You can also leverage
AcknowledgeCumulative to acknowledge a batch of messages at once.
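A minimal sketch of that receive-then-cumulative-ack pattern, shown with the Java client for illustration (the Pulsar.Client .NET API is analogous); the service URL, topic, and subscription names are hypothetical:
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class CumulativeAckExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // hypothetical service URL
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                       // hypothetical topic
                .subscriptionName("my-sub")              // hypothetical subscription
                .subscribe();

        // The client pre-fetches messages into its internal queue, so receive()
        // just pops the next one; producer-side batches are unpacked transparently.
        Message<byte[]> last = null;
        for (int i = 0; i < 100; i++) {
            last = consumer.receive();
            // ... process last.getData() ...
        }
        // A single call acknowledges everything up to and including `last`.
        consumer.acknowledgeCumulative(last);

        consumer.close();
        client.close();
    }
}
```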
----
2020-01-03 12:11:18 UTC - Sugata: @Sugata has joined the channel
----
2020-01-03 14:01:50 UTC - Miguel Martins: @Miguel Martins has joined the channel
----
2020-01-03 14:26:48 UTC - Sukumar: @Sukumar has joined the channel
----
2020-01-03 16:19:06 UTC - David Kjerrumgaard: @Arun Kumar Pulsar does not
currently have any reactive-streams integration, although it wouldn't be much
work to implement a `PulsarReceiver` class that behaves in a similar fashion.
+1 : Arun Kumar
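A rough sketch of what such a bridge could look like with Project Reactor; the `PulsarReceiver` class below is an assumption for illustration, not an existing API:
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClientException;
import reactor.core.publisher.Flux;

// Hypothetical bridge: exposes a Pulsar consumer as a Reactor Flux.
public class PulsarReceiver {
    public static <T> Flux<Message<T>> receive(Consumer<T> consumer) {
        return Flux.<Message<T>>generate(sink -> {
            try {
                // Blocks until the next message arrives, then emits it downstream;
                // acknowledging is left to the subscriber.
                sink.next(consumer.receive());
            } catch (PulsarClientException e) {
                sink.error(e);
            }
        });
    }
}
```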
----
2020-01-03 16:22:30 UTC - Saumitra: @Saumitra has joined the channel
----
2020-01-03 16:24:08 UTC - David Kjerrumgaard: @Rajitha You should be fine
having multiple topics being consumed by different Pulsar functions. The topics
are used to segregate the data by type, and the Pulsar functions all run
independently of one another to provide the processing. For now you can start
out using the _process_ runtime to host your functions and later change to K8s
as the number of functions increases.
+1 : Arun Kumar
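For illustration, a Pulsar function can be as small as a plain `java.util.function.Function`; the example below is hypothetical, and each such function would be deployed against its own input topic:
```
import java.util.function.Function;

// A minimal Pulsar function: Pulsar can run plain java.util.function.Function
// implementations. One function per topic keeps each data type's processing
// independent of the others.
public class ToUpperFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input.toUpperCase();
    }
}
```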
----
2020-01-03 16:24:57 UTC - juraj: i'm trying to determine why my
(non-partitioned) topic has a massive message *backlog* (resulting in the
bookie running out of disk space), while *unacked* says `0`.
i'm acknowledging cumulatively from a consumer on a failover subscription on
that topic.
data seems to be flowing OK in the app; the messages just stay in the topic
backlog.
pulsar 2.4.2
----
2020-01-03 16:27:17 UTC - David Kjerrumgaard: @juraj What is your message
retention policy on that topic?
----
2020-01-03 16:29:46 UTC - juraj: @David Kjerrumgaard i previously had a
`set-backlog-quota` with `--policy consumer_backlog_eviction` that was too
big: 2 GB, which i thought applied to the whole namespace, but it turns out
it's per topic, so i've decreased it to 600M now (which, given the number of
topics, should fit on the bookie drives).
But i hadn't run that fixed setup yet.
My assumption is that even without any quotas set, if the topic messages are
ACKed (on all subscriptions, of which i have only 1), they should simply be
removed, right?
----
2020-01-03 16:33:45 UTC - David Kjerrumgaard: @juraj Acked messages are
retained based on the retention policy you set for the topic. This allows you
to access these messages with the Reader interface, etc. So, if you have a 2GB
retention policy, then you are keeping 2GB worth of acked messages on each
topic.
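For reference, a sketch of setting a retention policy with the Java admin client; the admin URL, namespace, and limits are hypothetical:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // Retain acked messages for up to 60 minutes or 512 MB per topic,
        // whichever limit is hit first.
        admin.namespaces().setRetention("my-tenant/my-namespace",
                new RetentionPolicies(60, 512));
        admin.close();
    }
}
```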
----
2020-01-03 16:34:05 UTC - David Kjerrumgaard:
<https://pulsar.apache.org/docs/en/concepts-messaging/#message-retention-and-expiry>
----
2020-01-03 16:35:10 UTC - David Kjerrumgaard:
<https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/>
----
2020-01-03 16:35:10 UTC - bfav44: @bfav44 has joined the channel
----
2020-01-03 16:35:35 UTC - juraj: @David Kjerrumgaard got it, so by setting a
quota (any quota), i'm responsible for managing that quota so the disks don't
overflow if there are lots of topics
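For reference, a sketch of setting that per-topic backlog quota with the Java admin client (names and sizes are hypothetical):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;

public class BacklogQuotaExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // 600 MB of backlog per topic; once exceeded, the oldest backlog
        // is evicted (consumer_backlog_eviction policy).
        admin.namespaces().setBacklogQuota("my-tenant/my-namespace",
                new BacklogQuota(600L * 1024 * 1024,
                        BacklogQuota.RetentionPolicy.consumer_backlog_eviction));
        admin.close();
    }
}
```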
----
2020-01-03 16:36:05 UTC - juraj: but without setting one, the acked messages
should be cleaned up pretty fast (although not immediately, as i had noticed)
----
2020-01-03 16:37:37 UTC - juraj: ^this, and the fact that the quota is
per-topic, could be stated in the docs more clearly
----
2020-01-03 16:45:05 UTC - David Kjerrumgaard: @juraj True, there is a default
retention policy of 10GB, but it is not clearly stated.
heavy_check_mark : juraj
----
2020-01-03 16:49:15 UTC - Danish Javed: @Danish Javed has joined the channel
----
2020-01-03 17:19:12 UTC - juraj: @David Kjerrumgaard i just checked: the
retention policy for my namespace was zero time/zero size (no retention).
how does the *backlog-quota* policy relate to the *retention* policy?
doesn't the *retention* policy control the _acknowledged_ messages, and the
*backlog-quota* the _unacknowledged_ messages on the backlog?
----
2020-01-03 17:19:49 UTC - Matteo Merli: Yes
----
2020-01-03 17:20:06 UTC - juraj: because i've basically run out of space while
having no unacknowledged messages, which doesn't seem right
(i had set the backlog-quota to a high number, which shouldn't have had an
effect anyway)
----
2020-01-03 17:20:31 UTC - Matteo Merli: Though the concept of
"_unacknowledged_" is: messages pushed to a consumer but not yet acked
----
2020-01-03 17:21:00 UTC - Matteo Merli: the "backlog" (count and storage size)
is what counts toward the backlog quota
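A sketch of pulling those two numbers per subscription with the Java admin client (the topic name is hypothetical; in the 2.4.x API the stats objects expose public fields):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class BacklogVsUnacked {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        TopicStats stats = admin.topics()
                .getStats("persistent://my-tenant/my-namespace/my-topic");
        System.out.println("storage size (bytes): " + stats.storageSize);
        // msgBacklog counts every message not yet acked on the subscription;
        // unackedMessages counts only in-flight messages pushed to consumers.
        stats.subscriptions.forEach((name, sub) ->
                System.out.println(name + ": backlog=" + sub.msgBacklog
                        + " unacked=" + sub.unackedMessages));
        admin.close();
    }
}
```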
----
2020-01-03 17:22:45 UTC - juraj: this is my situation when i run out of disk
space:
retention: 0 space, 0 time (no retention)
backlog-quota: set to a high size
unacknowledged messages: 0
acknowledged but kept messages on topics: a lot
----
2020-01-03 17:23:11 UTC - Matteo Merli: what is the backlog number for the
topic?
----
2020-01-03 17:23:32 UTC - Matteo Merli: (for the subscription) -- and the
storage size for the topic
----
2020-01-03 17:24:48 UTC - juraj: i had the backlog quota per topic at 2 GB;
with 300 topics this ran out of disk space. *But* given that this backlog
quota should apply to *unacknowledged* messages, of which i had none, this
puzzles me
----
2020-01-03 17:26:55 UTC - juraj: (i run 4 bookies, each with a 50GB journal
drive and a 50GB ledger drive)
----
2020-01-03 17:29:16 UTC - Matteo Merli: can you post the stats of one of the
topics?
----
2020-01-03 17:30:09 UTC - Matteo Merli: and "*unacknowledged* messages" only
refers to the in-flight messages, not the entirety of messages stored in the
backlog
----
2020-01-03 17:30:38 UTC - juraj:
----
2020-01-03 17:31:29 UTC - Matteo Merli: backlog: 30 K
Storage size: 500MB
----
2020-01-03 17:31:37 UTC - Matteo Merli: the problem is the consumer is not
consuming
----
2020-01-03 17:32:03 UTC - juraj: but the consumer says UNACKED: 0
----
2020-01-03 17:32:47 UTC - Matteo Merli: again: these are the in-flight
----
2020-01-03 17:32:53 UTC - Matteo Merli: the queue backlog is 30K
----
2020-01-03 17:33:12 UTC - juraj: so if the consumer reads a message but
doesn't ack it, that counts as 1.
but if the consumer doesn't even read a message, that won't show anywhere?
----
2020-01-03 17:33:39 UTC - juraj: the thing is, the data seemed to be flowing
and the system was computing, so i think the consumers were reading and acking
okay
----
2020-01-03 17:35:26 UTC - juraj: is it possible to pull more detailed stats on
the topic, e.g. via the bin/pulsar-admin?
----
2020-01-03 17:35:36 UTC - juraj: to see how many msgs were actually processed
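(The internal stats posted below look like the output of `bin/pulsar-admin topics stats-internal <topic>`; a rough Java admin client equivalent, with a hypothetical topic name:)
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class InternalStatsExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        PersistentTopicInternalStats internal = admin.topics()
                .getInternalStats("persistent://my-tenant/my-namespace/my-topic");
        // Compare entries written to the topic against what each cursor
        // (subscription) has actually consumed.
        System.out.println("entries added: " + internal.entriesAddedCounter);
        internal.cursors.forEach((name, cursor) ->
                System.out.println(name + " consumed=" + cursor.messagesConsumedCounter
                        + " markDelete=" + cursor.markDeletePosition));
        admin.close();
    }
}
```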
----
2020-01-03 17:38:09 UTC - juraj: internal stats:
```{
"entriesAddedCounter" : 34735,
"numberOfEntries" : 34735,
"totalSize" : 652084104,
"currentLedgerEntries" : 8056,
"currentLedgerSize" : 150928344,
"lastLedgerCreatedTimestamp" : "2019-12-20T14:53:27.232Z",
"lastLedgerCreationFailureTimestamp" : "2019-12-20T17:00:07.126Z",
"waitingCursorsCount" : 2,
"pendingAddEntriesCount" : 0,
"lastConfirmedEntry" : "2372:8054",
"state" : "ClosedLedger",
"ledgers" : [ {
"ledgerId" : 28,
"entries" : 26680,
"size" : 501174694,
"offloaded" : false
}, {
"ledgerId" : 2372,
"entries" : 8055,
"size" : 150909410,
"offloaded" : false
} ],
"cursors" : {
"computer" : {
"markDeletePosition" : "28:4572",
"readPosition" : "28:4573",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 4573,
"cursorLedger" : 41,
"cursorLedgerLastEntry" : 0,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-12-20T10:43:43.489Z",
"state" : "NoLedger",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
},
"pulsar.dedup" : {
"markDeletePosition" : "2372:7319",
"readPosition" : "2372:7320",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 34000,
"cursorLedger" : 2207,
"cursorLedgerLastEntry" : 8,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-12-20T14:47:13.22Z",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : {
"collector-5c8b495cb8-8bc8c-orderBook-itbit-XBTUSD" : 1576839505000,
"collector-5c8b495cb8-lnv4v-orderBook-itbit-XBTUSD" : 1576839426000,
"collector-65fbf4669-8xx87-orderBook-itbit-XBTUSD" : 1576840938000,
"collector-65fbf4669-pt9f5-orderBook-itbit-XBTUSD" : 1576840862000,
"collector-84d5df8474-2v4l4-orderBook-itbit-XBTUSD" : 1576857773000,
"collector-84d5df8474-qxnk4-orderBook-itbit-XBTUSD" : 1576857774000
}
}
}
}```
----
2020-01-03 17:42:55 UTC - juraj: "messagesConsumedCounter" : 4573
^
so it seems i only consumed 4.5k msgs out of the ~30k,
but unacked shows 0, perhaps because there's also the deduplication reading
going on
----
2020-01-03 17:47:39 UTC - Matteo Merli: the unacked is per subscription
----
2020-01-03 17:48:16 UTC - Matteo Merli: the dedup cursor is fine and it's close
to the end of the topic (e.g. it only gets moved every 1K entries by default)
----
2020-01-03 17:49:17 UTC - juraj: so i'm guessing my consumer may be too slow
consuming from the topic? i'm going to try to check that out
----
2020-01-03 17:52:37 UTC - juraj: is there a fast way to delete a non-empty
namespace? e.g. do i need to delete each topic manually first, or is there a
faster way?
----
2020-01-03 17:53:15 UTC - Matteo Merli: you can delete a subscription from all
the topics in a namespace
----
2020-01-03 17:53:30 UTC - Matteo Merli: the topics will then get automatically
deleted (if not being used)
----
2020-01-03 17:54:26 UTC - Matteo Merli: `pulsar-admin namespaces unsubscribe
TENANT/NAMESPACE -s MY_SUB`
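The Java admin client exposes the same operation; a sketch, with placeholder names:
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class UnsubscribeAll {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // Deletes the given subscription on every topic in the namespace.
        admin.namespaces().unsubscribeNamespace("TENANT/NAMESPACE", "MY_SUB");
        admin.close();
    }
}
```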
----
2020-01-03 17:55:23 UTC - juraj: thanks, but i have a lot of subscriptions.
also, it tells me that the subscription has active consumers, even though the
system has been w/o clients for a week now.
that may also be related to being out of disk space
----
2020-01-03 18:59:15 UTC - juraj: this worked for me to clean all data in the
namespace:
```for topic in $( bin/pulsar-admin topics list mytenant/mynamespace ); do \
bin/pulsar-admin topics terminate $topic; \
bin/pulsar-admin topics delete $topic --force; \
done```
* the --force param is undocumented
----
2020-01-03 19:04:03 UTC - Matteo Merli: ```bin/pulsar-admin topics terminate
$topic; \```
You don't need to do that
----
2020-01-03 19:04:27 UTC - Matteo Merli: the `--force` lets you delete a topic
even though producers and consumers are connected
----
2020-01-03 19:06:52 UTC - juraj: ^i had too many differently named
subscriptions to go thru all of them manually
----
2020-01-03 19:07:25 UTC - juraj: i guess the terminate would help if the topic
was currently active, to block the clients
----
2020-01-03 19:08:46 UTC - Matteo Merli: if you do `delete --force` the
terminate shouldn't be necessary. You only do "terminate" when you want to make
sure a topic is "sealed" (as in no more messages published) without deleting it
heavy_check_mark : juraj
----
2020-01-03 19:23:16 UTC - juraj: ok, this should probably be added to the docs
(are there any devops docs for pulsar?): how to clean all data if there are a
lot of subscriptions:
```for topic in $( bin/pulsar-admin topics list mytenant/mynamespace ); do \
bin/pulsar-admin topics delete $topic --force; \
done```
----
2020-01-03 19:25:03 UTC - juraj: i was able to delete all topics except 4,
which are stuck on "Not enough non-faulty bookies available - ledger=-1"
any ideas how to reset the bookies or something similar?
*all the bookies seem to be healthy btw, as successfully deleting all the other
250+ topics would suggest, plus there are no errors in the bookie logs
+1 : David Kjerrumgaard
----
2020-01-03 19:28:09 UTC - juraj: this is kinda important as it happens after
running out of space, so it can happen to anyone
----
2020-01-03 21:05:30 UTC - Guilherme Perinazzo: any idea when we can expect
the 2.5.0 release?
----
2020-01-03 21:53:40 UTC - Endre Karlson: @Sijie Guo ^?
----
2020-01-03 22:21:47 UTC - Eugen: @Eugen has joined the channel
----
2020-01-03 22:32:57 UTC - Laran Evans: How can I serialize and send POJOs to
Pulsar with a schema, when the POJOs are from a 3rd-party library? When I try
to do it I get this exception:
org.apache.pulsar.shade.org.apache.avro.AvroRuntimeException:
avro.shaded.com.google.common.util.concurrent.UncheckedExecutionException:
org.apache.pulsar.shade.org.apache.avro.AvroTypeException:
Unknown type: T
----
2020-01-03 22:33:40 UTC - Laran Evans: I'm simply doing this:
```Schema<Champion> championSchema = JSONSchema.of(Champion.class);```
... where Champion is a class defined in a 3rd party library.
----
2020-01-03 22:52:33 UTC - juraj: any idea why initial proxy->broker lookups
fail, and then succeed?
----
2020-01-03 23:37:24 UTC - David Kjerrumgaard: @juraj It depends on how you
launch your brokers and proxies. Are they started at the same time?
----
2020-01-04 00:08:44 UTC - juraj: this doesn't sound like a startup-sequence
issue, because it happens/is reproducible at any time while the cluster is up.
is there any way to debug the broker discovery done by the proxy?
thinking_face : David Kjerrumgaard
----
2020-01-04 00:26:39 UTC - David Kjerrumgaard: The proxy uses data in ZK to get
the brokers' hostnames/IP addresses.
----
2020-01-04 00:27:03 UTC - David Kjerrumgaard: I'd start by making sure that
metadata is correct.
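One way to sanity-check it: list the brokers currently registered in the cluster metadata (a sketch with the Java admin client; the cluster name is hypothetical):
```
import java.util.List;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class BrokerCheck {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")  // hypothetical admin URL
                .build();
        // Brokers advertise themselves in ZK; this lists the addresses the
        // proxy should be able to route lookups to.
        List<String> brokers = admin.brokers().getActiveBrokers("my-cluster");
        brokers.forEach(System.out::println);
        admin.close();
    }
}
```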
----
2020-01-04 05:04:21 UTC - Loghi: @Loghi has joined the channel
----
2020-01-04 08:48:18 UTC - juraj:
----
2020-01-04 08:48:47 UTC - juraj:
----
2020-01-04 08:49:20 UTC - juraj: ^ could you please check whether this is
normal, or whether it looks like an issue with the cluster?
they do eventually work, but the connection process is weird because of that
error
----
2020-01-04 08:55:51 UTC - Sean Carroll: @Sean Carroll has joined the channel
----