2020-06-30 11:47:21 UTC - rani: *[Pulsar 2.6.0][Presto]* Architecture: Presto communicating with brokers through the broker load balancer.
Running the following Presto query results in the errors below. _Note_: running a similar query on another namespace with less data succeeds.
*Query*
```show tables in "myTenant/myNamespace";```
*Presto CLI Error*
```Query 20200630_114317_00004_t2nsz failed: Failed to get tables/topics in
myTenant/myNamespace: HTTP 500 Internal Server Error
java.lang.RuntimeException: Failed to get tables/topics in
myTenant/myNamespace: HTTP 500 Internal Server Error
at
org.apache.pulsar.sql.presto.PulsarMetadata.listTables(PulsarMetadata.java:191)
at
com.facebook.presto.metadata.MetadataManager.listTables(MetadataManager.java:432)```
*Broker Error*
```11:43:47.340 [pulsar-web-42-7] ERROR
org.apache.pulsar.broker.web.PulsarWebResource - [pulsar-role-broker] Failed to
check whether namespace bundle is owned
myTenant/myNamespace/0x80000000_0xc0000000
java.util.concurrent.TimeoutException: null
at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
~[?:?]
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021) ~[?:?]
at
org.apache.pulsar.broker.namespace.NamespaceService.getWebServiceUrl(NamespaceService.java:231)```
Are there any parameters in need of tuning here?
----
2020-06-30 11:47:35 UTC - rani: @Sijie Guo, your expertise would be much
appreciated here
----
2020-06-30 13:35:09 UTC - Meyappan Ramasamy: hi team, I'm using Apache Pulsar as a message broker, which has features similar to Kafka. I wanted to know whether dynamic topic subscription, using a regex pattern to subscribe to future topics, is supported in the latest version of Pulsar, 2.6.0.
----
2020-06-30 13:35:39 UTC - Meyappan Ramasamy: hi team, I'm using Apache Pulsar as a message broker, which has features similar to Kafka (refer to this thread for previous communication). I wanted to know whether dynamic topic subscription using a regex pattern to subscribe to future topics is supported in the latest version of Pulsar, 2.6.0.
----
2020-06-30 15:05:41 UTC - Matteo Merli: While the Python client supports schemas, Pulsar Functions in Python do not (yet).
----
2020-06-30 15:09:04 UTC - rwaweber: Thanks David! That definitely helps, thanks
for the suggestion.
I suppose my earlier question ended up morphing into another one, where it
appears that the `pulsar_storage_size` metric is reporting double the occupied
storage reported by the `pulsar-admin topics stats-internal <topic>`
command.
----
2020-06-30 16:59:12 UTC - Joshua Eric: Ok thank you for your help!
----
2020-06-30 17:06:57 UTC - Sijie Guo: @rani - Have you verified your Pulsar
cluster first before using Presto?
----
2020-06-30 17:07:43 UTC - rani: verified in what sense @Sijie Guo?
`{{PULSAR_ENDPOINT}}/admin/v2/brokers/health` returns `200` and I have a bunch
of producers and consumers writing and reading from the cluster
----
2020-06-30 17:07:52 UTC - Sijie Guo: Regex subscription has been supported since an earlier version (around 2.3.0).
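For reference, a minimal sketch of a pattern subscription with the Java client (the service URL, namespace, and subscription name here are made-up placeholders):
```import java.util.regex.Pattern;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class RegexSubscriptionExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Subscribe to every topic in the namespace that matches the pattern.
        // Topics created later that match are picked up automatically
        // (the check interval is tunable via patternAutoDiscoveryPeriod).
        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://public/default/finance-.*"))
                .subscriptionName("regex-sub")
                .subscribe();

        System.out.println(new String(consumer.receive().getData()));
        client.close();
    }
}```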
----
2020-06-30 17:09:24 UTC - Sijie Guo: In the presto server, have you tried to
run `bin/pulsar-admin topics list myTenant/myNamespace` ?
----
2020-06-30 17:10:31 UTC - Sijie Guo: The Presto connector basically calls the corresponding REST API to get the list of topics.
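If it helps, a rough sketch of the same listing via the Java admin client (the HTTP service URL is a placeholder):
```import org.apache.pulsar.client.admin.PulsarAdmin;

public class ListTopicsExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Lists the persistent topics in the namespace, the same data the connector needs.
        admin.topics().getList("myTenant/myNamespace").forEach(System.out::println);
        admin.close();
    }
}```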
----
2020-06-30 17:13:34 UTC - rani: hmm, nope, I haven’t tried that! A few minutes before you messaged, everything started working as expected, but I’m not sure exactly what I did to make it work.
I initially tried increasing `zooKeeperSessionTimeoutMillis`, assuming that was causing the timeout, but that didn’t help and I reverted the change.
The only other change I made, which I think is irrelevant, is that I had a “zombie” Pulsar function running which I destroyed, and now everything works as expected from the Presto end.
(by “zombie” function I mean a function that was created before I re-created my Pulsar cluster)
----
2020-06-30 17:14:31 UTC - rani: For good measure, I’m going to do a fresh cluster re-creation now and monitor it to see if I can reproduce this issue.
----
2020-06-30 17:33:53 UTC - Abhishek Varshney: @Matteo Merli
> ordering will be (briefly) broken.
I assume the key-shared consumers would still be able to consume messages in
order without breaking any ordering when partitions are increased. // @Sijie Guo
----
2020-06-30 17:35:53 UTC - Matteo Merli: Ordering is only guaranteed within a partition.
When you increase the number of partitions, the hashing logic will shift the partition assignments, so a key `K1` that was going to `partition-1` might now go to `partition-12`.
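A toy illustration of that shift (simple modulo hashing, not the client’s exact hash function, but the effect of changing the partition count is the same):
```public class PartitionShiftExample {
    // Toy routing: hash(key) mod numPartitions.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String key = "K1";
        // The same key can land on a different partition once partitions are added,
        // so per-key ordering is briefly broken across that boundary.
        System.out.println("with 10 partitions -> partition-" + partitionFor(key, 10));
        System.out.println("with 16 partitions -> partition-" + partitionFor(key, 16));
    }
}```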
----
2020-06-30 17:44:03 UTC - rani: I’ve re-created the cluster without issues! I am able to query my topics via Presto. Everything works well.
----
2020-06-30 17:44:39 UTC - rani: Just still puzzled at how destroying an old
running function could have resolved this issue
----
2020-06-30 18:16:28 UTC - Rahul Vashishth: Was it supposed to be published with this <https://github.com/apache/pulsar-helm-chart/pull/21> merge?
----
2020-06-30 19:21:26 UTC - Raphael Enns: Hey, for this issue, it appeared to be a disconnect from the PulsarClient. Currently we create a single PulsarClient instance and use it, but I've noticed in some testing that it can go down, and that causes problems.
----
2020-06-30 19:22:11 UTC - Raphael Enns: The first of the two cases I noticed: I've seen a couple of out-of-memory errors, and those seem to stem from a disconnect.
----
2020-06-30 19:23:01 UTC - Raphael Enns: The other case is that we suddenly stop receiving any messages, which looks like the PulsarClient's listener thread got blocked or something.
----
2020-06-30 19:24:19 UTC - Raphael Enns: Do you have any recommendations for
handling the PulsarClient object? Is it supposed to reconnect as necessary on a
disconnect? Should we have some thread monitoring it to make sure it is
working? Should we be creating new PulsarClient instances more often?
----
2020-06-30 19:25:16 UTC - Raphael Enns: I'm just looking for recommendations
for how to run this stably. We have a long-lived service that sits on top of
Pulsar and right now we are seeing some stability issues.
----
2020-06-30 20:48:02 UTC - Matteo Merli: Yes, if there's a disconnection, the client will internally attempt to reconnect, with exponential backoff.
While it's disconnected, the producer will buffer up messages until the queue gets full. By default, the queue size is 1K messages. To reduce the amount of memory used, just reduce the `producerQueueSize` to a smaller number (e.g. 100 or 10).
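In the Java client I believe that queue is the one capped by `maxPendingMessages` on the producer builder (the exact setting name varies between client libraries, so treat this as an assumption). A rough sketch:
```import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class SmallQueueProducerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Cap the pending-message queue so a long disconnect buffers less in memory.
        // With blockIfQueueFull(true), send() blocks once the queue is full instead
        // of failing fast (the default behavior).
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/my-topic")   // placeholder topic
                .maxPendingMessages(100)
                .blockIfQueueFull(true)
                .sendTimeout(30, TimeUnit.SECONDS)
                .create();

        producer.send("hello".getBytes());
        client.close();
    }
}```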
----
2020-07-01 03:10:45 UTC - jixing7: @jixing7 has joined the channel
----
2020-07-01 04:37:00 UTC - sjmittal: thanks
----
2020-07-01 05:40:32 UTC - Luke Stephenson: I've got a question about the ledger disk usage on the Grafana dashboards. After a topic has been offloaded to S3, when should that be reflected in the bookie metrics as lower "Ledgers Disk Usage"? I'm not seeing any change to this metric over 12 hours after the offloading finished (I expected it to go down from 50% to about 1%). I'm confident the offloading worked, as the S3 bucket has a lot of data in it, plus a full consumer read of the topic puts almost no load on the bookies. (On Pulsar 2.5.1 I saw this metric drop after offloading, but I've since upgraded to 2.6.0 and it is staying high.)
----
2020-07-01 06:27:12 UTC - Sijie Guo: A couple of reasons:
• there is a `managedLedgerOffloadDeletionLagMs` setting that delays the deletion of ledgers after they are offloaded to tiered storage.
• bookies clean up disks in a lazy manner via garbage collection and entry log file compaction (compaction is an internal mechanism to reclaim disk space)
To check what the issue is, you can run:
```bin/pulsar-admin topics stats-internal persistent://<tenant>/<namespace>/<topic>```
to get the internal stats of a topic and see which ledgers have been offloaded.
For those offloaded ledgers, you can use
```bin/bookkeeper shell ledgermetadata -ledgerid <ledger-id>```
to check whether the ledger still exists.
ok_hand : Konstantinos Papalias
----
2020-07-01 07:07:06 UTC - Luke Stephenson: Thank you for that explanation
----