Slack digest for #general - 2020-05-08

Apache Pulsar Slack Fri, 08 May 2020 02:11:25 -0700

2020-05-07 09:38:42 UTC - xue: pulsar sql error, pulsar version 2.5.1
----
2020-05-07 10:16:52 UTC - Alexandre DUVAL: What did you put for the ZooKeeper 
servers setting in the catalog configuration?
----
2020-05-07 10:17:34 UTC - Alexandre DUVAL: 
<https://github.com/apache/pulsar/issues/6852>
----
2020-05-07 10:28:47 UTC - Andreas Cz: @Andreas Cz has joined the channel
----
2020-05-07 11:13:13 UTC - Gary Fredericks: okay, I think I asked the question 
poorly, I'll try again:


Assuming a very long retention policy, is there any benefit to having a backlog 
quota? or what is the risk in not having a backlog quota?
----
2020-05-07 11:16:45 UTC - dionjansen: @dionjansen has joined the channel
----
2020-05-07 11:23:28 UTC - dionjansen: Hi, I’m having an issue with one of my 
bookies. I have a cluster deployed on kubernetes and one of the bookies is 
reaching +90% disk utilisation. I set up managed offloading to s3, but the disk 
pressure doesn’t seem to be reducing (I see the ledgers being set to 
`offloaded: true` ). Do I need to do any additional actions to clear up the 
current ledgers on disk? Thanks
----
2020-05-07 11:30:42 UTC - alex kurtser: Hello everyone. We have run into a 
new issue. The functions/function definitions disappeared from Pulsar.

We would like to figure out what happened.
Recently we set the retention policy and message TTL settings to a short 
period (24 hours for retention and 1 hour for message TTL).
I suspect that we also applied these settings to the public/function namespace and 
that this caused the functions to be deleted once the retention time had passed.
Does anybody know whether the functions are actually stored in the 
public/function namespace and whether they could be affected by retention policies on 
this namespace?
----
2020-05-07 12:12:55 UTC - Frank Kelly: Is there any documentation on how to 
downscale on AWS - I presume there's some "data balancing" that has to go on 
but just wondering how does Pulsar do that data "balancing" and how AWS can 
figure out how/when to kill an EC2 instance that is no longer needed?
----
2020-05-07 12:23:53 UTC - Jeroen Coenders: @Jeroen Coenders has joined the 
channel
----
2020-05-07 13:04:07 UTC - Ming: Gary, these concepts are confusing. They are 
supposed to be independent but are intertwined. Even with a very long retention 
policy, you still need a large enough backlog quota, because the backlog quota 
keeps the disk from growing indefinitely. Once the quota is used up, the producer 
can no longer write to the topic or gets disconnected (remember, there are 3 
backlog policies that decide what happens when the size limit is reached). The 
catch is that the backlog size should not exceed the disk size. And if you set 
the retention to be as large as the disk size, the backlog size should not be 
more than that.
----
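A minimal sketch of the sizing point above: it lists the three backlog quota policies Pulsar defines and checks that a planned limit fits on disk. The helper name `check_backlog_quota` and the `disk_bytes` input are hypothetical, introduced only for illustration; they are not part of any Pulsar API.

```python
# The three backlog quota policies and what the broker does once the limit is hit.
BACKLOG_POLICIES = {
    "producer_request_hold": "broker holds produce requests and does not persist them",
    "producer_exception": "broker disconnects the producer with an exception",
    "consumer_backlog_eviction": "broker starts discarding backlog messages",
}

GIB = 1024 ** 3


def check_backlog_quota(limit_bytes, policy, disk_bytes):
    """Return warnings for a planned backlog quota (hypothetical helper).

    disk_bytes is the usable storage you are willing to give this data;
    it is an assumed input, not something read from the cluster.
    """
    warnings = []
    if policy not in BACKLOG_POLICIES:
        warnings.append(f"unknown policy: {policy}")
    if limit_bytes > disk_bytes:
        warnings.append("backlog limit exceeds available disk")
    return warnings


if __name__ == "__main__":
    # A 200 GiB backlog limit cannot fit on 100 GiB of disk.
    print(check_backlog_quota(200 * GIB, "consumer_backlog_eviction", 100 * GIB))
```

The limit and policy themselves are applied per namespace with `pulsar-admin namespaces set-backlog-quota`.
----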
2020-05-07 13:24:27 UTC - Damien Roualen: I don’t find a similar case in the 
channel:
When I am trying to get tables from Pulsar using Presto (like in the 
documentation) I get HTTP 500 error.

```presto:default> show catalogs;
 Catalog
---------
 pulsar
 system
(2 rows)

presto:default> show schemas in pulsar;
       Schema
--------------------
 information_schema
 public/default
 pulsar/system
(3 rows)

presto:default> show tables in pulsar."public/default";
Query 20200507_125433_00004_h8fj8 failed: Failed to get tables/topics in 
public/default: HTTP 500 Internal Server Error```
----
2020-05-07 13:25:12 UTC - Damien Roualen: Is there a way to understand that 
HTTP 500 error from my Pulsar Cluster?
----
2020-05-07 13:25:55 UTC - Damien Roualen: I am using JWT token for the 
connection, which allows me to run the two first commands.
----
2020-05-07 13:27:01 UTC - Damien Roualen: Documentation link: 
<https://pulsar.apache.org/docs/en/sql-getting-started/>
----
2020-05-07 13:29:28 UTC - Alexandre DUVAL: Did you check brokers logs?
----
2020-05-07 13:30:05 UTC - Alexandre DUVAL: Or presto runner logs? Or try 
`pulsar sql --debug`
----
2020-05-07 13:31:00 UTC - Damien Roualen: We run Pulsar with a service using 
systemctl.
I checked with: `sudo systemctl status pulsar.broker.service`
----
2020-05-07 13:31:52 UTC - Alexandre DUVAL: so journalctl -efau 
pulsar.broker.service ?
----
2020-05-07 13:31:55 UTC - Damien Roualen: I am using Presto cli.
----
2020-05-07 13:32:11 UTC - Alexandre DUVAL: pulsar sql is a presto cli :wink:
----
2020-05-07 13:33:26 UTC - Damien Roualen: Ok, :sweat_smile:. I will run the 
presto cli with a similar debug parameter
----
2020-05-07 13:34:01 UTC - Alexandre DUVAL: The log shown by the presto cli 
should be detailed on the presto runner
----
2020-05-07 13:35:09 UTC - Damien Roualen: I have the exception trace now, I am 
looking at it if that can help.
----
2020-05-07 13:35:52 UTC - Damien Roualen: The CLI doesn’t give meaningful 
information about the issue.
----
2020-05-07 13:36:20 UTC - Damien Roualen: ```Query 20200507_133355_00009_h8fj8 
failed: Failed to get tables/topics in pulsar/system: HTTP 500 Internal Server 
Error
java.lang.RuntimeException: Failed to get tables/topics in pulsar/system: HTTP 
500 Internal Server Error
        at 
org.apache.pulsar.sql.presto.PulsarMetadata.listTables(PulsarMetadata.java:191)
        at 
com.facebook.presto.spi.connector.ConnectorMetadata.listTables(ConnectorMetadata.java:224)
        at 
com.facebook.presto.metadata.MetadataManager.listTables(MetadataManager.java:548)
        at 
com.facebook.presto.connector.informationSchema.InformationSchemaMetadata.lambda$calculatePrefixesWithTableName$7(InformationSchemaMetadata.java:304)
        at 
java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at 
java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at 
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at 
com.facebook.presto.connector.informationSchema.InformationSchemaMetadata.calculatePrefixesWithTableName(InformationSchemaMetadata.java:308)
        at 
com.facebook.presto.connector.informationSchema.InformationSchemaMetadata.getTableLayouts(InformationSchemaMetadata.java:242)```
----
2020-05-07 13:36:31 UTC - Alexandre DUVAL: with --debug the cli shows all the 
stacktrace iirc
----
2020-05-07 13:36:41 UTC - Damien Roualen: Yes, I use that parameter.
----
2020-05-07 13:36:42 UTC - Alexandre DUVAL: ok so it should be broker side, 
check its logs
----
2020-05-07 13:37:09 UTC - Damien Roualen: Since I am using three brokers, 
should I use journalctl on each instance?
----
2020-05-07 13:39:38 UTC - Alexandre DUVAL: yes, or you can do a lookup to check 
which broker is currently serving the topic used by your presto request
----
2020-05-07 13:41:20 UTC - Damien Roualen: How can I do the lookup?
----
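On the lookup question: `pulsar-admin topics lookup <topic>` prints the broker that currently owns a topic. The same lookup can be sketched against the broker's REST interface; the snippet below only builds the request and does not send it. The broker address, topic name, and helper names are placeholders, and the `/lookup/v2/topic/...` path is an assumption to verify against the REST API docs for your Pulsar version.

```python
import urllib.request
from urllib.parse import quote


def lookup_url(admin_url, tenant, namespace, topic, domain="persistent"):
    """Build the (assumed) REST lookup URL for a topic."""
    parts = [quote(p, safe="") for p in (domain, tenant, namespace, topic)]
    return admin_url.rstrip("/") + "/lookup/v2/topic/" + "/".join(parts)


def lookup_request(admin_url, tenant, namespace, topic, token=None):
    """Prepare a GET request, adding a JWT bearer token when one is given."""
    req = urllib.request.Request(lookup_url(admin_url, tenant, namespace, topic))
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


if __name__ == "__main__":
    # "broker-1:8080" and "my-topic" are placeholders.
    req = lookup_request("http://broker-1:8080", "public", "default", "my-topic")
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

Once the owning broker is known, only that broker's journal needs to be checked for the failing request.
----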
2020-05-07 13:42:03 UTC - Damien Roualen: Otherwise I would update Presto 
server config to only use 1 broker and relaunch the Presto server,
----
2020-05-07 13:42:39 UTC - Damien Roualen: The logs don’t show anything from the 
3 brokers. Maybe there is a way to increase the level of verbosity.
----
2020-05-07 13:44:33 UTC - Alexandre DUVAL: maybe
```org.apache.pulsar.sql.presto.PulsarMetadata.listTables(PulsarMetadata.java:191)```
----
2020-05-07 13:44:43 UTC - Alexandre DUVAL: look for metadata on zookeepers?
----
2020-05-07 13:44:54 UTC - Alexandre DUVAL: you can check the code that ran
----
2020-05-07 13:58:36 UTC - Damien Roualen: Ok; I am going to check the metadata.
----
2020-05-07 14:04:03 UTC - Gary Fredericks: thanks for the explanation; I will 
play around with this and see what I can figure out
----
2020-05-07 14:04:36 UTC - Gary Fredericks: (I also posted a different version 
of this question on the github issue: 
<https://github.com/apache/pulsar/issues/6847#issuecomment-625189109>)
----
2020-05-07 14:09:06 UTC - Damien Roualen: By running: `sudo journalctl -efau 
zookeeper.service`
----
2020-05-07 14:10:14 UTC - Damien Roualen: I see an error earlier but it 
doesn't seem to be related:
```May 07 12:28:46 ip-10-242-177-121 pulsar[4411]: WARNING: An illegal 
reflective access operation has occurred
May 07 12:28:46 ip-10-242-177-121 pulsar[4411]: WARNING: Illegal reflective 
access using Lookup on org.aspectj.weaver.loadtime.ClassLoaderWeavingAdaptor 
(file:/opt/pulsar/lib/org.aspectj-aspectjweaver-1.9.2.jar) to class 
java.lang.ClassLoader
May 07 12:28:46 ip-10-242-177-121 pulsar[4411]: WARNING: Please consider 
reporting this to the maintainers of 
org.aspectj.weaver.loadtime.ClassLoaderWeavingAdaptor
May 07 12:28:46 ip-10-242-177-121 pulsar[4411]: WARNING: Use 
--illegal-access=warn to enable warnings of further illegal reflective access 
operations
May 07 12:28:46 ip-10-242-177-121 pulsar[4411]: WARNING: All illegal access 
operations will be denied in a future release```
----
2020-05-07 14:32:04 UTC - rani: *Pulsar Component Version*: `2.5.1`
*Architecture*: Function workers running on k8s separately from remaining 
components that are managed on EC2 via ASGs. Function worker uses 
`pulsar-proxy` for communication with broker.

*Issue:* the function worker works as expected however it occasionally throws 
this error
```11:30:42.196 [cluster-service-coordinator-timer] ERROR 
org.apache.pulsar.functions.worker.MembershipManager - Failed to get status of 
coordinate topic <persistent://pulsar/functions/coordinate>
org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: 
HTTP 502 Bad Gateway
        at 
org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:204)
 ~[org.apache.pulsar-pulsar-client-admin-original-2.5.1.jar:2.5.1]
        at 
org.apache.pulsar.client.admin.internal.TopicsImpl$9.failed(TopicsImpl.java:469)
 ~[org.apache.pulsar-pulsar-client-admin-original-2.5.1.jar:2.5.1]
        at 
org.glassfish.jersey.client.JerseyInvocation$4.failed(JerseyInvocation.java:1030)
 ~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]
        at 
org.glassfish.jersey.client.JerseyInvocation$4.completed(JerseyInvocation.java:1017)
 ~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]
        at 
org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:227)
 ~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]
        at 
org.glassfish.jersey.client.ClientRuntime.access$200(ClientRuntime.java:85) 
~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]
        at 
org.glassfish.jersey.client.ClientRuntime$2.lambda$response$0(ClientRuntime.java:178)
 ~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]```
Any input would be appreciated
----
2020-05-07 15:00:44 UTC - David Kjerrumgaard: Thanks for the detailed 
explanation of the issue and filing the issue.
----
2020-05-07 15:05:42 UTC - David Kjerrumgaard: FWIW, I usually have my SerDe 
classes defined in a different project/JAR file and never had this issue.
----
2020-05-07 15:41:20 UTC - Sijie Guo: New case study blog post from Zhaopin 
about why and how they use Pulsar SQL for search log analysis. If you are 
interested in using Pulsar (Presto) SQL, please check it out. 
<https://streamnative.io/blog/tech/2020-05-07-zhaopin-tech-blog/>
+1 : Konstantinos Papalias, Penghui Li, Simba Khadder, Luke Lu, David 
Kjerrumgaard
----
2020-05-07 15:56:20 UTC - Addison Higham: are you referring to autoscaling? 
bookkeepers don't lend themselves well to autoscaling, but brokers should be 
reasonable to do either just based on CPU or you could use a custom metric.

Brokers don't have any non-replaceable state  so you can just add or remove any 
as you need (but they do have caches, so you don't want to go crazy with it). 
You can read more about load balancing basics here: 
<https://pulsar.apache.org/docs/en/administration-load-balance/>. What that 
means on AWS though is that assuming a pretty even load across a number of 
topics, ideally no one broker is better suited than the others, so the choice 
*should* be arbitrary of which broker you terminate. Assuming you let a broker 
properly terminate, it will signal that all of its bundles (the unit of load 
balancing) should be moved to other brokers. Even in the event of a 
non-graceful termination, the other brokers will see that there is a bundle 
that isn't scheduled and will pick it up, it may just take a bit longer
+1 : Frank Kelly
----
2020-05-07 17:14:09 UTC - Damien Roualen: I got the error from the broker log:
```14:52:39.152 [AsyncHttpClient-44-4] ERROR 
org.apache.pulsar.broker.admin.v2.NonPersistentTopics - [pulsar-role-admin] 
Failed to get list of topics under namespace public/default
java.util.concurrent.ExecutionException: 
org.apache.pulsar.client.admin.PulsarAdminException$NotAuthorizedException: 
Don't have permission to administrate resources on this tenant
...
Caused by: javax.ws.rs.NotAuthorizedException: HTTP 401 
Unauthorized
        at 
org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1080)
 ~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]
        at 
org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:99)
 ~[org.glassfish.jersey.core-jersey-client-2.27.jar:?]```
I can also reproduce it with admin client.
`sudo ./bin/pulsar-admin topics list "pulsar/default"`
or `sudo ./bin/pulsar-admin topics list "public/default"`
HTTP 500 Internal Server Error
I will continue to investigate and try to better understand.
----
2020-05-07 17:16:25 UTC - Alexandre DUVAL: did you configure your client.conf?
----
2020-05-07 17:16:30 UTC - Alexandre DUVAL: used by pulsar-admin
----
2020-05-07 17:16:49 UTC - Damien Roualen: Yes, with JWT and pulsar-admin.
----
2020-05-07 17:17:34 UTC - Damien Roualen: I was reading here: 
<https://stackoverflow.com/questions/60385447/apache-pulsar-authorization-failed-on-topic-dont-have-permission-to-adminis>

“This error is occurring because you’re not using the correct token to connect, 
or the role associated with your token lacks sufficient permission. You will 
need to ensure that you are using the correct token (and that it has the right 
permissions) to enable you to connect.”

I am going to check that.
----
2020-05-07 17:17:37 UTC - Alexandre DUVAL: and the jwt is admin of the cluster 
which is hosting public/default?
----
2020-05-07 17:17:53 UTC - Alexandre DUVAL: (the cluster as pulsar resource)
----
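One way to check which role a JWT actually carries is to decode its payload locally. This is a minimal sketch that does not verify the signature; it assumes the default setup where the role is the token's `sub` claim, and `jwt_claims` is a hypothetical helper name. The role it prints is the one that must be a tenant admin role or a broker superuser for admin calls on that tenant to succeed.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    import sys

    # Usage: python jwt_claims.py <token>
    claims = jwt_claims(sys.argv[1])
    print("role (sub claim):", claims.get("sub"))
```

The printed role can then be compared with the admin roles shown by `pulsar-admin tenants get public` and with `superUserRoles` in the broker configuration.
----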
2020-05-07 17:20:00 UTC - Damien Roualen: I am going to verify.
----
2020-05-07 17:35:54 UTC - Damien Roualen: jfyi It could take a bit of time, I 
am reading the documentation and learning how the project has been 
configured.
----
2020-05-07 17:52:03 UTC - Andrei Canta: @Andrei Canta has joined the channel
----
2020-05-07 18:07:36 UTC - Alex Yaroslavsky: Hi, I have several brokers and 
several separate function workers running in k8s, all set up to use TLS and 
certificates. I'm trying to make pulsar-admin work seamlessly with a single 
admin-url. I have seen a suggestion here to put a proxy in front, which should 
make this possible. So I have set up a proxy, given it the admin role client 
certificate, and pointed admin-url to this proxy. But all function-related 
commands still return:
Function worker service is not done initializing. Please try again in a little 
while.

Reason: HTTP 503 Service Unavailable
----
2020-05-07 18:11:03 UTC - Chris Bartholomew: Did you configure 
`functionWorkerWebServiceURL` and/or `functionWorkerWebServiceURLTLS` in the 
`proxy.conf` so it can find the standalone function workers?
----
2020-05-07 18:13:58 UTC - Alex Yaroslavsky: ye
----
2020-05-07 18:14:04 UTC - Alex Yaroslavsky: yes
root@pulsar-hidden-proxy-6dc5954d9b-bwdvt:/pulsar# cat conf/proxy.conf | grep 
http
#   <http://www.apache.org/licenses/LICENSE-2.0>
brokerWebServiceURL=<http://pulsar-broker:8080>
brokerWebServiceURLTLS=<https://pulsar-broker:8443>
functionWorkerWebServiceURL=<http://pulsar-function:6750>
functionWorkerWebServiceURLTLS=<https://pulsar-function:6751>
----
2020-05-07 18:14:43 UTC - Alex Yaroslavsky: and my admin-url is 
<https://pulsar-proxy:8443>
----
2020-05-07 18:19:30 UTC - Sijie Guo: @Alex Yaroslavsky which version of Pulsar 
are you running? Is it 2.5.1?
----
2020-05-07 18:20:07 UTC - Alex Yaroslavsky: it's 2.5.0, I plan to upgrade to 2.5.1 
in a week
----
2020-05-07 18:23:10 UTC - Sijie Guo: prior to 2.5.1, the proxy only works to 
proxy v1/v2 functions routes to the function worker. @Addison Higham fixed the 
issue and it is available in 2.5.1. So you might need to upgrade to 2.5.1 to 
make this approach work - <https://github.com/apache/pulsar/pull/6486>
----
2020-05-07 18:24:33 UTC - Alex Yaroslavsky: Cool, thanks! I'll try it out once 
I upgrade.
----
2020-05-07 18:34:01 UTC - Anshul Dhyani: Guys, I am trying out Pulsar locally, 
especially Pulsar Functions, but sadly I can't run it locally as it looks like 
there is no pulsar-admin for Windows.
If it exists, it would be really helpful if someone could point me to a place to 
download pulsar-admin for Windows.
----
2020-05-07 18:42:03 UTC - Alex Yaroslavsky: You can write java code that uses 
PulsarAdmin, or run pulsar-admin in a docker/WSL.
----
2020-05-07 21:21:39 UTC - David Kjerrumgaard: Can somebody tell me how to get 
state storage working in standalone mode? I have tried adding 
`extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent`
 to the standalone.conf file (no luck) and starting pulsar with 
`/pulsar/bin/pulsar standalone --stream-storage-port 4181` neither of which 
works.  Any pointers on what I am doing wrong?
----
2020-05-07 21:24:06 UTC - Zoltan Lajos Kis: @Zoltan Lajos Kis has joined the 
channel
----
2020-05-08 00:19:45 UTC - Liam Clarke: Brilliant thanks :slightly_smiling_face:
----
2020-05-08 03:10:00 UTC - Jared: @Jared has joined the channel
----
2020-05-08 03:30:57 UTC - Jared: Hello, I'm getting started with Pulsar + 
Debezium (with Postgres) and have successfully followed the example shown here: 
<http://pulsar.apache.org/docs/en/io-cdc-debezium/> - but I have a problem, 
when it comes to testing messages using 'docker exec -it pulsar-postgresql 
/bin/bash' I get various control characters in the log - I've pasted a 
screenshot.  Does anyone know why those appear and how to properly work 
with/around them?
----
2020-05-08 03:35:34 UTC - hahaxiaowen: @hahaxiaowen has joined the channel
----
2020-05-08 03:43:20 UTC - baiyuqing: @baiyuqing has joined the channel
----
2020-05-08 04:17:08 UTC - Subba Gaddamadugu: @Subba Gaddamadugu has joined the 
channel
----
2020-05-08 06:48:26 UTC - dionjansen: It seems the bookie in question is now 
completely full:

```06:43:46.006 [LedgerDirsMonitorThread] WARN  
org.apache.bookkeeper.bookie.LedgerDirsMonitor - LedgerDirsMonitor check 
process: All ledger directories are non writable
06:43:46.007 [LedgerDirsMonitorThread] ERROR 
org.apache.bookkeeper.util.DiskChecker - Space left on device 
data/bookkeeper/ledgers/current : 0, Used space fraction: 1.0 > threshold 
0.95.```
I see that the other bookies have considerably reduced their data on disk (I 
presume because the `offloadLedgerDeletionLagMs` delay elapsed), but this one 
did not perform the deletions. Do I need manual intervention to resolve this? 
Any help would be greatly appreciated!
----
2020-05-08 07:28:01 UTC - evir35: @evir35 has joined the channel
----
