2020-02-26 09:21:27 UTC - Sijie Guo: Unfortunately no for standalone. What is the purpose of setting a separate directory for standalone?
----
2020-02-26 09:23:51 UTC - Antti Kaikkonen: I would just like to see which files are the journal, how much space they take, etc... maybe test performance when setting the ledgers to use HDD and the journal to use SSD. But I guess I have to set up a single-node cluster to do that.
----
2020-02-26 09:51:32 UTC - Sijie Guo: you can still see different directories under data/bookkeeper
----
2020-02-26 09:51:50 UTC - Sijie Guo: and you can mount different disks to different directories.
----
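A minimal sketch of that "mount different disks to different directories" idea for standalone, assuming the bookie keeps its journal and ledgers in subdirectories under data/bookkeeper (the subdirectory names and device paths below are assumptions; check what your standalone run actually creates before mounting anything):
```
# Hypothetical layout: journal on a fast SSD, ledgers on a larger HDD.
mkdir -p data/bookkeeper/journal data/bookkeeper/ledgers
mount /dev/nvme0n1p1 data/bookkeeper/journal   # SSD for the journal (assumed device)
mount /dev/sda1      data/bookkeeper/ledgers   # HDD for the ledgers (assumed device)
```
----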
2020-02-26 10:25:35 UTC - ikeda: @ikeda has joined the channel
----
2020-02-26 11:00:24 UTC - Balasubramanian Viswanathan: @Balasubramanian Viswanathan has joined the channel
----
2020-02-26 11:12:53 UTC - Rajesh: @Rajesh has joined the channel
----
2020-02-26 11:13:18 UTC - Prashant Shandilya: @Prashant Shandilya has joined the channel
----
2020-02-26 11:21:50 UTC - Rajesh: How can I check that the proxy service is running? My LB shows it as out of service.
----
2020-02-26 11:49:22 UTC - Santosh M K: @Santosh M K has joined the channel
----
2020-02-26 12:36:09 UTC - Manuel Mueller: > whatever hipster term we use now
I laughed hard.
----
2020-02-26 12:37:58 UTC - Zach C: @Zach C has joined the channel
----
2020-02-26 12:43:57 UTC - Zach C: Hi @Sijie Guo - We're not actually seeing any errors when sending, just that Pulsar is handling the Avro schema completely differently from what we expected. The issue we're running into is that one of our Avro objects is a union of other Avro objects, and is defined as follows:
```
{
  "name": "batch",
  "type": [
    "null",
    {
      "type": "array",
      "items": [
        "com.company.avro.SensorRec",
        "com.company.avro.DeviceRec"
      ]
    }
  ],
  "default": null,
  "doc": ""
}
```
However, when Pulsar generates the schema for this, it simplifies the object down to:
```
{
  "name": "batch",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Object",
        "namespace": "java.lang",
        "fields": []
      },
      "java-class": "java.util.List"
    }
  ],
  "default": null
}
```
----
2020-02-26 12:54:01 UTC - Zach C: Attached are 3 sample files:
1. avro_schema - the simplified Avro schemas
2. pulsar_simplified_object - the Pulsar-generated schema
3. kafka_simplified_object - the Confluent-generated schema (this is what we expected Pulsar would create)
----
2020-02-26 13:17:56 UTC - eilonk: @eilonk has joined the channel
----
2020-02-26 13:28:37 UTC - eilonk: Hi all, I deployed a Pulsar cluster on Kubernetes using Helm and the native chart with slight modifications. When I port-forward to the manager, it shows an empty page and asks me to add an environment. Any reason why it doesn't connect to the cluster properly? What requirements might I be missing?
----
2020-02-26 13:55:11 UTC - Rolf Arne Corneliussen: Thanks. Is there any practical limit on the number of topics to have in a single namespace? (From what I have read, you may have to adjust ZooKeeper `jute.maxbuffer` to allow `getChildren` to complete.)
----
2020-02-26 13:57:57 UTC - Rolf Arne Corneliussen: Also, I have noticed that when creating over 50k topics, the Pulsar broker seems to spend more time on (continuous) load balancing.
----
2020-02-26 14:35:16 UTC - Rolf Arne Corneliussen: *Topic Compaction*. I am trying to figure out how this works in Pulsar. I created a namespace and set a compaction threshold:
```
bin/pulsar-admin namespaces set-compaction-threshold -t 50M tenant/namespace
```
and the namespace policies report `"compaction_threshold" : 52428800`. Then I ran a producer 11 times; in each iteration the `Producer` sends X messages with keys from a fixed set (similar to the stock ticker example): `producer.newMessage().key(keyIterator.next()).value(...)`. After one iteration, `pulsar-admin topics stats` reports 49MB `storageSize`, and after 11 iterations it reports
```
"storageSize" : 494_039_536,
"backlogSize" : 494_039_536,
```
So compaction should be at work here? If I try to trigger compaction manually:
```
bin/pulsar-admin topics compact persistent://tenant/namespace/topic
Compaction already in progress
```
I created a Reader:
```
client.newReader()
      .topic("persistent://tenant/namespace/topic")
      .readCompacted(true)
      .startMessageId(MessageId.earliest)
      .startMessageIdInclusive()
      .readerListener(scanner)
      ...
```
But as far as I can tell, it reads all X * 11 messages (i.e. the whole history). I would have hoped to read a compacted version of the topic; any suggestions on how to achieve that?
----
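One thing worth checking here (a suggestion, not a confirmed diagnosis): a `readCompacted(true)` reader only sees the compacted view once a compaction run has actually completed; while a run is still in progress it reads the full history. You can inspect the run that reported "already in progress", or trigger one and block until it finishes:
```
# Check the status of the most recent compaction on the topic:
bin/pulsar-admin topics compaction-status persistent://tenant/namespace/topic

# Or trigger a run and wait for it to complete (-w waits for completion):
bin/pulsar-admin topics compact persistent://tenant/namespace/topic
bin/pulsar-admin topics compaction-status -w persistent://tenant/namespace/topic
```
----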
2020-02-26 15:08:12 UTC - Ian: @Ian has joined the channel
----
2020-02-26 16:20:38 UTC - Santiago Del Campo: Hello! We're having the following issue: suddenly our BookKeeper is responding with these exceptions (first time we've seen these errors, actually). Some producers throw a timeout, but the rest of them work just fine. Any idea of the cause and how to fix it?
```
16:12:27.127 [bookkeeper-io-16-2] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x51570dc1]/172.31.49.0:3181, current state CONNECTING :
io.netty.channel.ConnectTimeoutException: connection timed out: /172.31.49.0:3181
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:150) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
16:12:27.128 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker failed to replicate Ledger : 3 for 8 number of times, so deferring the ledger lock release by 300000 msecs
16:13:06.987 [bookkeeper-io-16-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x58baa6f7]/172.31.49.0:3181, current state CONNECTING : io.netty.channel.ConnectTimeoutException: connection timed out: /172.31.49.0:3181
```
----
2020-02-26 17:48:41 UTC - Greg Gallagher: looks like your bookie @ 172.31.49 is not reachable from wherever that log is being generated... can you check the logs on it?
----
2020-02-26 17:51:31 UTC - Alexander Ursu: Hi, I recently made a setup for a Pulsar cluster in Docker Swarm, which can be found in my comment on the GitHub issue here: https://github.com/apache/pulsar/issues/6264#issuecomment-591050063. Would like to hear any thoughts and opinions from people here!
----
2020-02-26 17:57:53 UTC - Santiago Del Campo: Mmm... but that would be weird... that IP is the IP of the bookie server, and it's completely reachable. In fact, the cluster itself did not fail; it's still handling produces and consumes from the clients.
----
2020-02-26 17:59:43 UTC - Santiago Del Campo: Even weirder: the producing clients that were hitting timeouts before were rebooted and the timeouts stopped, but the exceptions about ledgers that cannot be replicated persist.
----
2020-02-26 18:14:14 UTC - Santiago Del Campo: The cluster is on top of K8s and all the Pulsar components live on one and the same server... could it be related to overhead?
----
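A few quick, non-invasive checks for this kind of symptom (a sketch; the IP and port are taken from the log above, and the commands are assumed to be run from the BookKeeper installation directory on the host that logs the timeouts):
```
# Can this host reach the bookie port at all?
nc -vz 172.31.49.0 3181

# Which bookies does the cluster currently consider writable / read-only?
bin/bookkeeper shell listbookies -rw
bin/bookkeeper shell listbookies -ro

# Which ledgers is the ReplicationWorker still trying to re-replicate?
bin/bookkeeper shell listunderreplicated
```
----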
2020-02-26 20:23:17 UTC - Eugen: I'm seeing this warning when starting Pulsar standalone:
```
05:20:24.651 [main] WARN org.apache.distributedlog.impl.BKNamespaceDriver - Could not use Netty Epoll event loop for bookie server:
java.lang.NoClassDefFoundError: Could not initialize class io.netty.channel.epoll.EpollEventLoop
    at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:142) ~[io.netty-netty-transport-native-epoll-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:35) ~[io.netty-netty-transport-native-epoll-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84) ~[io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58) ~[io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47) ~[io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59) ~[io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:104) ~[io.netty-netty-transport-native-epoll-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:91) ~[io.netty-netty-transport-native-epoll-4.1.43.Final.jar:4.1.43.Final]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:68) ~[io.netty-netty-transport-native-epoll-4.1.43.Final.jar:4.1.43.Final]
    at org.apache.distributedlog.impl.BKNamespaceDriver.getDefaultEventLoopGroup(BKNamespaceDriver.java:257) [org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
    at org.apache.distributedlog.impl.BKNamespaceDriver.initializeBookKeeperClients(BKNamespaceDriver.java:268) [org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
    at org.apache.distributedlog.impl.BKNamespaceDriver.initialize(BKNamespaceDriver.java:206) [org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
    at org.apache.distributedlog.api.namespace.NamespaceBuilder.build(NamespaceBuilder.java:239) [org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
    at org.apache.pulsar.functions.worker.WorkerService.start(WorkerService.java:105) [org.apache.pulsar-pulsar-functions-worker-2.5.0.jar:2.5.0]
    at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1108) [org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0]
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:505) [org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0]
    at org.apache.pulsar.PulsarStandalone.start(PulsarStandalone.java:318) [org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0]
    at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:119) [org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0]
```
What does this mean? Will Pulsar run slower for me because of this?
----
2020-02-26 20:50:59 UTC - Ravi Shah: How can I pass the Pulsar backlog message metric to an HPA to scale Kubernetes pods?
----
2020-02-26 20:51:12 UTC - Ravi Shah: Is there any Pulsar Prometheus adapter for custom metrics which I can pass to the HPA?
----
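I'm not aware of a Pulsar-specific HPA adapter; a common pattern is to use the generic Prometheus adapter (k8s-prometheus-adapter) to expose a broker metric such as `pulsar_msg_backlog` through the Kubernetes external metrics API and scale on that. A minimal sketch, assuming the adapter is already installed and configured to serve that metric; the Deployment name `my-consumer` and the target value are hypothetical:
```
# Sketch only: prometheus-adapter must already expose pulsar_msg_backlog
# via the external metrics API; names and numbers here are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: consumer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: pulsar_msg_backlog
      target:
        type: AverageValue
        averageValue: "1000"
EOF
```
Scaling on backlog makes the most sense when all consumer pods share one shared (or key_shared) subscription, so that adding pods actually drains the backlog.
----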
2020-02-26 21:57:35 UTC - Greg Methvin: looks like this bug: https://github.com/apache/pulsar/issues/6330
+1 : Eugen
----
2020-02-26 23:51:31 UTC - Eugen: And I just found out that what the Pulsar documentation describes (`journalDirectory`) is the legacy name, whereas `journalDirectories` is the new name. Anyway, both should work - but not with standalone - so it's unrelated to your problem anyway...
+1 : Antti Kaikkonen
----
2020-02-27 00:56:10 UTC - Eugen: The openmessaging benchmark (https://github.com/openmessaging/openmessaging-benchmark/blob/master/driver-pulsar/deploy/templates/bookkeeper.conf#L26) uses multiple journal directories on the same SSD:
```
# Use multiple journals to better exploit SSD throughput
journalDirectories=/mnt/journal/1,/mnt/journal/2,/mnt/journal/3,/mnt/journal/4
```
I wonder if there is any rule of thumb for SSDs in terms of how much parallelism to use for optimal performance?
----
2020-02-27 03:48:45 UTC - sindhushree: @Addison Higham
----
2020-02-27 03:49:37 UTC - sindhushree: Can you please share how you got the root context? I am facing the same issue.
----
2020-02-27 04:07:39 UTC - Sijie Guo: You can check the `/metrics` HTTP path to verify that your proxy service is running.
----
2020-02-27 04:08:08 UTC - Sijie Guo: how do you add the cluster to pulsar manager?
----
2020-02-27 04:09:53 UTC - Sijie Guo: Awesome. Do you want to contribute this setup back to the community as documentation?
----
2020-02-27 04:11:56 UTC - Sijie Guo: SSD performance varies between vendors. A good approach is to benchmark the throughput of a single directory on the SSD, then calculate the number of directories from the bandwidth capacity of your SSD.
+1 : Eugen
----
2020-02-27 04:14:22 UTC - Sijie Guo: Another way to think about it: one journal thread is used per journal directory, so it is good to align the number of directories with your CPU cores.
----
2020-02-27 04:14:34 UTC - Sijie Guo: That's a simple rule of thumb.
----
2020-02-27 04:16:41 UTC - Eugen: something like this: `Math.min(cpu cores, bandwidth spec / single-dir bandwidth)` ?
----
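To make that rule of thumb concrete, a worked example with made-up numbers (the throughput figures are assumptions; substitute your own benchmark results):
```
# Hypothetical inputs: 8 CPU cores; SSD rated at ~2000 MB/s sequential
# writes; a single journal directory benchmarked at ~500 MB/s.
CORES=8
SSD_MBPS=2000
SINGLE_DIR_MBPS=500
DIRS=$(( SSD_MBPS / SINGLE_DIR_MBPS ))             # 2000 / 500 = 4
if [ "$DIRS" -gt "$CORES" ]; then DIRS=$CORES; fi  # min(cores, ratio)
echo "journal directories to configure: $DIRS"     # -> 4
# e.g. journalDirectories=/mnt/journal/1,/mnt/journal/2,/mnt/journal/3,/mnt/journal/4
```
----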