2019-11-28 12:36:33 UTC - Nicolas Ha: Also it seems both pulsar dashboard and 
pulsar-manager query the broker API periodically but then store the data in a 
DB of their own. What could be the reason? Is the REST API not meant to handle 
load, or is it something else?
----
2019-11-28 12:43:25 UTC - tuteng: The DB is only used for aggregation, paging 
and filtering in the front-end display.
----
2019-11-28 12:45:56 UTC - tuteng: If you have a load balancer, such as Nginx, 
set it to the load balancer's IP address.
If you started it locally using the `npm run dev` command, you can set it to 
`127.0.0.1` or your local IP address.
----
2019-11-28 12:46:51 UTC - Nicolas Ha: I see, that makes sense thank you
----
2019-11-28 12:47:16 UTC - tuteng: If you use Docker to start it locally, you 
can set it to your local address.
----
2019-11-28 12:48:07 UTC - Nicolas Ha: My local address would be `127.0.0.1` 
right? That did not seem to work for me:
```  pulsar_manager:
    container_name: turtlequeue_pulsar_manager
    image: apachepulsar/pulsar-manager:v0.1.0
    depends_on:
      - pulsarstandalone
    ports:
      - "9527:9527"
    environment:
      - REDIRECT_HOST=127.0.0.1 # what should that be ?
      - REDIRECT_PORT=9527
      - DRIVER_CLASS_NAME=org.postgresql.Driver
      - URL='jdbc:postgresql://127.0.0.1:5432/pulsar_manager'
      - USERNAME=pulsar
      - PASSWORD=pulsar
      - LOG_LEVEL=DEBUG```

----
2019-11-28 12:54:00 UTC - tuteng: You seem to be missing a few environment 
variables.
----
2019-11-28 12:54:47 UTC - tuteng: `docker run -it -p 9527:9527 -e 
REDIRECT_HOST=http://192.168.0.104 -e REDIRECT_PORT=9527 -e 
DRIVER_CLASS_NAME=org.postgresql.Driver -e 
URL='jdbc:postgresql://127.0.0.1:5432/pulsar_manager' -e USERNAME=pulsar -e 
PASSWORD=pulsar -e LOG_LEVEL=DEBUG -v $PWD:/data 
apachepulsar/pulsar-manager:v0.1.0 /bin/sh`
----
2019-11-28 12:55:40 UTC - tuteng: You need to map a host directory into the 
container to persist the database data. `-v $PWD:/data`
----
2019-11-28 13:00:48 UTC - tuteng: You may need to add a volume configuration to 
store data in your compose.yml file. 
<https://docs.docker.com/compose/compose-file/compose-file-v2/>
----
2019-11-28 13:02:42 UTC - tuteng: ```volumes:
  # Just specify a path and let the Engine create a volume
  - /var/lib/mysql

  # Specify an absolute path mapping
  - /opt/data:/var/lib/mysql

  # Path on the host, relative to the Compose file
  - ./cache:/tmp/cache

  # User-relative path
  - ~/configs:/etc/configs/:ro

  # Named volume
  - datavolume:/var/lib/mysql```
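Applied to the `pulsar_manager` service from the compose file above, a 
named-volume mapping might look like this (a sketch only; the volume name is 
illustrative, and it assumes pulsar-manager keeps its data under `/data` as in 
the `docker run` example):
```services:
  pulsar_manager:
    # ... existing configuration as above ...
    volumes:
      - pulsar_manager_data:/data

volumes:
  pulsar_manager_data:```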
----
2019-11-28 14:14:44 UTC - Alexandre DUVAL: Hi, the storage size in a topic's 
stats only contains the total size of potential backlogs, and the size 
currently used due to retention settings is only shown in the topic's 
stats-internal, right?
----
2019-11-28 14:19:03 UTC - tuteng: Are you talking about this problem 
<https://github.com/apache/pulsar/pull/5108>? This has been fixed in version 
2.4.2.
----
2019-11-28 14:23:17 UTC - Alexandre DUVAL: Completely :wink:, thanks.
----
2019-11-28 14:54:06 UTC - Brian Doran: Hi all .. we're currently benchmarking 
our Pulsar standalone in comparison with a single Kafka broker; we're hitting 
some throughput issues and not seeing anything like the same throughput with 
Pulsar, and we were just looking for some pointers on how to tune for 
throughput?

Typically our Kafka numbers are high, about 272 million records per 5-minute 
period, but we look to be constrained in some way with Pulsar to anywhere from 
about 60-90 million records per 5-minute period (300K per sec).

We've made some changes around backup queue size, memory assigned to the 
instance etc. but no huge movement.

Any pointers would be greatly appreciated
----
2019-11-28 14:59:33 UTC - Christophe Bornet: @Brian Doran are your Pulsar and 
Kafka configured the same way regarding flushing to disk ?
----
2019-11-28 15:05:43 UTC - Brian Doran: @Christophe Bornet We tried but didn't 
get any uplift, maybe it was done incorrectly:
Modified zookeeper.conf
Modified bookkeeper.conf (standalone.conf already false)
bookkeeper.conf: journalSyncData=true to bookkeeper.conf: journalSyncData=false
----
2019-11-28 15:07:12 UTC - Brian Doran: We've also set
producer.max.pending.messages.across.partitions=1000000
producer.max.pending.messages=5000000
Also increased
backlogQuotaDefaultLimitGB=20
2019-11-28 15:09:27 UTC - Christophe Bornet: Are you limited by production or 
consumption ?
----
2019-11-28 15:11:01 UTC - Brian Doran: Production .. we don't have any 
consumers connected yet. Purely ingest tests at the moment
----
2019-11-28 15:17:41 UTC - Brian Doran: Are there any Pulsar client producer JMX 
metrics available like there are for Kafka?
----
2019-11-28 15:52:49 UTC - Paul Hodson: @Paul Hodson has joined the channel
----
2019-11-28 19:44:22 UTC - Fred Roy: @Fred Roy has joined the channel
----
2019-11-28 20:55:42 UTC - Sijie Guo: You can use Producer#getStats to get the 
producer side metrics.
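As a minimal sketch of that (assumes a broker at `localhost:6650` and a running 
cluster; the topic name is illustrative, and `getStats` returns data only when 
the client's stats interval is non-zero, which is the default):
```import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.ProducerStats;
import org.apache.pulsar.client.api.PulsarClient;

public class ProducerStatsSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .statsInterval(60, TimeUnit.SECONDS) // stats snapshots refresh at this interval
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/ingest-test")
                .create();

        for (int i = 0; i < 1000; i++) {
            producer.sendAsync(("msg-" + i).getBytes());
        }
        producer.flush();

        ProducerStats stats = producer.getStats(); // producer-side metrics snapshot
        System.out.println("msgs sent:        " + stats.getNumMsgsSent());
        System.out.println("send rate msg/s:  " + stats.getSendMsgsRate());
        System.out.println("p95 latency (ms): " + stats.getSendLatencyMillis95pct());

        producer.close();
        client.close();
    }
}```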
----
2019-11-28 20:56:03 UTC - Sijie Guo: Can you explain how did you setup the 
benchmarks?
----
2019-11-28 23:06:54 UTC - Christophe Bornet: > I need to know the current 
consumer lag for a message published some time ago. The idea being estimating 
how much time there is before a given message for which I know the publish time 
will be consumed. Anyone knows how I could achieve this ?
Anyone has an idea ? I'm kinda blocked... In Kafka I can get the consumer 
offsets and the offset of a producer for a given timestamp. I haven't found a 
way to do the same with Pulsar.
----
2019-11-29 00:01:40 UTC - Sijie Guo: currently Pulsar doesn’t expose this 
metric directly. but it should be pretty simple to expose, because we can find 
the *publish time* of the last message and the *publish time* of the last 
consumed message.

Can you file a github issue for it?
----
2019-11-29 05:55:40 UTC - Christophe Bornet: Sure :+1:
----
2019-11-29 08:27:15 UTC - Christophe Bornet: Is there something that represents 
a global offset for producer and consumer in the current API? I've seen some 
`numberOfEntries` but I don't know if it can be considered a globally 
monotonically increasing counter or if it decreases when messages are removed 
after the retention period.
----
2019-11-29 08:33:46 UTC - Sijie Guo: numberOfEntries can decrease.

Technically `MessageId` is the *increasing* identifier, but a MessageId is 
comprised of multiple parts. You can use two MessageIds to compute the distance 
(i.e. the number of messages or number of bytes) between two messages.

There was a discussion before about introducing a monotonically increasing 
index for the messages in a partition, so you could use that to compute the 
number of messages between two messages, although we didn’t get time to 
implement that.

If the publish time lag doesn’t work for you, we can pick that task up again.
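A sketch of the ordering property (not the full distance computation): 
`MessageId` implements `Comparable`, so two ids from the same partition can at 
least be ordered. This uses `MessageIdImpl` directly, an internal class shown 
only for illustration, with made-up ledger/entry values:
```import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.impl.MessageIdImpl;

public class MessageIdOrdering {
    public static void main(String[] args) {
        // (ledgerId, entryId, partitionIndex) -- values are illustrative
        MessageId earlier = new MessageIdImpl(10, 0, -1);
        MessageId later   = new MessageIdImpl(10, 5, -1);

        // Comparable ordering: earlier < later within the same partition
        System.out.println(earlier.compareTo(later) < 0); // true

        // A "distance" computation would work from these parts; entries in
        // the same ledger differ only by entryId, but distances across
        // ledger boundaries need extra metadata from the broker.
    }
}```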
----