2018-07-03 14:07:23 UTC - Idan: hi guys ---- 2018-07-03 14:07:26 UTC - Idan: I am getting this: ---- 2018-07-03 14:07:27 UTC - Idan: Error init producer for topic MyQueue error java.lang.RuntimeException: Error creating producer Namespace missing local cluster name in clusters list: local_cluster=pulasr-pulsar-cluster ns=public/default clusters=[pulsar-pulsar] ---- 2018-07-03 14:07:44 UTC - Idan: I guess I am using a different namespace. How can I modify the Java client to use a different namespace? ---- 2018-07-03 14:21:44 UTC - Idan: where do I set in my Java client producer’s code ---- 2018-07-03 14:21:44 UTC - Idan: this: ---- 2018-07-03 14:21:46 UTC - Idan: <persistent://pulsar-pulsar-cluster/public/default> ---- 2018-07-03 14:21:49 UTC - Idan: not sure I get this via the API ---- 2018-07-03 14:53:27 UTC - Idan: ok, found out how. I had to do it in the topic input ---- 2018-07-03 15:40:43 UTC - Idan: Hi guys, we created a very basic Pulsar cluster. Just as a sanity check we sent and consumed a few bets (around 10 bets) and we see quite a lot of latency in the statistics. Any idea? ---- 2018-07-03 15:40:49 UTC - Idan: 2018-07-03 15:37:52,162 INFO org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - [<persistent://pulsar-pulsar-cluster/public/default/Queue>] [pulasr-pulsar-cluster-0-3] Pending messages: 0 --- Publish throughput: 0.33 msg/s --- 0.00 Mbit/s --- Latency: med: 7.342 ms - 95pct: 31.354 ms - 99pct: 31.354 ms - 99.9pct: 31.354 ms - max: 31.354 ms --- Ack received rate: 0.33 ack/s --- Failed messages: 0 ---- 2018-07-03 15:40:55 UTC - Idan: 31 ms ---- 2018-07-03 15:44:49 UTC - Rasty Turek: @Rasty Turek has joined the channel ---- 2018-07-03 15:57:17 UTC - Sijie Guo: @Idan it depends on your disk characteristics. What type of disks do you have? ---- 2018-07-03 16:01:38 UTC - Idan: I'll get that from my infra guy ---- 2018-07-03 16:01:48 UTC - Idan: what exactly do you need to know regarding disk characteristics? ---- 2018-07-03 16:01:53 UTC - Idan: e.g. SSD?
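A note on gathering the disk numbers asked about here: for a system that forces each write to disk before acknowledging it, the figure that usually matters is how long a forced flush (fsync) takes rather than raw throughput. The sketch below is one rough way to sample that from Java using only the standard library. The class name `FsyncProbe`, its method names, and the sample and record sizes are illustrative assumptions, not part of Pulsar or BookKeeper, and the numbers it yields only approximate what a bookie journal would see.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical probe: times small write+fsync pairs on a given directory. */
final class FsyncProbe {

    /**
     * Writes `samples` records of `recordBytes` each, forcing every record to
     * disk, and returns the per-record latencies in nanoseconds, ascending.
     */
    static long[] measure(Path dir, int samples, int recordBytes) throws IOException {
        Path file = Files.createTempFile(dir, "fsync-probe", ".bin");
        long[] nanos = new long[samples];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(recordBytes);
            for (int i = 0; i < samples; i++) {
                buf.clear(); // reuse the same zero-filled buffer for each record
                long start = System.nanoTime();
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // flush data and metadata to the device
                nanos[i] = System.nanoTime() - start;
            }
        } finally {
            Files.deleteIfExists(file);
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Nearest-rank percentile over an ascending, non-empty array; p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

Calling `FsyncProbe.measure(dir, 1000, 4096)` on a directory that lives on the volume under test, then `FsyncProbe.percentile(result, 99)`, gives a 99th-percentile figure in nanoseconds that can be compared across disk types.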
---- 2018-07-03 16:05:05 UTC - Sijie Guo: Pulsar does fsync by default. So your latency really depends on how fast your disk can do fsync. If you have those metrics, it would be good. If you don’t, knowing what type of disks helps as well. ---- 2018-07-03 16:05:58 UTC - Daniel Ferreira Jorge: Hi, doesn't `subscriptionInitialPosition` (pull #1397) work with `topicsPattern`? (java) ---- 2018-07-03 16:09:11 UTC - Idan: @Sijie Guo I'll get this data and come back with results ---- 2018-07-03 16:14:44 UTC - Idan: @Sijie Guo that's gp2 ---- 2018-07-03 16:16:21 UTC - Idan: we are using AWS SSD ---- 2018-07-03 16:16:22 UTC - Idan: Volume Type: General Purpose SSD (gp2)*, Provisioned IOPS SSD (io1), Throughput Optimized HDD (st1), Cold HDD (sc1) ---- 2018-07-03 16:33:27 UTC - Matteo Merli: @Idan is that an EBS volume? ---- 2018-07-03 16:38:28 UTC - Matteo Merli: To have low latency (with fsync on) you should preferably be using a locally attached SSD (in AWS). That will give you an fsync() latency of 0.5 ms at 99pct. The only problem in AWS is that the SSDs are not very good at fsync workloads, so the latency will be a bit bumpy (when the SSD is performing its own GC cycle).
You have a couple of options to improve latency: * Write more copies of data (e.g. write 3 and wait for 3 acks). That will prune out the slowest storage node on each write request * Disable fsync in bookies (`journalSyncData=false` in `bookkeeper.conf`) ---- 2018-07-03 16:50:36 UTC - Idan: I can't write 3 and wait for 3 acks, as this will add to my overall latency per message ---- 2018-07-03 16:51:06 UTC - Matteo Merli: sorry, that was a typo :slightly_smiling_face: ---- 2018-07-03 16:51:06 UTC - Idan: How would you describe 'bumpy'? An average of 10 ms is fair enough ---- 2018-07-03 16:51:12 UTC - Matteo Merli: e.g. write 3 and wait for *2* acks ---- 2018-07-03 16:52:57 UTC - Matteo Merli: > How would you describe ‘bumpy’? An average of 10 ms is fair enough Avg is typically fine, especially at normal rate, though at a sustained rate of > 100MB/s per node the SSD fsync 99pct latency will occasionally spike up to ~100ms every ~1min and then go back to ~2ms ---- 2018-07-03 16:54:24 UTC - Idan: That's durable I think. Ok, I'll perform a serious load test and share statistics ---- 2018-07-03 16:54:34 UTC - Idan: Would you perhaps recommend another AWS disk type for us? ---- 2018-07-03 16:55:12 UTC - Matteo Merli: If you want to keep the fsync behavior, I recommend using VMs with locally attached SSDs ---- 2018-07-03 16:56:16 UTC - Idan: All our systems are on AWS ---- 2018-07-03 16:56:23 UTC - Idan: we won't be able to do that ---- 2018-07-03 16:57:10 UTC - Rasty Turek: You can, however, use a local SSD as a cache in front of your HDD ---- 2018-07-03 16:57:13 UTC - Matteo Merli: You have several VM types with local disks, e.g. `i3.*` ---- 2018-07-03 16:59:43 UTC - Matteo Merli: The i3.xxx types all have local SSDs.
There are other options with local HDDs: d2 / h1 / r3 ---- 2018-07-03 17:02:07 UTC - Idan: Available in AWS? ---- 2018-07-03 17:08:20 UTC - Matteo Merli: yes, these are all EC2 VM types : <https://aws.amazon.com/ec2/instance-types/> check “Storage optimized” ---- 2018-07-03 17:43:56 UTC - Daniel Ferreira Jorge: I'm having a pretty hard time with a specific use case here. We use couchbase and we are tailing the couchbase replication logs and sending them to pulsar (this tailing gives me at-least-once delivery). A couchbase cluster is always divided into 1024 partitions (vBuckets). Inside one partition, we have guaranteed ordering and an incrementing transaction number. To be able to put each of these transactions in pulsar exactly once, I enabled pulsar de-duplication, created one topic and one producer for each couchbase partition (each producer is publishing messages from one couchbase partition), and used the couchbase transaction number as the producer sequence id. It is working perfectly and I have 1024 topics that are mirroring each of the couchbase's 1024 partitions and the messages are being published exactly once. Now I need to consume these 1024 topics. Obviously I went with a `topicsPattern` *but* the problem is that I was not able to create a new subscription and start consuming from the beginning using `subscriptionInitialPosition`. I also tried a list of topics using `topics()`. The `subscriptionInitialPosition` only works if I create a consumer that subscribes to only one topic. Is there a way to achieve that? ---- 2018-07-03 17:44:11 UTC - Idan: @Matteo Merli thanks, I'll take a look and come back with responses ---- 2018-07-03 18:01:32 UTC - Matteo Merli: @Daniel Ferreira Jorge I think the `subscriptionInitialPosition` should ideally work even with topicsPattern, since it only allows specifying either Earliest or Latest and not a specific message id.
If that’s not the case, we should fix it ---- 2018-07-03 18:03:32 UTC - Matteo Merli: As a workaround, you could subscribe to all topics individually, with a single message listener. That would be almost the same behavior as the regex/multi-topics subscribe ---- 2018-07-03 18:05:02 UTC - Daniel Ferreira Jorge: @Matteo Merli I believe this is the case. I ran many tests here and `topicsPattern()` does not work, while `topic()` works as expected. ---- 2018-07-03 18:05:25 UTC - Matteo Merli: Update, we just saw the problem is that the config for `subscriptionInitialPosition` is not propagated correctly to the multi-topics config. @Sijie Guo is opening an issue ---- 2018-07-03 18:05:42 UTC - Matteo Merli: we should have a fix quickly ---- 2018-07-03 18:06:18 UTC - Daniel Ferreira Jorge: ahhh that is great! ---- 2018-07-03 18:07:17 UTC - Daniel Ferreira Jorge: for now, do you have a quick example on how to subscribe to all topics individually? ---- 2018-07-03 18:08:09 UTC - Matteo Merli: as you mentioned, you get the list of topics, either static or using the API, and create a new consumer for each of them ---- 2018-07-03 18:08:54 UTC - Matteo Merli: on the `ConsumerBuilder`, specify the `messageListener()` to receive callbacks for messages from any topic ---- 2018-07-03 18:09:49 UTC - Daniel Ferreira Jorge: My consumer initialization is this `pulsarClient.newConsumer().consumerName("vbucket-consumer").topicsPattern(pattern).subscriptionName("vbucket-to-objects5").subscriptionType(SubscriptionType.Shared).subscriptionInitialPosition(SubscriptionInitialPosition.Earliest).messageListener(new MQListener()).receiverQueueSize(1).subscribe();` ---- 2018-07-03 18:10:11 UTC - Sijie Guo: fyi - <https://github.com/apache/incubator-pulsar/issues/2077> this is the issue for tracking the problem. we will try to include this as part of 2.1 release.
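The workaround described above (one consumer per topic, all wired to a single listener) can be sketched roughly as follows. To keep the example free of external dependencies, the Pulsar call is hidden behind a function argument: `PerTopicSubscriber`, its `Listener` interface, and `vbucketTopics` are hypothetical names, and the topic prefix used in the note after the code is made up. The comment inside shows the builder calls, taken from the consumer snippet quoted above, that the function would wrap in a real client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical helper: one subscription per topic, all sharing one listener. */
final class PerTopicSubscriber {

    /** Stand-in for the client's message listener; name and shape are assumptions. */
    interface Listener {
        void received(String topic, byte[] payload);
    }

    /**
     * Calls subscribeOne once per topic, passing the SAME listener instance each time.
     * With the Pulsar client, subscribeOne would wrap something like:
     *   client.newConsumer().topic(t).subscriptionName(...)
     *         .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
     *         .messageListener(listener).subscribe()
     */
    static <C> List<C> subscribeAll(List<String> topics, Listener shared,
                                    BiFunction<String, Listener, C> subscribeOne) {
        List<C> consumers = new ArrayList<>(topics.size());
        for (String topic : topics) {
            consumers.add(subscribeOne.apply(topic, shared));
        }
        return consumers;
    }

    /** Builds per-partition topic names as prefix + index; the scheme is illustrative. */
    static List<String> vbucketTopics(String prefix, int partitions) {
        List<String> topics = new ArrayList<>(partitions);
        for (int i = 0; i < partitions; i++) {
            topics.add(prefix + i);
        }
        return topics;
    }
}
```

For example, `PerTopicSubscriber.vbucketTopics("persistent://public/default/vbucket-", 1024)` yields the static topic list, and `subscribeAll` then creates 1024 consumers that all call back into one listener object, so the listener logic exists only once.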
---- 2018-07-03 18:10:13 UTC - Daniel Ferreira Jorge: But if I create 1024 consumers, it is too expensive ---- 2018-07-03 18:10:58 UTC - Matteo Merli: You can share the same `new MQListener()` instance ---- 2018-07-03 18:11:24 UTC - Daniel Ferreira Jorge: @Sijie Guo ah... that is great! ---- 2018-07-03 18:11:45 UTC - Daniel Ferreira Jorge: @Matteo Merli I will try that! ---- 2018-07-03 18:11:52 UTC - Daniel Ferreira Jorge: Thanks guys ---- 2018-07-03 18:14:55 UTC - Daniel Ferreira Jorge: @Matteo Merli The workaround worked great. Thanks! ---- 2018-07-03 18:15:11 UTC - Matteo Merli: :+1: ---- 2018-07-03 21:16:31 UTC - Grant Wu: @Grant Wu has joined the channel ---- 2018-07-03 21:25:36 UTC - Grant Wu: I’m having trouble using the website ---- 2018-07-03 21:26:10 UTC - Grant Wu: @Grant Wu uploaded a file: <https://apache-pulsar.slack.com/files/UBHR9CH5E/FBJLQS2N9/screen_shot_2018-07-03_at_17.24.26.png|Screen Shot 2018-07-03 at 17.24.26.png> and commented: The page doesn’t scroll down to allow me to view all the links in the accordion, and the footer blocks things ---- 2018-07-03 21:35:16 UTC - Sijie Guo: @Grant Wu I think there are some problems with overlays at the sidebar. We are aware of the problem and there is actually someone working on improving the website in general. For the specific issue here, a temporary workaround is to zoom out of the webpage (on Mac/Chrome, it is Command and ‘-’), so the sidebar can fit in your screen. I know it is a bit inconvenient :disappointed: ---- 2018-07-03 21:35:52 UTC - Grant Wu: Thanks, just making sure you were aware :slightly_smiling_face: ---- 2018-07-03 21:37:05 UTC - Grant Wu: I just started my first full-time job yesterday, and I’m going to be working with Pulsar for part of what I’m doing. I was wondering if anyone had any suggestions for something I’ve been asked to implement ---- 2018-07-03 21:37:26 UTC - Grant Wu: We want to be able to get timestamp ranges of messages from Pulsar, i.e.
all messages for a topic sent between two timestamps ---- 2018-07-03 21:38:04 UTC - Grant Wu: Is there anything better than storing a subsampled, approximate mapping from timestamp to message ID for this? ---- 2018-07-03 21:38:52 UTC - Grant Wu: Also apologies in advance if I’ve misunderstood anything about Pulsar ---- 2018-07-03 21:40:05 UTC - Sijie Guo: @Grant Wu is the timestamp your application’s timestamp? or publish timestamp of a message? ---- 2018-07-03 21:41:46 UTC - Grant Wu: The latter ---- 2018-07-03 21:42:59 UTC - Matteo Merli: You could use <http://pulsar.apache.org/api/admin/org/apache/pulsar/client/admin/Topics.html#resetCursor-java.lang.String-java.lang.String-long-> to position a consumer on a particular timestamp of messages ---- 2018-07-03 21:43:13 UTC - Grant Wu: Hrm, interesting ---- 2018-07-03 21:43:20 UTC - Grant Wu: I don’t think we’re using a Java client ---- 2018-07-03 21:43:24 UTC - Grant Wu: Let me look through the other libraries… ---- 2018-07-03 21:43:30 UTC - Matteo Merli: and then scan until you get messages after your upper bound ---- 2018-07-03 21:44:03 UTC - Matteo Merli: reset cursor is part of Admin API, you can access it through REST, Java and CLI ---- 2018-07-03 21:48:19 UTC - Grant Wu: Hrm… my supervisor said there were concerns about that affecting more clients than we want, but I’m not sure how this interacts with the at most once delivery of messages ---- 2018-07-03 21:48:30 UTC - Grant Wu: Uh I’ll ask him… ---- 2018-07-03 22:01:12 UTC - Grant Wu: Uh, just to clarify something - ---- 2018-07-03 22:01:21 UTC - Grant Wu: Can there be more than one subscription to a particular topic? ---- 2018-07-03 22:01:36 UTC - Matteo Merli: yes ---- 2018-07-03 22:06:53 UTC - Grant Wu: Hrm and as I understand it this is on a per subscription basis? ---- 2018-07-03 22:07:18 UTC - Grant Wu: are there any before/after guarantees for the timestamp?
---- 2018-07-03 22:07:42 UTC - Grant Wu: or is it just the message with the lowest absolute timestamp difference ---- 2018-07-03 22:08:16 UTC - Matteo Merli: subscription will get positioned on the message with timestamp <= the specified parameter ---- 2018-07-03 22:08:58 UTC - Grant Wu: ah okay. might be useful to clarify that in the docs, it doesn’t seem to explicitly state that ----
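The approach suggested above (reset the subscription's cursor to the lower bound, then read forward until a message passes the upper bound) can be modeled without a broker. This is a sketch under stated assumptions: `TimestampRangeScan` and its `Msg` class are hypothetical, the log is assumed to be ordered by publish time, and `positionFor` imitates the "timestamp <= parameter" positioning described in the thread rather than calling the real admin API. In a real client the positioning step would be the `resetCursor` admin call linked above (topic, subscription name, timestamp), followed by an ordinary consumer loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of "reset cursor by timestamp, then scan". */
final class TimestampRangeScan {

    /** Minimal stand-in for a message: publish time (epoch millis) plus payload. */
    static final class Msg {
        final long publishTime;
        final String payload;

        Msg(long publishTime, String payload) {
            this.publishTime = publishTime;
            this.payload = payload;
        }
    }

    /**
     * Index a reset to `ts` would land on in this model: the last message with
     * publishTime <= ts, or 0 when no message is that old. Assumes `log` is
     * sorted by publishTime (an assumption, not a broker guarantee).
     */
    static int positionFor(List<Msg> log, long ts) {
        int lo = 0, hi = log.size() - 1, pos = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (log.get(mid).publishTime <= ts) {
                pos = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return pos;
    }

    /** Returns messages with from <= publishTime <= to, in log order. */
    static List<Msg> range(List<Msg> log, long from, long to) {
        List<Msg> out = new ArrayList<>();
        // Rewind to just before `from` so that several messages stamped
        // exactly `from` are all seen by the forward scan.
        for (int i = positionFor(log, from - 1); i < log.size(); i++) {
            Msg m = log.get(i);
            if (m.publishTime > to) {
                break; // past the upper bound: stop scanning
            }
            if (m.publishTime >= from) {
                out.add(m); // skip anything older than the lower bound
            }
        }
        return out;
    }
}
```

Because the reset can land on a message slightly older than the requested time, the scan filters on both bounds instead of trusting the starting position. Since the reset applies per subscription, a dedicated subscription for range queries would keep it from moving other consumers.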