2019-01-07 09:11:04 UTC - Sijie Guo: Or check your bookie nodes to see if the
bookies are running or not
----
2019-01-07 09:20:55 UTC - bossbaby:
<https://gist.github.com/tuan6956/cf05fc21fa733b6ef92ce86923b56dde>
----
2019-01-07 09:21:12 UTC - bossbaby: please help me check it
----
2019-01-07 09:28:49 UTC - Ali Ahmed: you only have one bookie it seems
----
2019-01-07 09:30:00 UTC - bossbaby: I found the error and fixed it;
it runs successfully now
+1 : bossbaby
----
2019-01-07 09:30:06 UTC - Ali Ahmed: ok
+1 : bossbaby
----
2019-01-07 09:31:39 UTC - bossbaby: "If you deploy Pulsar in a one-node
cluster, you should update the replication settings in conf/broker.conf to 1"
is described in the docs. But the default is 2, so I fixed it and ran again
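For reference, a sketch of the settings in question (assuming the standard
managed-ledger keys in conf/broker.conf; verify the names against your version):
```
# conf/broker.conf — one-node setup: no replication possible with a single bookie
managedLedgerDefaultEnsembleSize=1
managedLedgerDefaultWriteQuorum=1
managedLedgerDefaultAckQuorum=1
```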
----
2019-01-07 09:31:44 UTC - bossbaby: Thank you all, bro
----
2019-01-07 09:32:19 UTC - Ali Ahmed: if you need a one node cluster just use
standalone mode
----
2019-01-07 09:32:28 UTC - Ali Ahmed: it will setup everything correctly
+1 : bossbaby
----
2019-01-07 09:55:00 UTC - Yuvaraj Loganathan: Right now we are thinking of one
topic per customer under a namespace, with topic names like
`customer-data-<customer-id>`. The consumer will consume using the pattern
subscription `customer-data-*`. Let's say there are two topics matching the
subscription, `customer-data-1` and `customer-data-2`. For every message I call
an external service. The external service may throttle, let's say for
`customer-data-1`. When the external service throttles, I would like to
stop consuming messages from `customer-data-1` for some time and continue
on the `customer-data-2` topic, which is not throttled. In the Pulsar client, if I
don't acknowledge messages for the `customer-data-1` topic and continuously
acknowledge for the `customer-data-2` topic, will I keep getting data for the
`customer-data-2` topic without getting blocked?
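A minimal sketch of one way to get that control, assuming the Java client and
one consumer per matched topic (instead of a single pattern subscription) so
each topic can be paused independently; names and URLs are placeholders:
```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class PerCustomerConsume {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                .build();

        // One consumer per customer topic, so each topic can be paused on its own.
        Consumer<byte[]> c1 = client.newConsumer()
                .topic("persistent://public/default/customer-data-1")
                .subscriptionName("external-service")
                .subscribe();
        Consumer<byte[]> c2 = client.newConsumer()
                .topic("persistent://public/default/customer-data-2")
                .subscriptionName("external-service")
                .subscribe();

        boolean customer1Throttled = false; // flip when the external service throttles

        while (true) {
            if (!customer1Throttled) {
                Message<byte[]> m1 = c1.receive(100, TimeUnit.MILLISECONDS);
                if (m1 != null) { /* call the external service */ c1.acknowledge(m1); }
            }
            Message<byte[]> m2 = c2.receive(100, TimeUnit.MILLISECONDS);
            if (m2 != null) { /* call the external service */ c2.acknowledge(m2); }
        }
    }
}
```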
----
2019-01-07 09:57:49 UTC - bossbaby: I don't understand why the "To run Pulsar
on bare metal" tutorial needs 3 separate ZooKeeper nodes on 3 VMs. I think
1 ZK + 1 bookie + 1 broker on 1 VM is enough
----
2019-01-07 11:35:36 UTC - Yuvaraj Loganathan: @Sijie Guo
<https://github.com/apache/pulsar/issues/3317> Any help would be highly
appreciated. Our dev pipeline is blocked because of this; we are not able to
compile either.
----
2019-01-07 12:28:52 UTC - Yifan: @Yuvaraj Loganathan which version of Python
are you using? 3.6 doesn’t work
----
2019-01-07 12:35:10 UTC - Yuvaraj Loganathan: @Yifan Yes it is 3.6 :face_palm:
Let me check with 3.7
----
2019-01-07 12:38:26 UTC - Yuvaraj Loganathan: It works awesome! Thanks @Yifan
+1 : Yifan, Sijie Guo
----
2019-01-07 12:39:46 UTC - Yuvaraj Loganathan: Closed the issue.
----
2019-01-07 15:53:15 UTC - Grant Wu: @Sijie Guo B u m p.
----
2019-01-07 15:53:38 UTC - Grant Wu: Wait, why doesn’t the client work with
Python 3.6?
----
2019-01-07 15:57:04 UTC - Grant Wu: Because Zookeeper is designed to run in a
cluster/multi-server setup to provide a voting quorum. See
<https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup>
----
2019-01-07 15:57:21 UTC - Grant Wu: If you want to run a standalone setup for
development purposes, `pulsar standalone` probably suffices for your needs?
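For reference (from the root of a standard binary distribution):
```
bin/pulsar standalone
```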
+1 : Matteo Merli
----
2019-01-07 16:09:35 UTC - Matteo Merli: For macOS we only publish the wheel
files for 2.7 and 3.7. These are the versions of Python that come with either
macOS or Homebrew
----
2019-01-07 16:10:33 UTC - Matteo Merli: It would be nice to have an environment
with all the combinations, to use when doing releases
----
2019-01-07 16:12:22 UTC - Grant Wu: Oh, so this doesn’t apply to Linux, okay
----
2019-01-07 16:26:31 UTC - Matteo Merli: Yes, on Linux we build in Docker
containers and have all the combinations
----
2019-01-07 16:30:37 UTC - Grant Wu: or @Matteo Merli do you think you could
look into this? :confused:
----
2019-01-07 16:37:14 UTC - Matteo Merli: Passing the buck to @Jerry Peng ;)
----
2019-01-07 16:44:53 UTC - Romain Castagnet: Hi. When I activate SSL connections
on the brokers, I get this warning before an SSL handshake error:
"org.apache.pulsar.broker.service.ServerCnx - [/XX.XX.XX.XX:41818] Got
exception TooLongFrameException : Adjusted frame length exceeds 5242880:
369295620 - discarded". Yesterday morning the error disappeared and everything
seemed to start working. Since I tried to activate authentication, the error
appears again. I don't understand why. Have you seen a similar problem?
----
2019-01-07 16:53:08 UTC - Chris Miller: Is there any reason why
ConsumerImpl.hasMessageAvailable() is not part of the Consumer interface?
----
2019-01-07 16:56:20 UTC - Matteo Merli: No technical reason; it’s more a matter
of semantics. Consumer is the API for a managed subscription, where the server
knows and controls which messages you’re consuming. In general, applications
don’t need to know when they’re caught up with the publishers
----
2019-01-07 16:57:50 UTC - Matteo Merli: On the contrary, the Reader is
completely unmanaged. A common use case is to create a reader to do a scan on
the topic, starting from a given point and up to “now”
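A minimal sketch of that scan pattern, assuming the Java client (service URL
and topic name are placeholders):
```java
import org.apache.pulsar.client.api.*;

public class TopicScan {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        // Start from the beginning of the topic and read until "now".
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic") // placeholder
                .startMessageId(MessageId.earliest)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            System.out.printf("%s -> %s%n",
                    msg.getMessageId(), new String(msg.getData()));
        }

        reader.close();
        client.close();
    }
}
```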
----
2019-01-07 16:58:08 UTC - Chris Miller: I see, thanks. Maybe I'm looking at
things the wrong way then. I'd like to have a consumer that can read some
history up until the most recent message. Sounds like I need a Reader instead
----
2019-01-07 16:59:43 UTC - Chris Miller: I asked some related questions on
Friday about this, wondering when you might use Consumer.seek() vs Reader, and
why Reader wasn't a super-interface of Consumer
----
2019-01-07 17:00:59 UTC - Chris Miller: I don't suppose there's a "best
practices" doc somewhere detailing these sort of common patterns?
----
2019-01-07 17:02:38 UTC - Chris Miller: One thing that's missing from both
Consumer and Reader is seeking to a timestamp. The admin API has that via
resetCursor(). I guess it's not an efficient operation and therefore not so
suitable for client use?
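For reference, a sketch of the admin-side operation (assuming the
`pulsar-admin topics reset-cursor` flags; topic and subscription names are
placeholders):
```
bin/pulsar-admin topics reset-cursor persistent://public/default/my-topic \
  --subscription my-sub \
  --time 10m
```
This rewinds the subscription's cursor to roughly 10 minutes ago.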
----
2019-01-07 17:06:30 UTC - Grant Wu: I’ve actually asked about this before and I
think it was stated that it was a reasonable thing to ask for
+1 : Chris Miller
----
2019-01-07 17:14:48 UTC - Grant Wu: I think it may have been lost due to the
history limit :disappointed:
----
2019-01-07 17:16:54 UTC - Chris Miller: History limit?
----
2019-01-07 17:20:22 UTC - Grant Wu: Yes, Slack limits free plans to 10k messages
----
2019-01-07 17:25:46 UTC - Grant Wu: There are archives sent to the mailing list
but I don't know how to search that
----
2019-01-07 17:50:06 UTC - Chris Miller: Oh, haha sorry I thought you were
referring to some sort of history limit in Pulsar :slightly_smiling_face:
----
2019-01-07 18:02:22 UTC - Evan Nelson: @Evan Nelson has joined the channel
----
2019-01-07 18:58:30 UTC - Jerry Peng: @Grant Wu ok let me investigate
pray : Grant Wu
----
2019-01-07 21:57:57 UTC - Jerry Peng: @Grant Wu before we can get the topic
name in Python functions we need to complete this first:
<https://github.com/apache/pulsar/issues/3322>
since there is currently no way to get the topic name from a message using the
C++/Python API
----
2019-01-07 21:59:47 UTC - Grant Wu: oh no :disappointed:
----
2019-01-07 21:59:52 UTC - Grant Wu: Okay, good to know
----
2019-01-07 22:23:32 UTC - Jerry Peng: Though this should be pretty easy to add
----
2019-01-07 23:01:47 UTC - Emma Pollum: What IP does pulsar use for geo
replication? Does it utilize the service url of the cluster to replicate to, or
something else?
----
2019-01-07 23:12:23 UTC - Matteo Merli: It will use the ServiceURL for the
other cluster as specified in the “clusters” metadata
----
2019-01-07 23:12:50 UTC - Matteo Merli: e.g. the metadata you specify with the
`initialize-cluster-metadata` command
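For reference, a sketch of that command (cluster name, hosts, and URLs are
placeholders):
```
bin/pulsar initialize-cluster-metadata \
  --cluster us-west \
  --zookeeper zk1.example.com:2181 \
  --configuration-store zk1.example.com:2181 \
  --web-service-url http://broker.us-west.example.com:8080 \
  --broker-service-url pulsar://broker.us-west.example.com:6650
```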
----
2019-01-07 23:14:07 UTC - Emma Pollum: Thank you!
----
2019-01-08 02:22:59 UTC - Kevin DiVincenzo: @Kevin DiVincenzo has joined the
channel
----
2019-01-08 02:27:27 UTC - Kevin DiVincenzo: Hi - I have a question regarding
namespaces/topics.
For my use-case, I want to create topics like:
`persistent://{tenant}/{namespace}/{topic}/someId`. There is no problem
creating these topics from the Java client, using `pulsar-perf produce ...`,
etc.
The problem is that all of the admin functionality doesn't seem to work when
you nest the topics one layer deeper than just
`persistent://{tenant}/{namespace}/{topic}`. `{namespace}/{topic}` doesn't
appear to be a valid namespace (expected), but if I do `pulsar-admin topics
list {tenant}/{namespace}`, I get back an empty list.
----
2019-01-08 02:28:14 UTC - Kevin DiVincenzo: For what it's worth - this use case
is for event sourcing / integrating with Akka Persistence.
----
2019-01-08 02:30:03 UTC - bossbaby: So will Pulsar use 1 of the 3 to back up
the data, or use all of them?
----
2019-01-08 02:33:18 UTC - Kevin DiVincenzo: So is this not possible / not
supported?
----
2019-01-08 02:37:05 UTC - Sijie Guo: @Kevin DiVincenzo I think there is
something related to encoding and decoding “/”.
----
2019-01-08 02:37:30 UTC - Sijie Guo: I would recommend avoiding “/” for now if
possible
----
2019-01-08 02:37:37 UTC - Sijie Guo: but this is a bug we definitely need to fix
----
2019-01-08 02:37:38 UTC - bossbaby: I deployed 3 nodes in 1 cluster, but I have
a question: will 2 of the 3 nodes back up and store copies of the data to
handle failures, is that right?
----
2019-01-08 02:38:23 UTC - Kevin DiVincenzo: @Sijie Guo Ahh - so with the Admin
API not being able to encode/decode `/` properly in the namespace name?
----
2019-01-08 02:38:41 UTC - bossbaby: I don't know whether I should deploy
1 cluster with 3 brokers, or 3 clusters and then add 2 of them to 1
----
2019-01-08 02:42:56 UTC - Sijie Guo: @Kevin DiVincenzo:
> so with the Admin API not being able to encode/decode `/` properly in the
namespace name?
it should already encode and decode “/”. however, “/” is used to distinguish
namespace, tenant and topic, as well as to distinguish the v1 and v2 topic
formats. so there might be something in the REST server that doesn’t handle the
encoding properly. (feel free to create a github issue for that)
so I strongly recommend avoiding “/” in topic names for now, until we identify
the issue and fix it properly
----
2019-01-08 02:44:56 UTC - Kevin DiVincenzo: @Sijie Guo - Before I go down the
path of using some other delimiter (e.g. `-`), is it safe to assume that there
currently isn't a better way to represent this _{eventlog_name}_ *{delimiter}*
_{actual pulsar topic}_ relationship within pulsar?
----
2019-01-08 02:45:45 UTC - Sijie Guo: can you use `{eventlog_name}` as a
namespace?
----
2019-01-08 02:45:48 UTC - Kevin DiVincenzo: I'm planning on using the
multi-topic subscription to aggregate all of the child topics into the event
log FWIW
----
2019-01-08 02:47:12 UTC - Kevin DiVincenzo: Well each entity in the aggregate
root (e.g. event-log) has its own _persistenceId_ (artifact of the Akka
Persistence system). Each entity needs to be able to traverse the topic (by
sequenceId) for various purposes.
----
2019-01-08 02:47:46 UTC - Kevin DiVincenzo: So you might have 5 assets in some
event-log called "asset", each with their own unique persistenceId
----
2019-01-08 02:49:39 UTC - Kevin DiVincenzo: If some other service wanted to
read the whole log of assets (vs. a single asset), I was just using the
`.topicsPattern(...)` method on the client with
`<persistent://tenant/namespace/assets/.*>` as the pattern.
----
2019-01-08 02:49:58 UTC - Kevin DiVincenzo: All of this is actually already
working in my little demo (before I build it out to a proper SDK).
----
2019-01-08 02:50:28 UTC - Kevin DiVincenzo: It was just the namespace
navigation / admin topics list stuff that had me stumped.
----
2019-01-08 02:52:20 UTC - Kevin DiVincenzo: @Sijie Guo - I guess before I go
further down the rabbit hole with this, are there any current limitations for
the `MultiTopicsConsumerImpl`?
----
2019-01-08 02:52:30 UTC - Kevin DiVincenzo: E.g. problems with reading from
thousands of topics?
----
2019-01-08 02:54:36 UTC - Kevin DiVincenzo: You have the `property` field on
the message builder, so I was also thinking of tagging messages with their
`persistenceId` - the downside is that to see the history for just a single
`persistenceId`, you'd have to traverse the topic (e.g. with the Reader
interface), filter to only those messages for that `persistenceId`, then create
some sort of mapping into a logical sequenceId for only those messages.
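A minimal sketch of that approach, assuming the Java client (topic, service
URL, and IDs are placeholders):
```java
import org.apache.pulsar.client.api.*;

public class PersistenceIdFilter {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        // Producer side: attach the persistenceId as a message property.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://tenant/namespace/assets") // placeholder
                .create();
        producer.newMessage()
                .property("persistenceId", "asset-42")
                .value("event-payload".getBytes())
                .send();

        // Reader side: scan the topic and keep only one entity's events.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://tenant/namespace/assets")
                .startMessageId(MessageId.earliest)
                .create();
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            if ("asset-42".equals(msg.getProperty("persistenceId"))) {
                // map to a logical sequenceId for this entity here
            }
        }
        client.close();
    }
}
```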
----
2019-01-08 02:54:59 UTC - Kevin DiVincenzo: Straightforward to do, I guess, but
I was trying to avoid it if possible / not necessary.
----
2019-01-08 02:55:04 UTC - Sijie Guo: ah:
1) I would suggest you use other delimiters, such as “-” or “_”. so in your use
case, your regex would be “<persistent://tenant/namespace/assets_.*>“.
2)
> problems with reading from thousands of topics?
there shouldn’t be problems reading from thousands of topics. but the number of
topics will be bounded by the resources of your client machine, such as memory.
----
2019-01-08 02:55:38 UTC - Sijie Guo: it depends on your use case
----
2019-01-08 02:56:02 UTC - Kevin DiVincenzo: ^^ - perfect thanks. I'm assuming
that bounding the client receive buffer to something ~reasonable~ small like
`10` should fix #2?
----
2019-01-08 02:58:36 UTC - Kevin DiVincenzo: IOW - is that receive buffer per
individual topic (assuming yes based on your response) or is it shared between
all topics?
----
2019-01-08 02:58:38 UTC - bossbaby: on each of my nodes there is 1 bookkeeper
(3 bookkeepers in the 3-node cluster)
----
2019-01-08 03:07:12 UTC - Kevin DiVincenzo: Actually never mind - everything
seems to be working fine, up to 1,000 topics. I guess if the number of topics
in an event-log ever needs to exceed that number, we'll just use multiple
readers.
----
2019-01-08 03:07:17 UTC - Kevin DiVincenzo: Thanks for your help @Sijie Guo
ok_hand : Sijie Guo
----
2019-01-08 03:07:50 UTC - Sijie Guo: yes. it is per topic.
----
2019-01-08 03:08:09 UTC - Sijie Guo: I think there is a setting for the total
receiver buffer as well
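A sketch of those knobs, assuming the Java client's consumer builder (pattern,
names, and URL are placeholders):
```java
import org.apache.pulsar.client.api.*;

public class ReceiverQueueTuning {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern("persistent://tenant/namespace/assets_.*")
                .subscriptionName("aggregator")
                .receiverQueueSize(10)                           // buffer per topic
                .maxTotalReceiverQueueSizeAcrossPartitions(1000) // shared overall cap
                .subscribe();
    }
}
```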
----
2019-01-08 03:08:57 UTC - Kevin DiVincenzo: Actually - one more question for
sanity's sake if you don't mind.
----
2019-01-08 03:09:10 UTC - Chris Chapman: @Chris Chapman has joined the channel
----
2019-01-08 03:11:23 UTC - Kevin DiVincenzo: From testing, it seems like with
the default message retention policy and backlog policy, messages are actually
*not ever* deleted from the topic. I'm able to later on start a consumer (with
`.subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)`) and read
the entire history of all messages sent to this topic (this is what I want). Is
this the actual/correct behavior?
----
2019-01-08 03:11:39 UTC - Sijie Guo: > will 2 of the 3 nodes back up and
store copies of the data to handle failures, is that right?
BookKeeper has replication to handle failures
----
2019-01-08 03:13:40 UTC - Sijie Guo: > it seems like with the default
message retention policy and backlog policy
I think the default message retention policy is to delete the messages after
all subscriptions have consumed them.
if the messages are not deleted, there might be some subscriptions not
acknowledging the messages. I would recommend using “pulsar-admin topics
stats <topic>” to see if there are any subscriptions on that topic not
acknowledging the messages.
----
2019-01-08 03:14:21 UTC - Kevin DiVincenzo: Is there a way to tell pulsar
"don't delete messages ever" then?
----
2019-01-08 03:15:11 UTC - Kevin DiVincenzo: (was planning on using the awesome
BookKeeper tiered storage feature)
----
2019-01-08 03:15:21 UTC - bossbaby: so with 2 bookkeepers in 2 VMs, will it
replicate?
----
2019-01-08 03:15:46 UTC - Sijie Guo: yes
----
2019-01-08 03:18:06 UTC - Sijie Guo: @Kevin DiVincenzo - currently you can
configure the retention policy (by setting the retention time to -1) to keep
the data forever. it is configured on a per-namespace basis.
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/>
----
2019-01-08 03:19:56 UTC - Kevin DiVincenzo: i.e. `pulsar-admin namespaces
set-retention my-tenant/my-ns \
--size -1 \
--time -1`
+1 : Sijie Guo
----
2019-01-08 03:20:28 UTC - Sijie Guo: > messages are actually *not ever*
deleted from the topic.
actually, I take my previous comment back. it is probably related to how pulsar
garbage collects data. pulsar garbage collects data by segments, so there is at
least one segment kept even after all your consumers have consumed the
messages. that might explain why you receive all the data after restarting from
`earliest`.
anyway, in general, you can use “topics stats” and “topics stats-internal” to
see more details about the topic
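For example (placeholder topic name):
```
bin/pulsar-admin topics stats persistent://my-tenant/my-ns/my-topic
bin/pulsar-admin topics stats-internal persistent://my-tenant/my-ns/my-topic
```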
----
2019-01-08 03:20:34 UTC - Sijie Guo: yes
----
2019-01-08 03:20:49 UTC - Kevin DiVincenzo: Yup - that's what I needed. Thanks
again.
----
2019-01-08 03:22:40 UTC - Sijie Guo: cool
----
2019-01-08 03:27:21 UTC - bossbaby: great, I thought I would have to set up
replication myself, but now I'll just set up a broker on every node, and
BookKeeper will replicate the data to every bookkeeper in the cluster
----