2019-03-30 12:53:42 UTC - Ben S: I have an admittedly strange use case for
pulsar (actually I have several, but this one is strange), and I hope you can
help me with the configuration. I need to deliver current safety information to
possibly slow consumers (connected via wifi and possibly moving around). I
thought a bit about the actual problem, and I think what I want is for the slow
clients to drop all unacknowledged old messages and only consume the most
recent one. So what I want is guaranteed delivery for the most recent message,
but if there is a newer message in flight the client should drop all older
ones. Is it possible to configure the pulsar client and server for this
scenario?
----
2019-03-30 16:12:58 UTC - David Kjerrumgaard: Hi @Vikas Can you share your logs
with me?
----
2019-03-30 18:33:51 UTC - Matteo Merli: Would the TTL expiration address this
problem?
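TTL is configured per namespace; a quick sketch (the tenant/namespace names
and the 60-second value are just placeholders):
```
# Expire unacknowledged messages 60 seconds after they were produced,
# so a slow consumer never has to wade through stale safety updates.
bin/pulsar-admin namespaces set-message-ttl my-tenant/my-ns --messageTTL 60
```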
----
2019-03-30 18:37:29 UTC - Ben L: @Ben L has joined the channel
----
2019-03-30 19:08:31 UTC - Ben L: Hey all, I'm considering using Pulsar for a
possibly abnormal use case and have a few questions. I'd like to create a
stream of binary chunks of data (maximum 4MiB in size) that is geo-replicated.
The chunks will be consumed into a specialized storage and indexing layer. I
don't require any processing of the chunks in the streaming system as they're
opaque to this part of the system, but I do need a substantial throughput of
roughly 400MiB/s on average across approximately 10 million topics. The
producers are occasionally very bursty and I want pulsar to act as a very large
buffer for this. If this sounds absolutely insane you can stop reading here and
let me know. Now onto some more concrete questions:
- How does pulsar behave with messages of this size? Are there any particular
tunables I should be aware of? I know there is a 5MiB cap on message size by
default.
- Is it possible to create a shared subscription where all subscribers receive
every message? This is close to the shared subscription mode but not quite the
same; I'm not sure how this conflicts with the exactly-once guarantees etc.
- What gets slower as I add more topics and how many topics can I expect to
create i.e. 1,10,100 million?
- Is it possible for the producer to be made aware of how big the queue is
getting and slow itself down accordingly? (I roughly control the production
rate in the system)
----
2019-03-30 22:44:44 UTC - Zarrar: Hi, I am trying to run functions using
KubernetesContainerFactory, but when I enable authentication
the pods fail to start with the following error. The token passed to the pod
is for a super user defined in the broker.
```
HTTP 401 Authentication required
Reason: HTTP 401 Authentication required
```
Pod command
```
command:
- sh
- -c
- /pulsar/bin/pulsar-admin --admin-url <http://pulsar-broker:8080>
functions download
--path
tenant-test/ns1/reverse2/1fdda06d-8c0b-4781-b7e8-082aeec81f4d-reverse.py
--destination-file /pulsar/reverse.py && SHARD_ID=${POD_NAME##*-}
&& echo shardId=${SHARD_ID}
&& PYTHONPATH=${PYTHONPATH}:/pulsar/instances/deps python
/pulsar/instances/python-instance/python_instance_main.py
--py /pulsar/reverse.py --logging_directory logs/functions --logging_file
reverse2
--logging_config_file
/pulsar/conf/functions-logging/console_logging_config.ini
--install_usercode_dependencies True --instance_id $SHARD_ID
--function_id f4cb9572-3244-4c1d-99cf-d7da18976653
--function_version 70fab446-e49b-4af6-851f-7556e3b3d1ff
--function_details
'{"tenant":"tenant-test","namespace":"ns1","name":"reverse2","className":"reverse","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"<persistent://tenant-test/ns1/backwards>":{}},"cleanupSubscription":true},"sink":{"topic":"<persistent://tenant-test/ns1/forwards>"}}'
--pulsar_serviceurl <pulsar://pulsar-broker:6650> --client_auth_plugin
org.apache.pulsar.client.impl.auth.AuthenticationToken
--client_auth_params token:redacted
--use_tls false --tls_allow_insecure false
--hostname_verification_enabled false
--max_buffered_tuples 1024 --port 9093 --metrics_port 9094
--expected_healthcheck_interval
-1 --secrets_provider secretsprovider.ClearTextSecretsProvider
--cluster_name
pulsar
```
----
2019-03-30 22:48:09 UTC - Matteo Merli: @Ben L
> - How does pulsar behave with messages of this size? Are there any
particular tunables I should be aware of? I know there is a 5MiB cap on message
size by default.
Having big messages generally leads to higher throughput. There are a few
options that make sense to tune, typically the queue sizes (for producers,
consumers and replication). These options let you control the maximum amount
of memory used in each component.
> - Is it possible to create a shared subscription where all subscribers
receive every message? This is close to the shared subscription mode but not
quite the same, I’m not sure how this conflicts with the exactly once
guarantees etc.
You can just create multiple subscriptions (each with a different name)
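For example, with the pulsar-client CLI (topic and subscription names here are
illustrative), each named subscription independently receives every message:
```
# Run each consumer in its own terminal; -n 0 means consume indefinitely.
bin/pulsar-client consume persistent://public/default/chunks -s sub-a -n 0
bin/pulsar-client consume persistent://public/default/chunks -s sub-b -n 0
```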
> - What gets slower as I add more topics and how many topics can I expect
to create i.e. 1,10,100 million?
There are Pulsar deployments with 3M topics and geo-replication. I think it
would be possible to get to 10M, though with current implementation 100M is way
out of reach.
The main concern comes from the amount of metadata to be stored. Also from the
amount of memory involved to keep track of all the topics/consumers.
In general, most of the time it’s easier to adjust the design and deal with
some degree of multiplexing, rather than tackling the harder problems.
> - Is it possible for the producer to be made aware of how big the queue is
getting and slow itself down accordingly? (I roughly control the production
rate in the system)
Kind of; it’s more of a black-or-white control. There is backlog quota
enforcement: once the quota is reached on a topic, the producer gets blocked
until the storage drops back below the quota.
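The backlog quota is set per namespace; a sketch with illustrative names and
an illustrative limit:
```
# Block producers (rather than failing them) once the backlog for a
# topic in this namespace exceeds 50 GB.
bin/pulsar-admin namespaces set-backlog-quota my-tenant/my-ns \
  --limit 50G \
  --policy producer_request_hold
```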
----
2019-03-30 23:31:49 UTC - Vikas: hey @David Kjerrumgaard , we were able to
troubleshoot the issue. It's working fine now. Thank you.
----
2019-03-30 23:43:18 UTC - David Kjerrumgaard: @Vikas that's great news. Thanks
for the update
----
2019-03-31 00:07:12 UTC - Jerry Peng: @Zarrar Currently function authentication
is not very user friendly and is not really supported in Kubernetes. A more
generic and user-friendly implementation of authentication and authorization
was just added by me, i.e.
<https://github.com/apache/pulsar/pull/3735>
<https://github.com/apache/pulsar/pull/3874>
They will officially be released in 2.4, but feel free to compile master to try
it out yourself. Function auth should work for process and threaded modes, but
for Kubernetes a piece is missing in 2.3. The reason it's failing is
that the initial download command, i.e.
```
/pulsar/bin/pulsar-admin --admin-url <http://pulsar-broker:8080> functions
download
--path
tenant-zarrar/ns1/reverse2/1fdda06d-8c0b-4781-b7e8-082aeec81f4d-reverse.py
--destination-file /pulsar/reverse.py
```
doesn’t have the auth params passed. It needs to be:
```
/pulsar/bin/pulsar-admin --auth-plugin
org.apache.pulsar.client.impl.auth.AuthenticationToken --auth-params
token:<token> --admin-url <http://pulsar-broker:8080> functions download
--path
tenant-zarrar/ns1/reverse2/1fdda06d-8c0b-4781-b7e8-082aeec81f4d-reverse.py
--destination-file /pulsar/reverse.py
```
If you add that manually by editing the StatefulSet it will probably work. As
I said before, authentication/authorization will be more polished in 2.4.
----
2019-03-31 02:11:44 UTC - xutao: @xutao has joined the channel
----
2019-03-31 04:24:05 UTC - Joe Francis: Non-persistent topics will only give you
messages that are current while you are connected (not the most recent one)
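A broker-agnostic sketch of the drop-old-keep-newest behaviour described above,
done client-side (pure Python, no Pulsar API; the helper name is made up):
```python
from collections import deque

def latest_only(pending):
    """Client-side conflation: given the backlog of undelivered updates
    (oldest first), drop everything except the newest one."""
    newest = None
    for msg in pending:
        # In a real consumer loop, each superseded message would be
        # acknowledged here so it is never redelivered.
        newest = msg
    return newest

backlog = deque(["update-v1", "update-v2", "update-v3"])
print(latest_only(backlog))  # prints: update-v3
```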
----