2019-08-06 10:20:52 UTC - Richard Sherman: This was when creating a new
Consumer via a pulsar-proxy. Restarting the proxy cleared the problem.
----
2019-08-06 13:30:39 UTC - Chris Bartholomew: @Addison Higham I built a Docker
image for the broker that is 2.3.1 plus that PR. I have been using it for a
while now. If you are using k8s, it would be easy to try it to see if it fixes
your Datadog issues:
```
broker:
  repository: kafkaesqueio/pulsar-all
  pullPolicy: IfNotPresent
  tag: 2.3.1_kesque_1
```
----
2019-08-06 13:36:54 UTC - Alexandre DUVAL: Sure, but it is not implemented in
the pulsar-admin functions command, right?
----
2019-08-06 18:15:56 UTC - Zhenhao Li: hi there, has anyone used Pulsar SQL?
----
2019-08-06 18:16:44 UTC - Jerry Peng: Yes
----
2019-08-06 18:16:50 UTC - Zhenhao Li: I read this page
<https://pulsar.apache.org/docs/en/sql-overview/> and have a few questions.
Are Presto workers part of Pulsar?
----
2019-08-06 18:17:26 UTC - Zhenhao Li: and do you have performance benchmarking
results?
----
2019-08-06 18:18:17 UTC - Jerry Peng: @Zhenhao Li Presto is packaged with
Pulsar for ease of use, so you can use `./bin/pulsar` to start a Presto
cluster. Alternatively, you can also just deploy your own Presto cluster using
a Presto distribution.
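For reference, a minimal sketch of starting the bundled SQL worker and shell (commands per the Pulsar SQL docs; the namespace is a placeholder):
```
# start a Presto worker embedded in the Pulsar distribution
./bin/pulsar sql-worker run

# in another terminal, open the SQL shell and list the topic tables
./bin/pulsar sql
presto> SHOW TABLES IN pulsar."public/default";
```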
----
2019-08-06 18:19:11 UTC - Zhenhao Li: I see. Is it possible to use Spark to
query data in Pulsar via a Spark connector?
----
2019-08-06 18:19:38 UTC - Zhenhao Li: if so, what is the difference from Presto?
----
2019-08-06 18:20:59 UTC - Jerry Peng: The Presto connector reads data directly
from the storage layer to maximize throughput, while the Spark connector, I
believe, reads data with the consumer/reader API, which is optimized for latency.
----
2019-08-06 18:22:01 UTC - Jerry Peng: We could do a similar implementation for
the Spark connector to read directly from the storage layer, but we have yet
to do that.
----
2019-08-06 18:22:25 UTC - Zhenhao Li: ok. thanks!
----
2019-08-06 18:22:55 UTC - Zhenhao Li: I actually have a real use case in mind.
how fast can it be to replay all historical events for a given key?
----
2019-08-06 18:23:29 UTC - Zhenhao Li: I am thinking of using Pulsar as the
persistence layer for event sourcing
----
2019-08-06 18:25:17 UTC - Zhenhao Li: in Kafka there would be no bound, because
each partition can have a growing number of keys
----
2019-08-06 18:25:43 UTC - Jerry Peng: Because Pulsar SQL (presto connector)
reads data directly from the storage layer, it can read the data within topics
in parallel from multiple replicas. The more replicas you have of the data,
the higher the potential read throughput
----
2019-08-06 18:25:56 UTC - Zhenhao Li: I wonder if Pulsar can provide a time
bound on queries per key
----
2019-08-06 18:27:29 UTC - Jerry Peng: A hard time bound cannot be guaranteed,
but throughput will be much higher than querying Kafka with Presto, since in
that case data is read out through a consumer and limited by the number of
partitions
----
2019-08-06 18:28:40 UTC - Zhenhao Li: I see
----
2019-08-06 18:29:50 UTC - Jerry Peng: Messages for a key will only be in a
particular partition, so if you know which partition the messages for a key
reside in, you can just query that partition
----
2019-08-06 18:30:03 UTC - Jerry Peng: to minimize the amount of data you will
need to filter
----
2019-08-06 18:30:46 UTC - Jerry Peng: There is no HARD time bound, but the soft
time bound is that query time is proportional to the number of messages in the
topic
----
2019-08-06 18:31:39 UTC - Jerry Peng: Pulsar, like any big data system, is not
a hard real-time system and thus does not have a hard guarantee on completion time
----
2019-08-06 18:31:42 UTC - Zhenhao Li: it makes sense to use a hashing function
to map keys to partitions, so both producers and query makers know the mapping
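Illustratively, a hand-rolled sketch of such a shared mapping (hypothetical; Pulsar's default message router already hashes the key to pick a partition for you):
```java
// Hypothetical helper: deterministically map a key to one of N partitions,
// so producers and readers agree on where a key's messages live.
static int partitionForKey(String key, int numPartitions) {
    // floorMod avoids negative results for negative hash codes
    return Math.floorMod(key.hashCode(), numPartitions);
}
```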
----
2019-08-06 18:33:46 UTC - Jerry Peng: It's also interesting to note that read
throughput is not bounded by the number of partitions in Pulsar SQL. Users can
simply increase the number of write replicas
----
2019-08-06 18:33:50 UTC - Zhenhao Li: but it would be great if there were some
benchmarking results, say against HBase, Cassandra, etc.
----
2019-08-06 18:35:00 UTC - Jerry Peng: Though Pulsar and Pulsar SQL have
different use cases compared to HBase and Cassandra, which are NoSQL KV stores
----
2019-08-06 18:35:08 UTC - Zhenhao Li: sorry, do you mean read replicas?
----
2019-08-06 18:35:33 UTC - Jerry Peng: Pulsar SQL also works with tiered storage
and is able to query data offloaded to object stores like S3
----
2019-08-06 18:40:15 UTC - Jerry Peng: In Pulsar terminology, when a message is
published there is a write quorum and an ack quorum. For example, if my
cluster is set up with a write quorum of 3 and an ack quorum of 2, this means
for every message written to the storage layer nodes, 2 out of 3 acks need to
be received from the storage layer nodes before the message is deemed
successfully published. However, data can be read from all 3 replicas. So
when I say write replicas, I mean the write quorum
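As a concrete sketch, these are the broker-side settings that control this (setting names from broker.conf; the values are just an example, not a recommendation):
```
# how many storage nodes (bookies) each ledger is spread across
managedLedgerDefaultEnsembleSize=3
# write quorum: how many copies of each entry are written
managedLedgerDefaultWriteQuorum=3
# ack quorum: how many acks are required before a write is successful
managedLedgerDefaultAckQuorum=2
```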
----
2019-08-06 18:43:58 UTC - Zhenhao Li: I know it is not a fair question. Sorry, I
was thinking in my context. The "problem" I face is basically fine-grained
recovery in an event-sourcing or streaming system.
Stream processing frameworks like Flink have only stream-level replay on
failures, which means the whole topic will be replayed from the last checkpoint.
Akka Actors offers actor-level recovery, but has to use something like
Cassandra for the persistence layer
----
2019-08-06 18:45:40 UTC - Zhenhao Li: I see. So the data replication happens
asynchronously in the background?
----
2019-08-06 18:46:33 UTC - Zhenhao Li: to increase read throughput, I can set a
write quorum of 10 and an ack quorum of 2?
----
2019-08-06 18:48:25 UTC - Zhenhao Li: here is a crazier idea. is it possible to
have one partition per key and dynamically grow partitions? surely it is not
possible with Kafka
----
2019-08-06 18:52:20 UTC - Ambud Sharma: thanks @David Kjerrumgaard
----
2019-08-06 20:03:07 UTC - Vahid Hashemian: @Vahid Hashemian has joined the
channel
----
2019-08-06 20:09:40 UTC - Yu Yang: @Yu Yang has joined the channel
----
2019-08-06 21:22:08 UTC - Jerry Peng: replicating happens in the background
----
2019-08-06 21:27:33 UTC - Jerry Peng: > to increase read throughput, I can
set write quorum of 10 and ack quorum of 2?
correct
----
2019-08-06 21:27:51 UTC - Jerry Peng: data can also be striped across multiple
nodes
----
2019-08-06 21:28:06 UTC - Jerry Peng: in that configuration you wouldn’t even
need to increase the write quorum
----
2019-08-06 21:29:07 UTC - Jerry Peng: Depends on how many keys. Pulsar can
have millions of topics
----
2019-08-06 21:29:15 UTC - Jerry Peng: partitions can be increased on the fly
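For example, with the admin CLI (topic name hypothetical; the partition count can only be increased):
```
bin/pulsar-admin topics update-partitioned-topic \
  persistent://public/default/my-topic --partitions 8
```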
----
2019-08-06 21:29:44 UTC - Jerry Peng: > is it possible to have one partition
per key?
It's possible but not a typical architecture
----
2019-08-06 22:07:14 UTC - Addison Higham: hrm... so the `/metrics` endpoint on
the broker is not authenticated, so any client can hit it, but on the proxy it
is authenticated. Is that intended?
----
2019-08-06 23:30:32 UTC - Devin G. Bost: For a Pulsar function (like `public
String process(String input, Context context) throws Exception { . . . `)
is there a way to skip a message?
There are certain cases for one of our functions that we want to ignore (and
not send downstream).
----
2019-08-06 23:33:24 UTC - Victor Siu: I usually just return null from my
function, but I'm not sure if that's the recommended way.
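A minimal sketch of that pattern (with the Java Functions API, returning null produces no output message; the skip condition here is hypothetical):
```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class FilteringFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        // Hypothetical skip condition: ignore empty inputs.
        if (input == null || input.isEmpty()) {
            return null; // returning null emits nothing downstream
        }
        return input.toUpperCase();
    }
}
```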
----
2019-08-06 23:42:10 UTC - Victor Li: Is there a document describing all
topic-level metrics in detail? Things like rate_in/out, throughput_in/out? I am
not sure what they mean specifically. Thanks!
----
2019-08-06 23:45:48 UTC - Devin G. Bost: haha thanks Victor.
----
2019-08-07 02:36:24 UTC - Sijie Guo: for Pulsar + Spark integration, check out
this Spark connector: <https://github.com/streamnative/pulsar>
It supports Spark Structured Streaming and Spark SQL.
@yijie also wrote a great blog post about it:
<https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017>
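For a flavor of the API, a minimal Structured Streaming read; the option names (`service.url`, `admin.url`, `topic`) are as I recall them from the connector's README, the URLs are placeholders, and `spark` is assumed to be an existing SparkSession:
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Read a Pulsar topic as a streaming DataFrame
Dataset<Row> df = spark.readStream()
        .format("pulsar")
        .option("service.url", "pulsar://localhost:6650")
        .option("admin.url", "http://localhost:8080")
        .option("topic", "persistent://public/default/my-topic")
        .load();
```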
----
2019-08-07 02:37:56 UTC - Sijie Guo: I am working on a page for it. a pull
request will be out today
----
2019-08-07 02:39:01 UTC - Sijie Guo: I don't think it is intended. I guess the
authentication was unfortunately applied because the HTTP endpoints in the
proxy are just forwarding requests to the broker.
----
2019-08-07 02:39:43 UTC - Sijie Guo: @Chris Bartholomew can you rebase that
pull request? I will pick up the review for it.
----
2019-08-07 02:47:40 UTC - yijie: <https://github.com/streamnative/pulsar-spark>
----
2019-08-07 03:08:24 UTC - Sijie Guo: (sorry I pasted the wrong link)
----
2019-08-07 03:22:41 UTC - Rui Fu: hi all, how can we set the function runtime's
Python version under localrun mode?
----
2019-08-07 03:23:52 UTC - Rui Fu: for example, we have "python" -> Python 2.7
and "python3" -> Python 3.x, and we want python3 to run the function, but
Pulsar in localrun mode always uses python
----
2019-08-07 03:27:07 UTC - Ali Ahmed: @Rui Fu you can use venv
```
mkdir test
cd test
python3 -m venv env
source env/bin/activate
python -V
```
----
2019-08-07 03:27:49 UTC - Rui Fu: @Ali Ahmed i will try, thanks :wink:
----
2019-08-07 03:31:34 UTC - Ali Ahmed: you will also need `pip install
pulsar-client` before you run the function in local mode
----
2019-08-07 06:30:16 UTC - divyasree: @Sijie Guo I am trying to enable
authentication using TLS in two clusters (each cluster is in a different DC)
with a proxy; geo-replication is enabled in this setup.
I am trying TLS to make authentication and authorization work with
geo-replication.
----
2019-08-07 06:30:35 UTC - divyasree: I have configured the instance as below
----
2019-08-07 06:31:05 UTC - divyasree: ```
# Broker Configuration
brokerServicePortTls=6651
webServicePortTls=8443
tlsEnabled=true
tlsCertificateFilePath=/opt/apache-pulsar-2.3.2/broker.cert.pem
tlsKeyFilePath=/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
tlsTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem
tlsProtocols=TLSv1.2,TLSv1.1
tlsCiphers=TLS_DH_RSA_WITH_AES_256_GCM_SHA384,TLS_DH_RSA_WITH_AES_256_CBC_SHA
authenticationEnabled=true
authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls
brokerClientTlsEnabled=true
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
brokerClientAuthenticationParameters=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
brokerClientTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem

# Proxy Configuration
brokerServiceURLTLS=pulsar+ssl://pulsar.ttc.ole.prd.target.com:6651
brokerWebServiceURLTLS=https://pulsar.ttc.ole.prd.target.com:8443
authenticationEnabled=true
authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
brokerClientAuthenticationParameters=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
brokerClientTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem
tlsEnabledWithBroker=true

# Client Configuration
webServiceUrl=https://localhost:8443/
brokerServiceUrl=pulsar+ssl://localhost:6651/
authPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
authParams=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
tlsTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem
tlsAllowInsecureConnection=false
```
----
2019-08-07 06:31:30 UTC - divyasree: the broker is not getting started with
the above changes..
----
2019-08-07 06:31:46 UTC - divyasree: I don't see any error logs in the log file..
----
2019-08-07 06:32:17 UTC - divyasree: the broker is getting stopped after a few
mins, I guess.
----
2019-08-07 06:32:25 UTC - divyasree: don't know what I am missing here
----
2019-08-07 06:32:33 UTC - divyasree: Can you help me out on this?
----
2019-08-07 06:38:59 UTC - Sijie Guo: how did you start the Pulsar broker: in
the foreground or in the background (via pulsar-daemon)?
----
2019-08-07 06:45:27 UTC - divyasree: via pulsar-daemon
----
2019-08-07 06:46:47 UTC - divyasree: I have a doubt about the broker.cert.pem
file... When creating this file I gave the common name as the wildcard
"pulsar-prd-*.target.com"
----
2019-08-07 06:47:47 UTC - divyasree: I have enabled the proxy, which has the
DNS name "pulsar.ttc........."
----
2019-08-07 06:48:12 UTC - Sijie Guo: change conf/log4j2.yaml to set
immediateFlush to true, then try to start the broker again and you will see
the logs
```
RollingFile:
  name: RollingFile
  fileName: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}"
  filePattern: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}-%d{MM-dd-yyyy}-%i.log.gz"
  immediateFlush: true
```
----
2019-08-07 06:48:46 UTC - divyasree: so do I need to create a separate cert.pem
file and configure the broker with broker.cert.pem and configure the proxy with
proxy.cert.pem?
----
2019-08-07 06:52:44 UTC - divyasree: I am getting an error: Caused by:
java.lang.IllegalArgumentException: unsupported cipher suite:
TLS_DH_RSA_WITH_AES_256_GCM_SHA384(DH-RSA-AES256-GCM-SHA384)
----
2019-08-07 06:53:03 UTC - divyasree: meaning the cipher mentioned in the conf
file is not supported, right?
----
2019-08-07 06:53:28 UTC - divyasree: I hope if I leave it blank, it will
support all types of ciphers
----
2019-08-07 07:05:51 UTC - Sijie Guo: correct, you can remove the supported
ciphers and leave it blank
----
2019-08-07 07:12:20 UTC - divyasree: ok, that works; the broker is running now..
----
2019-08-07 07:15:18 UTC - divyasree: but when producing messages through the
client, I am getting the below error
```
Search domain query failed. Original hostname: 'pulsar-prd-bk1.target.com'
failed to resolve 'pulsar-prd-bk1.target.com.stores.target.com' after 7 queries
	at org.apache.pulsar.shade.io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:848)
```
----
2019-08-07 07:16:06 UTC - divyasree: "stores.target.com" is getting appended to
the end of the hostname
----
2019-08-07 07:16:11 UTC - divyasree: what does it mean?
----
2019-08-07 07:18:14 UTC - Kim Christian Gaarder: Q: In the Java client, will a
call like `consumer.receive(0, TimeUnit.SECONDS)` always return a message if
one is available in the topic, or will it time out if it cannot deliver the
message immediately? I.e., I want to know if I can rely on this method to know
for sure whether I have found the last message in the topic.
----
2019-08-07 07:18:31 UTC - Sijie Guo: I think it attempts to query your DNS
server, and your DNS server returns the name
'pulsar-prd-bk1.target.com.stores.target.com'
2019-08-07 07:18:46 UTC - Sijie Guo: how did you configure the DNS and LB?
----
2019-08-07 07:19:06 UTC - divyasree: we have VMs in OpenStack
----
2019-08-07 07:19:19 UTC - divyasree: so I configured it via OpenStack
----
2019-08-07 07:19:37 UTC - divyasree: and added the brokers to the pool of the
LB listener
----
2019-08-07 07:20:10 UTC - divyasree: but the same configuration works with
token authentication for me
----
2019-08-07 07:20:36 UTC - divyasree: I have configured TCP for token
authentication with port 6650
----
2019-08-07 07:20:51 UTC - divyasree: do I need to change it to HTTPS with 6651?
----
2019-08-07 07:25:10 UTC - Sijie Guo: if there is a message already prefetched
in the consumer queue, it will return the message. However, that doesn't tell
you whether you are at the end of the topic.
If you want to do that, use hasMessageAvailable.
----
2019-08-07 07:31:07 UTC - Kim Christian Gaarder: There is no such method
hasMessageAvailable
----
2019-08-07 07:31:19 UTC - Kim Christian Gaarder: Not in the “Consumer”
interface anyway
----
2019-08-07 07:34:41 UTC - Sijie Guo: I see. Oh, it is only available in the
`Reader` interface for now, but it could be made available in Consumer too,
because the fundamentals are the same. You can consider casting to ConsumerImpl
for now; we can look into exposing it on Consumer as well.
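A minimal sketch of using it on the `Reader` interface to drain a topic (service URL and topic are placeholders):
```java
import org.apache.pulsar.client.api.*;

public class DrainTopic {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")
                .startMessageId(MessageId.earliest)
                .create();
        // hasMessageAvailable() is true while the topic has unread messages
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // ... process msg ...
        }
        reader.close();
        client.close();
    }
}
```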
----
2019-08-07 07:34:58 UTC - Kim Christian Gaarder: ok, thanks
:slightly_smiling_face:
----
2019-08-07 07:35:04 UTC - Kim Christian Gaarder: I will attempt the cast
----
2019-08-07 07:36:19 UTC - Kim Christian Gaarder: My problem right now is that
the only reliable way I have to determine whether I have reached the last
message is by producing a marker message and consuming until I get that
message, which is obviously not ideal, as I have to write to the topic in order
to reliably know whether I have read all messages.
----
2019-08-07 07:36:59 UTC - Sijie Guo: @divyasree hmm I am not sure whether it is
related to the common name or not.
----
2019-08-07 07:37:07 UTC - Kim Christian Gaarder: but hasMessageAvailable could
potentially solve that problem!
+1 : Sijie Guo
----
2019-08-07 07:37:42 UTC - Sijie Guo: did you try connecting with the IP?
----
2019-08-07 07:40:05 UTC - divyasree: ok.. any suggestions on this part?
``` I have a doubt about the broker.cert.pem file... When creating this file I
gave the common name as the wildcard "pulsar-prd-*.target.com"
so do I need to create a separate cert.pem file and configure the broker with
broker.cert.pem and configure the proxy with proxy.cert.pem?
And also I saw the documentation mentioning a cert "my-roles.cert.pem"; how
can we define the role in the PEM?
I am confused about this part ```
----
2019-08-07 07:41:09 UTC - divyasree: trying with the IP or the proxy name
didn't work.. I guess since I have given a wildcard hostname that matches the
broker hostname alone, it didn't work
----
2019-08-07 07:41:44 UTC - divyasree: that's why I am asking: do we need to
generate a separate cert for the proxy?
----
2019-08-07 07:42:07 UTC - divyasree: if so, which one should I use where? And
what to use in the client?
----
2019-08-07 07:42:48 UTC - divyasree: And also, which key should I use to create
the authorization token, and how to specify the role in it..
----
2019-08-07 07:44:43 UTC - divyasree: Can you provide detailed docs on TLS with
the proxy?
----
2019-08-07 07:45:07 UTC - Sijie Guo: since you are configuring the proxy to
connect to brokers with a TLS cert, I would suggest you configure two cert
pairs (see the sketch below):
- a client/server cert pair: one for configuring the broker and for the
`brokerClient` part in the proxy.
- a client/server cert pair: one for configuring the proxy and the client,
because your client is talking to the proxy
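A sketch of how that maps onto proxy.conf, reusing the setting names from the configs pasted above; the file names (proxy.cert.pem, proxyclient.cert.pem) are hypothetical:
```
# server side of the proxy: the cert pair that *clients* verify
tlsEnabledInProxy=true
tlsCertificateFilePath=/path/to/proxy.cert.pem
tlsKeyFilePath=/path/to/proxy.key-pk8.pem
tlsTrustCertsFilePath=/path/to/ca.cert.pem

# client side of the proxy: the cert pair it presents to the *brokers*
tlsEnabledWithBroker=true
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
brokerClientAuthenticationParameters=tlsCertFile:/path/to/proxyclient.cert.pem,tlsKeyFile:/path/to/proxyclient.key-pk8.pem
brokerClientTrustCertsFilePath=/path/to/ca.cert.pem
```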
----
2019-08-07 08:12:56 UTC - Zhenhao Li: thanks!
----
2019-08-07 08:16:33 UTC - Zhenhao Li: what do you mean by "striped across
multiple nodes"?
----
2019-08-07 08:16:58 UTC - Zhenhao Li: isn't the write quorum across multiple
nodes?
----
2019-08-07 08:32:35 UTC - Kim Christian Gaarder: Q: Assuming we don't know any
Pulsar message-ids externally, is there any way to ask Pulsar for the content
of the last message currently published on a topic without scanning through all
messages from the beginning?
Even if I can get the message-id of the last message in the topic, I have not
found a way to read that message without scanning from the start (or by
resuming from a previous subscription position). This is why the
hasMessageAvailable method was helpful, but it still does not provide a good
way to read the last message without having to first scan through all messages
or use a separate subscription to track this.
----
2019-08-07 08:34:28 UTC - boldrick: @boldrick has joined the channel
----
2019-08-07 08:34:48 UTC - Sijie Guo: I think there was a similar request before
to return the last message (id). The mechanism is actually ready and being used
for `hasMessageAvailable`. @jia zhai was creating an issue for this new API
request.
----
2019-08-07 08:35:12 UTC - Kim Christian Gaarder: As I understand it, if we knew
the message-id of the message before the last message, then we could seek to
that, and then we would read the last message next.
----
2019-08-07 08:35:37 UTC - Kim Christian Gaarder: but seek will not allow you to
read the message with the message-id that you seek to, only the next message.
----
2019-08-07 08:36:13 UTC - Matteo Merli: Take a look at
`ReaderBuilder.startMessageIdInclusive(true)`
----
2019-08-07 08:36:44 UTC - Matteo Merli: creating a reader starting at
`MessageId.latest`, inclusive
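Putting the two together, a minimal sketch for reading just the last message of a topic (topic and service URL are placeholders; this assumes the no-arg `startMessageIdInclusive()` builder form found in later client versions):
```java
import org.apache.pulsar.client.api.*;

public class ReadLastMessage {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        // Start at the latest message and include it, instead of the next one
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")
                .startMessageId(MessageId.latest)
                .startMessageIdInclusive()
                .create();
        if (reader.hasMessageAvailable()) {
            Message<byte[]> last = reader.readNext();
            System.out.println(new String(last.getData()));
        }
        reader.close();
        client.close();
    }
}
```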
----
2019-08-07 08:36:57 UTC - Kim Christian Gaarder: ahh … that will solve it!
thank you!
----
2019-08-07 08:37:55 UTC - Kim Christian Gaarder: @Matteo Merli can that only be
done when creating a new reader, or is there a way to reuse an existing reader
to repeat this operation?
----
2019-08-07 08:39:02 UTC - Kim Christian Gaarder: I suppose in my use case it's
good enough to only do this when creating a new reader, as I only need this for
continuation after failure (or intentional stoppage), which is rare anyway.
----
2019-08-07 08:39:28 UTC - Matteo Merli: Yes, it’s done on the reader creation
only
----
2019-08-07 08:39:51 UTC - boldrick: hi guys, I wanted to ask if there is an
easy way to build a local Docker image of Pulsar. I found the Dockerfile that
builds the C++ client, but I wanted to build pulsar-all. I thought it is
pulsar/docker/pulsar/Dockerfile, but it wants to move everything from
/apache-pulsar and there is no such directory in the project
----
2019-08-07 08:42:36 UTC - Kim Christian Gaarder: @Matteo Merli Actually, the
reader interface javadoc says: "This configuration option also applies for any
cursor reset operation like Reader.seek(MessageId)." which means this will also
apply to seek :slightly_smiling_face:
----
2019-08-07 08:46:49 UTC - Matteo Merli: Ok, then it should work
:slightly_smiling_face:
----
2019-08-07 08:52:10 UTC - Kim Christian Gaarder: This is very good news. I felt
like there was some functionality missing here in order to build some basic
applications on top of Pulsar; this really does solve all of that!
+1 : Sijie Guo, Matteo Merli
----
2019-08-07 09:06:04 UTC - Sijie Guo:
<https://github.com/apache/pulsar/pull/4910>
----
2019-08-07 09:06:07 UTC - Sijie Guo: here is the pull request
----
2019-08-07 09:08:18 UTC - Matteo Merli: @boldrick You can build the docker
image (and everything that’s required) from scratch with `mvn install -Pdocker
-DskipTests`
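For example, from the repository root (the grep is just a way to verify what was built):
```
# build everything, including the Docker images, skipping tests
mvn install -Pdocker -DskipTests

# then check the images that were produced
docker images | grep pulsar
```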
----