2019-05-14 11:23:14 UTC - tuteng: @Jerry Peng 
<https://github.com/apache/pulsar/issues/4248> Can you spare some time to help 
look at this issue?
----
2019-05-14 13:29:44 UTC - Nathan Linebarger: Hi everyone, I was hoping for some 
help (searched all the docs available so far). Is there any way to subscribe to 
messages with a certain key? E.g., some sort of server-side message filtering? 
I haven't found any way yet. I need it b/c I'll have many consumers only 
interested in a subset of keys, and don't want to max out my NICs by consuming 
the entire topic and doing client-side filtering
----
2019-05-14 14:07:38 UTC - Devin G. Bost: @Sanjeev Kulkarni What's the 
motivation/intuition? It seems like we would just need to write a lot of 
functionality that is already supported by the admin library.
----
2019-05-14 14:11:53 UTC - Devin G. Bost: The REST interface is not as well 
documented, so it will be harder for us to get others to support it.
----
2019-05-14 14:17:41 UTC - Devin G. Bost: I'm not sure that I understand the 
value of the Java Admin API at all if we can't use it to write Pulsar functions 
to control governance. It seems to me that its greatest value proposition is 
its integration with Pulsar.
----
2019-05-14 14:19:06 UTC - Devin G. Bost: @Nathan Linebarger I wonder if you're 
referring to the message-property feature that they're currently working on. I 
think they're planning on releasing it with 2.3.2.
----
2019-05-14 14:35:01 UTC - Devin G. Bost: I was able to locate documentation on 
the REST API here: <http://pulsar.apache.org/docs/latest/reference/RestApi/>
but that documentation says nothing about what objects/data we expect to get 
back from any of those calls.
----
2019-05-14 14:37:40 UTC - Devin G. Bost: It also would have been nice to know 
weeks ago that this use case of the Java Admin API would not be 
supported/recommended before we wasted our time on it.
----
2019-05-14 14:38:07 UTC - Matteo Merli: No, currently there’s no direct way to 
filter a message on server side based on key or properties. 

A possible workaround is to publish the filtered streams into different topics 
and have consumers pick it up from there. 
----
2019-05-14 14:46:06 UTC - Brian Doran: @Sijie Guo Is this something that will 
be looked at in the short term? Thanks
----
2019-05-14 14:56:27 UTC - Nathan Linebarger: Thanks @Matteo Merli, good idea, 
maybe each such consumer can deploy a specialized Function that will filter to 
a destination topic. Two other work-arounds I thought of: (1) Have many 
partitions (say, 1M), and only subscribe to the partitions that have a key 
you're interested in (assuming the key/partition mapping is predetermined and 
not round-robin), and (2) Have a Function that for each message, outputs (key, 
sequence_id) to Topic B. You can then subscribe to Topic B (a much smaller 
topic) to determine which exact sequence IDs you'd be interested in and poll 
them somehow. But your first suggestion of filtering to a separate topic looks 
best.
----
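A minimal sketch of work-around (1) above, for illustration only. It assumes keys map to partitions through a Java-style string hash taken modulo the partition count; the real mapping depends on the producer's configured hashing scheme and routing mode, so verify it before relying on it. The helper names are hypothetical. The `-partition-N` suffix follows Pulsar's naming for the individual partitions of a partitioned topic.

```python
def java_string_hash(s: str) -> int:
    """Signed 32-bit hash in the style of Java's String.hashCode().

    Assumes keys contain only Basic Multilingual Plane characters.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for_key(key: str, num_partitions: int) -> int:
    """Assumed mapping: clear the sign bit, then take the remainder."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions


def partition_topics(topic: str, keys, num_partitions: int) -> list:
    """Names of the partitions a consumer would subscribe to for these keys."""
    wanted = sorted({partition_for_key(k, num_partitions) for k in keys})
    return [f"{topic}-partition-{p}" for p in wanted]
```

With far fewer partitions than keys, each partition still carries other keys, so the consumer would still filter client-side within the partitions it reads.
----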
2019-05-14 14:56:44 UTC - Nathan Linebarger: Thanks @Devin G. Bost, do you know 
of any Issue # associated with it or any way I can look at the feature?
----
2019-05-14 15:03:58 UTC - Devin G. Bost: @Nathan Linebarger Here's what I was 
thinking of: <https://github.com/apache/pulsar/issues/4042>
----
2019-05-14 15:04:39 UTC - Richard Sherman: @Richard Sherman has joined the 
channel
----
2019-05-14 15:13:04 UTC - Nathan Linebarger: It looks like the issue described 
is a little different @Devin G. Bost, but thank you for linking. Maybe I will 
add an issue for server-side message filtering for subscribers
----
2019-05-14 15:23:27 UTC - Devin G. Bost: @Nathan Linebarger
If the key is in a message property, can't you just create a filter/router 
function?
----
2019-05-14 15:24:24 UTC - Devin G. Bost: The function would route the messages 
according to the key.
----
2019-05-14 15:25:25 UTC - Devin G. Bost: Alternatively, you could put the key 
in the topic name.
----
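A rough sketch of the routing idea discussed above: derive a destination topic from the message key and publish there. `RecordingContext` is a hypothetical stand-in for whatever publishing handle the function runtime provides, and the `filtered-` prefix and `public/default` namespace are assumptions chosen for illustration.

```python
import re


def topic_for_key(key: str,
                  prefix: str = "persistent://public/default/filtered-") -> str:
    """Build a per-key topic name, replacing characters unsafe in a topic name."""
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", key)
    return prefix + safe


class RecordingContext:
    """Hypothetical stand-in for a function context; it only records publishes."""

    def __init__(self):
        self.published = []

    def publish(self, topic, payload):
        self.published.append((topic, payload))


def route(key: str, payload: bytes, context) -> None:
    """Forward one message to the topic derived from its key."""
    context.publish(topic_for_key(key), payload)
```

Consumers interested in a key would then subscribe only to that key's topic, which is the "key in the topic name" variant.
----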
2019-05-14 15:31:11 UTC - Nathan Linebarger: Yes, I think that workaround could 
work (also suggested by merlimat), which is to have a Function that routes to 
a "filter topic." Exactly which keys should be filtered is constantly changing 
in my use case, so the Function would need a way to dynamically update 
which keys it filters (perhaps by using the State API, or maybe even 
another topic to publish filter criteria changes).
----
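One way the dynamically updated filter could look, as a sketch only. It assumes filter changes arrive as ("add" | "remove", key) commands, for example from a separate control topic. The class name and command vocabulary are hypothetical, and the state is kept in memory here rather than in the State API.

```python
class DynamicKeyFilter:
    """Tracks the set of keys that should be forwarded to the filter topic."""

    def __init__(self):
        self.wanted = set()

    def apply_control(self, command: str, key: str) -> None:
        """Apply one filter-criteria change received out of band."""
        if command == "add":
            self.wanted.add(key)
        elif command == "remove":
            self.wanted.discard(key)
        else:
            raise ValueError(f"unknown command: {command!r}")

    def accepts(self, key: str) -> bool:
        """True if a message with this key should be forwarded."""
        return key in self.wanted
```

In-memory state is lost on restart, which is one reason to consider persisting it through the State API or replaying the control topic.
----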
2019-05-14 15:34:32 UTC - Nathan Linebarger: A separate topic for each key could 
work. I know that Pulsar can scale to 1M+ topics; I wonder if this solution 
would scale arbitrarily (e.g., 1B topics, with adequate hardware).
----
2019-05-14 15:43:00 UTC - Devin G. Bost: If we go the REST path, then we'd need 
to create something almost identical to the Java Admin API, which seems like 
duplicate work.
----
2019-05-14 15:58:36 UTC - Jerry Peng: Sure I can take a look
----
2019-05-14 16:05:38 UTC - Jerry Peng: Getting the pulsar client or pulsar admin 
client to work inside a pulsar function is harder than I first imagined. In the 
Java function instance library we shade all of the 3rd party dependencies but 
not the pulsar dependencies, which is normal practice for something like this. I 
will need some time to think and discuss with other people about how, if 
possible, to support something like this.
----
2019-05-14 16:11:03 UTC - Sanjeev Kulkarni: @Devin G. Bost The current 
structure of pulsar-admin is too complicated. All pulsar admin calls are REST 
based, so ideally pulsar-admin should have been a thin wrapper around a REST 
library. However, it uses far too many ‘internal’ data structures that become a 
shading nightmare, since these very structures are used by other internal parts 
of pulsar. There is a long-pending item to restructure the client, but it 
just hasn't been done yet.
----
2019-05-14 16:11:37 UTC - Devin G. Bost: That's helpful to know. Thanks for the 
explanation.
----
2019-05-14 16:12:30 UTC - Devin G. Bost: Where on your priority list / roadmap 
is addressing this technical debt?
----
2019-05-14 16:15:10 UTC - Sanjeev Kulkarni: There have been multiple thoughts 
around it. The current thinking is to completely deprecate pulsar-admin and 
move to a thin Go-based admin tool.
----
2019-05-14 16:16:22 UTC - Devin G. Bost: I'm in favor of completely deprecating 
the pulsar-admin.
----
2019-05-14 16:17:19 UTC - Devin G. Bost: We're interested in helping with this 
initiative because we need to build something that will work for our CI/CD 
automation.
----
2019-05-14 16:17:44 UTC - Sanjeev Kulkarni: that sounds good.
----
2019-05-14 16:23:17 UTC - Jerry Peng: @Devin G. Bost if you are adventurous 
what you can also do is shade your own version of pulsar-client-admin.  Similar 
to what we do here: 
<https://github.com/apache/pulsar/blob/master/pulsar-client-admin-shaded/pom.xml>
You can relocate all the classes in the pulsar-client-admin JAR to be under 
something like com.overstock.org.apache.pulsar… and include that as a 
dependency to your function and use that custom shaded version of the 
pulsar-client-admin
----
2019-05-14 16:28:04 UTC - Devin G. Bost: I appreciate the suggestion, but if 
there's interest in just deprecating the existing library and creating a new 
tool in Go, I'd rather just get a jumpstart on that once I have time.
----
2019-05-14 16:29:22 UTC - Jerry Peng: but that doesn’t really solve running 
pulsar admin in a Java Function?
----
2019-05-14 17:34:36 UTC - Jerry Peng: @Devin G. Bost can you describe in more 
detail the CI/CD pipeline you are building for functions with functions, so we 
can get a better understanding of how to improve functions?
----
2019-05-14 17:55:48 UTC - Devin G. Bost: Yes, I'll see if we can get you some 
diagrams. We might need to go over them in a call after you have a chance to 
look them over.
----
2019-05-14 17:58:14 UTC - Devin G. Bost: Regarding your question about running 
Pulsar admin in a Java Function... We're not attached to Java. If we could use 
a different language for governance and automation within a Pulsar function, 
we'd be happy with that. Python would be my first choice, but Go is a useful 
language as well.
----
2019-05-14 18:02:56 UTC - Devin G. Bost: I also like Scala, but that means 
working with Java, and I'd rather use .Net core than Java in most cases.
----
2019-05-14 18:15:30 UTC - Thor Sigurjonsson: @Jerry Peng I guess in trying to 
describe our use case or rather where we're thinking we want to go -- it's a 
little abstract -- but here goes:

I think ultimately we're trying to take more complex representations of systems 
of flows and functions and boil them down to the deltas that need deployment to 
get from one state of the representation to another (a deploy of the subset of 
functions that have changed, for example).

We'd like to turn a "topology" like that into a "data contract" with a 
component that just does the work of provisioning what is in a message it 
receives.

What that component is might be less important than getting that data contract 
right and having transformations of one representation  into another (deploy 
something this way or that).
----
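To make the "deltas between states" idea concrete, here is a small sketch. It assumes a topology can be represented as a mapping from function name to its configuration; that representation and the `topology_delta` name are illustrative assumptions, not the actual data contract under discussion.

```python
def topology_delta(current: dict, desired: dict) -> dict:
    """Compare two topology states and list what a deploy would need to change.

    Both arguments map a function name to its configuration (any comparable value).
    """
    create = sorted(name for name in desired if name not in current)
    delete = sorted(name for name in current if name not in desired)
    update = sorted(
        name for name in desired
        if name in current and desired[name] != current[name]
    )
    return {"create": create, "update": update, "delete": delete}
```

A provisioning component would then only need to act on the three lists, regardless of how the topology was originally described.
----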
2019-05-14 18:16:44 UTC - Thor Sigurjonsson: We want to build a component that 
just takes a message and abstracts away the "devops" work of calling the Right 
APIs and does it quickly and well.
----
2019-05-14 18:17:14 UTC - Thor Sigurjonsson: We can be agile in building that, 
but we'd like to settle on a data contract that is general enough. We can 
re-write our implementation or improve it as needed.
----
2019-05-14 18:18:01 UTC - Thor Sigurjonsson: We started with a wrapper around 
the `pulsar-admin` tool and sort of have been evolving it. We got a huge 
speedup of course doing it in a smarter way.
----
2019-05-14 18:18:47 UTC - Thor Sigurjonsson: We'd like to get to a place where 
we're dealing with message transformations as "DevOps" rather than integrations 
with components.
----
2019-05-14 18:19:21 UTC - Thor Sigurjonsson: We expect there may be more than 
just pulsar artifacts involved as we build this out, but we are starting with 
pulsar functions, namespaces, etc.
----
2019-05-14 18:20:08 UTC - Thor Sigurjonsson: Wish I had an elevator speech on 
hand but I hope that communicates that part of our architecture and use case...
----
2019-05-14 18:22:03 UTC - Thor Sigurjonsson: @Sanjeev Kulkarni @Matteo Merli I 
hope this explanation is helpful for you guys too (I think you and Devin had 
more related conversations earlier).
----
2019-05-14 18:27:32 UTC - Devin G. Bost: Regarding:

> 'We'd like to turn a "topology" like that into a "data contract" with a 
component that just does the work of provisioning what is in a message it 
receives.'

The idea is that it's a more declarative, data-driven, stream-based approach to 
enabling incremental/rolling updates to Pulsar at-scale. The traditional 
Jenkins process may not work for some teams, so we want to leverage what we 
love about Pulsar instead of forcing people to use "this" tool or "that" tool 
for CI/CD. The idea is that as long as they can meet the data contract, we can 
deploy to Pulsar. That's our objective.
----
2019-05-14 18:28:28 UTC - Devin G. Bost: Using a Function with the Java Admin 
API was one hypothetical way of doing that, but we're open to other ideas.
----
2019-05-14 18:30:39 UTC - Thor Sigurjonsson: I think for our first deliverable 
we'll probably just get our Java admin API solution rigged up against a 
producer/consumer, or a `main`-based CLI tool if that hits a snag for some 
reason. Functions would be elegant in a way, but I see the complication when it's 
also a Java codebase within pulsar and not a containerized/isolated workload...
----
2019-05-14 18:32:53 UTC - Thor Sigurjonsson: I guess to add to what Devin was 
saying:
"The idea is that as long as they can meet the data contract, we can deploy to 
Pulsar. That's our objective."
I'd just add that "their" data contract won't be a "full system" or cover how to 
safely deploy or prove it in production. That's where my comments about 
"transforming one representation into another" come in.
----
2019-05-14 20:22:29 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson thanks 
for the explanation
----
2019-05-14 20:25:18 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson I have 
been discussing an approach to Java functions in which we use separate 
classloaders to sidestep this shading issue, which in theory will then allow 
users to use the pulsar admin client and pulsar client in a function. I think 
that will be a better approach for Java functions in general. However, 
implementing this approach will take time (maybe a couple of days of work).
----
2019-05-14 20:26:18 UTC - Devin G. Bost: Thanks for the feedback.
----
2019-05-14 20:26:20 UTC - Karthik Ramasamy: If you are in Bay Area - check out 
the meetup in July on Kafka and Pulsar
----
2019-05-14 20:26:22 UTC - Karthik Ramasamy: 
<https://www.meetup.com/SF-Big-Analytics/events/261460187/>
+1 : Devin G. Bost, Sijie Guo, David Kjerrumgaard, Nathan Linebarger, Ali 
Ahmed, Chris Bartholomew, Matteo Merli, Ezequiel Lovelle, Sree Vaddi, 
Guangzhong Yao
----
2019-05-14 20:29:02 UTC - Jerry Peng: but again you can always use python 
functions or java functions to issue your own HTTP calls to the pulsar REST API 
to unblock yourselves for now.
----
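A sketch of the "issue your own HTTP calls" route using only the Python standard library. The admin URL and the `/admin/v2/persistent/{tenant}/{namespace}` path (listing topics in a namespace) are assumptions to check against the REST API reference for your Pulsar version; the function names are hypothetical, and authentication is left out.

```python
import json
import urllib.request


def list_topics_request(admin_url: str, tenant: str,
                        namespace: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the topics in a namespace."""
    url = f"{admin_url.rstrip('/')}/admin/v2/persistent/{tenant}/{namespace}"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body; requires a reachable broker."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(list_topics_request("http://localhost:8080", "public", "default"))` would be expected to return a list of topic names if a broker is listening on that address.
----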
2019-05-14 20:57:15 UTC - Nathan Linebarger: Thank you. I will try to catch it 
online :+1:
----
2019-05-14 21:00:01 UTC - Matteo Merli: 1B is a very big number 
:slightly_smiling_face:
----
2019-05-15 02:16:32 UTC - Sree Vaddi: don't miss this, team 
:slightly_smiling_face: please.
----
2019-05-15 07:56:47 UTC - Eugene: @Eugene has joined the channel
----
2019-05-15 09:03:13 UTC - bhagesharora: Hello everyone,
I just want to understand one scenario. We are pushing messages from a producer 
and consuming them through a consumer. After that, I acknowledge all the 
messages using consumer.acknowledge(msg) in the Python client.
When I then try to read the messages again using the Reader interface (I am 
rewinding using a message ID), all the acknowledged messages come back. I want 
to understand how the messages are still coming if they have already been 
acknowledged. If all the messages are coming, it means they are stored 
somewhere, so where are they stored, and how can we override this behaviour? Is 
there any pulsar-admin command or configuration change that needs to be made?
----
