This is an automated email from the ASF dual-hosted git repository. mmerli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git
The following commit(s) were added to refs/heads/master by this push:
     new 125ca58  Pulsar Functions diagrams (#1502)
125ca58 is described below

commit 125ca585da6b02ae35c9681ab96aa302f9efdaa0
Author: Luc Perkins <lucperk...@gmail.com>
AuthorDate: Fri Apr 6 11:53:57 2018 -0700

    Pulsar Functions diagrams (#1502)

    * Add brief section to overview on processing guarantees
    * add new sections to overview
    * add basic PF diagram
    * add word count diagram
    * add two new examples to overview
    * switch from python to java example
    * Update description of counters
---
 site/_sass/_docs.scss                         |   2 +-
 site/docs/latest/functions/guarantees.md      |   2 +-
 site/docs/latest/functions/overview.md        | 139 ++++++++++++++++++++++++--
 site/img/pulsar-functions-overview.png        | Bin 0 -> 77077 bytes
 site/img/pulsar-functions-routing-example.png | Bin 0 -> 62087 bytes
 site/img/pulsar-functions-word-count.png      | Bin 0 -> 85250 bytes
 6 files changed, 133 insertions(+), 10 deletions(-)

diff --git a/site/_sass/_docs.scss b/site/_sass/_docs.scss
index 42277f0..a66bd16 100644
--- a/site/_sass/_docs.scss
+++ b/site/_sass/_docs.scss
@@ -195,7 +195,7 @@
     border-bottom: 1px solid black;
 
     tr th {
-      padding-right: $table-right-padding;
+      padding: 0 $table-right-padding .5rem 0;
     }
   }
 
diff --git a/site/docs/latest/functions/guarantees.md b/site/docs/latest/functions/guarantees.md
index 541ac55..f6adc31 100644
--- a/site/docs/latest/functions/guarantees.md
+++ b/site/docs/latest/functions/guarantees.md
@@ -10,7 +10,7 @@ Delivery semantics | Description
 :------------------|:-------
 **At-most-once** delivery | Each message that is sent to the function will most likely be processed but also may not be (hence the "at most")
 **At-least-once** delivery | Each message that is sent to the function could be processed more than once (hence the "at least")
-**Effectively-once** delivery | Each message that is sent to the function will have one output associated with it. The function may be invoked more than once, perhaps due to some kind of system failure, but the function will produce one effect for each incoming message.
+**Effectively-once** delivery | Each message that is sent to the function will have one output associated with it
 
 ## Applying processing guarantees to a function
 
diff --git a/site/docs/latest/functions/overview.md b/site/docs/latest/functions/overview.md
index 764515d..9629856 100644
--- a/site/docs/latest/functions/overview.md
+++ b/site/docs/latest/functions/overview.md
@@ -10,7 +10,7 @@ preview: true
 * apply a user-supplied processing logic to each message,
 * publish the results of the computation to another topic
 
-Here's an example Pulsar Function for Java:
+Here's an example Pulsar Function for Java (using the [native interface](../api#java-native)):
 
 ```java
 import java.util.function.Function;
@@ -27,9 +27,9 @@ Functions are executed each time a message is published to the input topic. If a
 
 The core goal behind Pulsar Functions is to enable you to easily create processing logic of any level of complexity without needing to deploy a separate neighboring system (such as [Apache Storm](http://storm.apache.org/), [Apache Heron](https://apache.github.io/incubator-heron), [Apache Flink](https://flink.apache.org/), etc.). Pulsar Functions is essentially ready-made compute infrastructure at your disposal as part of your Pulsar messaging system. This core goal is tied to a series of [...]
 
-* Developer productive ([language-native](#native) vs. [Pulsar Functions SDK](#sdk) functions)
-* easy troubleshooting
-* Operational simplicity (no need for an external system)
+* Developer productivity ([language-native](#native) vs. [Pulsar Functions SDK](#sdk) functions)
+* Easy troubleshooting
+* Operational simplicity (no need for an external processing system)
 
 ## Inspirations
 
@@ -41,7 +41,98 @@ The Pulsar Functions feature was inspired by (and takes cues from) several syste
 Pulsar Functions could be described as
 
 * [Lambda](https://aws.amazon.com/lambda/)-style functions that are
-* specifically designed to work with Pulsar
+* specifically designed to use Pulsar as a message bus
+
+## Programming model
+
+The core programming model behind Pulsar Functions is very simple:
+
+* Functions receive messages from one or more **input {% popover topics %}**. Every time a message is received, the function can do a variety of things:
+  * Apply some processing logic to the input and write output to:
+    * An **output topic** in Pulsar
+    * [Apache BookKeeper](#state-storage)
+  * Write logs to a **log topic** (potentially for debugging purposes)
+  * Increment a [counter](#counters)
+
+![Pulsar Functions core programming model](/img/pulsar-functions-overview.png)
+
+### Word count example {#word-count}
+
+If you were to implement the classic word count example using Pulsar Functions, it might look something like this:
+
+![Pulsar Functions word count example](/img/pulsar-functions-word-count.png)
+
+Here, sentences are produced on the `sentences` topic. The Pulsar Function listens on that topic and whenever a message arrives it splits the sentence up into individual words and increments a [counter](#counters) for each word every time that word is encountered. The value of that counter is then available to all [instances](#parallelism) of the function.
+
+If you were writing the function in [Java](../api#java) using the [Pulsar Functions SDK for Java](../api#java-sdk), you could write the function like this...
+
+```java
+package org.example.functions;
+
+import org.apache.pulsar.functions.api.Context;
+import org.apache.pulsar.functions.api.Function;
+
+import java.util.Arrays;
+
+public class WordCountFunction implements Function<String, Void> {
+    // This function is invoked every time a message is published to the input topic
+    @Override
+    public Void process(String input, Context context) {
+        Arrays.asList(input.split(" ")).forEach(word -> {
+            String counterKey = word.toLowerCase();
+            context.incrCounter(counterKey, 1);
+        });
+        return null;
+    }
+}
+```
+
+...and then [deploy it](#cluster-mode) in your Pulsar cluster using the [command line](#cli) like this:
+
+```bash
+$ bin/pulsar-admin functions create \
+  --jar target/my-jar-with-dependencies.jar \
+  --className org.example.functions.WordCountFunction \
+  --tenant sample \
+  --namespace ns1 \
+  --name word-count \
+  --inputs persistent://sample/standalone/ns1/sentences \
+  --output persistent://sample/standalone/ns1/count
+```
+
+### Content-based routing example {#content}
+
+The use cases for Pulsar Functions are essentially endless, but let's dig into a more sophisticated example that involves content-based routing.
+
+Imagine a function that takes items (strings) as input and publishes them to either a fruits or vegetables topic, depending on the item. Or, if an item is neither a fruit nor a vegetable, a warning is logged to a [log topic](#logging). Here's a visual representation:
+
+![Pulsar Functions routing example](/img/pulsar-functions-routing-example.png)
+
+If you were implementing this routing functionality in Python, it might look something like this:
+
+```python
+from pulsar import Function
+
+class RoutingFunction(Function):
+    def __init__(self):
+        self.fruits_topic = "persistent://sample/standalone/ns1/fruits"
+        self.vegetables_topic = "persistent://sample/standalone/ns1/vegetables"
+
+    def is_fruit(self, item):
+        return item in ["apple", "orange", "pear", "other fruits..."]
+
+    def is_vegetable(self, item):
+        return item in ["carrot", "lettuce", "radish", "other vegetables..."]
+
+    def process(self, item, context):
+        if self.is_fruit(item):
+            context.publish(self.fruits_topic, item)
+        elif self.is_vegetable(item):
+            context.publish(self.vegetables_topic, item)
+        else:
+            warning = "The item {0} is neither a fruit nor a vegetable".format(item)
+            context.get_logger().warn(warning)
+```
 
 ## Command-line interface {#cli}
 
@@ -88,6 +179,12 @@ You can also mix and match configuration methods by specifying some function att
 
 Pulsar Functions can currently be written in [Java](../../functions/api#java) and [Python](../../functions/api#python). Support for additional languages is coming soon.
 
+## The Pulsar Functions API {#api}
+
+* Type safe (bytes versus specific types)
+* SerDe (built-in vs. custom)
+* Pulsar messages are always just bytes, but Pulsar Functions handles data types for you *unless* you need custom types
+
 ## Function context {#context}
 
 Each Pulsar Function created using the [Pulsar Functions SDK](#sdk) has access to a context object that both provides:
@@ -95,7 +192,10 @@ Each Pulsar Function created using the [Pulsar Functions SDK](#sdk) has access t
 1. A wide variety of information about the function, including:
    * The name of the function
    * The {% popover tenant %} and {% popover namespace %} of the function
-   * [User-supplied configuration]() values
+   * [User-supplied configuration](#user-config) values
+2. Special functionality, including:
+   * The ability to produce [logs](#logging) to a specified logging topic
+   * The ability to produce [metrics](#metrics)
 
 ### Language-native functions {#native}
 
@@ -103,7 +203,7 @@ Both Java and Python support writing "native" functions, i.e. Pulsar Functions w
 
 The benefit of native functions is that they don't have any dependencies beyond what's already available in Java/Python "out of the box." The downside is that they don't provide access to the function's [context](#context)
 
-### The Pulsar Functions SDK {#sdk}
+## The Pulsar Functions SDK {#sdk}
 
 If you'd like a Pulsar Function to have access to a [context object](#context), you can use the Pulsar Functions SDK, available for both [Java](../api#java-sdk) and [Python](../api#python-sdk).
 
@@ -239,6 +339,29 @@ public class ConfigMapFunction implements Function<String, Void> {
 }
 ```
 
+## Processing guarantees {#guarantees}
+
+The Pulsar Functions feature provides three different messaging semantics that you can apply to any function:
+
+Delivery semantics | Description
+:------------------|:-------
+**At-most-once** delivery | Each message that is sent to the function will most likely be processed but also may not be (hence the "at most")
+**At-least-once** delivery | Each message that is sent to the function could be processed more than once (hence the "at least")
+**Effectively-once** delivery | Each message that is sent to the function will have one output associated with it
+
+This command, for example, would run a function in [cluster mode](#cluster-mode) with effectively-once guarantees applied:
+
+```bash
+$ bin/pulsar-admin functions create \
+  --name my-effectively-once-function \
+  --processingGuarantees EFFECTIVELY_ONCE \
+  # Other function configs
+```
+
 ## Metrics
 
-Pulsar Functions that use the [Pulsar Functions SDK](#sdk) can publish metrics to Pulsar. For more information, see [Metrics for Pulsar Functions](../metrics).
\ No newline at end of file
+Pulsar Functions that use the [Pulsar Functions SDK](#sdk) can publish metrics to Pulsar. For more information, see [Metrics for Pulsar Functions](../metrics).
+
+## State storage
+
+Pulsar Functions use [Apache BookKeeper](https://bookkeeper.apache.org) as a state storage interface. All Pulsar installations, including local {% popover standalone %} installations, include a deployment of BookKeeper {% popover bookies %}.
\ No newline at end of file
diff --git a/site/img/pulsar-functions-overview.png b/site/img/pulsar-functions-overview.png
new file mode 100644
index 0000000..065046b
Binary files /dev/null and b/site/img/pulsar-functions-overview.png differ
diff --git a/site/img/pulsar-functions-routing-example.png b/site/img/pulsar-functions-routing-example.png
new file mode 100644
index 0000000..27a1c44
Binary files /dev/null and b/site/img/pulsar-functions-routing-example.png differ
diff --git a/site/img/pulsar-functions-word-count.png b/site/img/pulsar-functions-word-count.png
new file mode 100644
index 0000000..ad0c280
Binary files /dev/null and b/site/img/pulsar-functions-word-count.png differ